Open dobrou opened 5 years ago
cc @christinaforney since I think this is more about product decisions.
Hi @dobrou! This will soon be possible in Sourcegraph with a new tool called src-expose! See the pull request at https://github.com/sourcegraph/sourcegraph/pull/5835/ for context.
src-expose is a tool to periodically snapshot local directories and serve them as Git repositories over HTTP. This is a useful way to get code from other version control systems into Sourcegraph, or textual artifacts from non-version-controlled systems (e.g. configuration) into Sourcegraph.
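To make the "periodically snapshot a local directory" idea concrete, here is a minimal Python sketch of a single snapshot step. It is not how src-expose is implemented: the helper names `git` and `snapshot`, the commit identity, and the commit message are all assumptions made for illustration, and serving the result over HTTP is left out entirely. It assumes the `git` binary is on the PATH.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def snapshot(directory: Path, message: str = "snapshot") -> bool:
    """Commit the current contents of `directory`.

    Returns True if a new commit was created, False if nothing changed.
    """
    if not (directory / ".git").exists():
        git(directory, "init", "--quiet")
    git(directory, "add", "--all")
    # `git status --porcelain` prints nothing when there is nothing to commit.
    if not git(directory, "status", "--porcelain").strip():
        return False
    git(
        directory,
        # Placeholder identity so the commit works without global git config.
        "-c", "user.name=snapshot",
        "-c", "user.email=snapshot@example.invalid",
        "commit", "--quiet", "-m", message,
    )
    return True
```

Calling `snapshot(Path("/data/logs"))` from a timer or a loop with a sleep would give the periodic behaviour; the path here is only an example.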
Does this sound like it would help?
@dadlerj I totally missed that we are going to publish src-expose soon!
But I think @dobrou has addressed his concern about using Git:
"However data are too big (50GB+) and updated every day, so git may not handle this well."
Hi, thank you for the quick and useful response.
The src-expose documentation sounds like it should work. My only concern is performance.
I will try it with the insiders build and check how it behaves.
Thanks again.
Are you indexing source code? 50GB is quite large for 1 million files. I assume the distribution of file sizes fits some sort of power law and that the 50GB is dominated by a few very large files? Note: git will do fine with 1 million text files, especially if they aren't updated often. Additionally, src-expose allows you to shard the data across a few git repos (by subdir).
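For scale, 50GB spread over 1 million files is roughly 50KB per file on average. One way to check whether a few very large files dominate, and how the data would split by subdirectory, is a quick size profile. The following is a rough Python sketch; the function names `size_profile` and `size_by_subdir` are made up for this example and are not part of src-expose or Sourcegraph.

```python
import os
from pathlib import Path


def size_profile(root: Path, top_n: int = 10) -> dict:
    """Summarise file sizes under `root`, including the share held by the largest files."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted twice
            sizes.append(os.path.getsize(path))
    sizes.sort(reverse=True)
    total = sum(sizes)
    return {
        "files": len(sizes),
        "total_bytes": total,
        "mean_bytes": total / len(sizes) if sizes else 0.0,
        # Fraction of all bytes held by the `top_n` largest files.
        "top_share": sum(sizes[:top_n]) / total if total else 0.0,
    }


def size_by_subdir(root: Path) -> dict:
    """Total bytes per immediate subdirectory, i.e. candidate shards."""
    totals = {}
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            totals[entry.name] = size_profile(entry)["total_bytes"]
    return totals
```

A `top_share` close to 1.0 for a small `top_n` would support the guess that a handful of large files account for most of the volume, and `size_by_subdir` gives a first idea of how evenly a per-subdirectory split would divide the data.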
Sourcegraph will always create quite a few copies of your data anyway, so if hard drive space is a concern, this will be an issue. For example, clones will be kept by both src-expose and gitserver. Our indexing system will then create indexes that are bigger than the working copies, and some other systems will cache working copies.
Just for context, source code should be quite a bit smaller in general. For example, here are some stats for the Go code in our main repo (not as many files):
Hi @keegancsmith, the files are mostly log files from various sources.
I know there are better specialized solutions for log handling, and I understand this is not your primary use case.
The idea is just that Sourcegraph is great at full-text search (among many other things), so it looked like a solution that could solve my problem and is easy to set up and maintain.
Feature request description
Add the ability to search folders and files without a repository. Mount any local folder into the Sourcegraph Docker container and configure Sourcegraph to treat this folder as a repository and index it. Alternatively, configure a folder on a remote Windows shared drive as a repository.
It will miss features like history and code intelligence. However, even simple full-text search in Sourcegraph would provide great value.
Is your feature request related to a problem? If so, please describe.
I have a folder full of text data such as logs. I would like to leverage the quick and efficient search in Sourcegraph to search the files in this folder.
Describe alternatives you've considered.
Commit the data to a git repository and configure Sourcegraph to scan the repository. However, the data is too big (50GB+) and is updated every day, so git may not handle this well.