YourFin opened 3 years ago
Thoughts in no particular order:
I 100% think search, streaming, auth, etc will eventually be broken out into their own projects/packages. Part of the reason I like this project.
I've heard really good things about mlocate; maybe that's a place to start. It would be way better than find at least, and it does some form of indexing.
Have been thinking about data vis, but it ain't gonna look like WinDirStat. I'll probably start with pie charts and gauges and build from there.
Tokenization is an interesting issue. As a first pass, I'd find some package that already does close to what we want and use what they've done. Since I assume none of us are search experts, let's use other people's A/B (or hopefully more insightful) testing to our advantage and tweak if we don't like how it works.
I really like the fun ideas; I think many of them should make it into the MVP.
Agree with all your pet peeves. I think the search function should take the search directory as a parameter. As far as good default search goes: assume the current directory, weight more recently modified files higher, and weight file types differently (.mp4 and .docx ranked higher than .dll or .lock), with the ability to change the defaults in the library. Files inside folders shouldn't be served by default (unless relevant) because that can become very bandwidth-expensive.
As far as using the search indexing in the directory browser, I'm not sure what you mean? It would seem providing a path and retrieving all the files at that path wouldn't benefit from indexing.
Agreed on cross-server search. As we get there, the number of results should also be a parameter. Probably even before that, we should tell the search library/function to stop searching after 50 or so results. Do you think it's reasonable to serve 50 results, wait for the client to ask for more and return the next 50? I can imagine a scenario where a client asks for all of the .txt files and gets returned thousands of results.
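The stop-after-50-then-page idea could look something like this sketch (the page size of 50 and the function name are placeholders; the point is that a lazy result iterator means nothing past the requested page is ever computed):

```python
from typing import Iterable, Iterator

PAGE_SIZE = 50  # assumed cap from the discussion; should become a parameter

def paginate(results: Iterator[str], page_size: int = PAGE_SIZE) -> Iterable[list[str]]:
    """Yield search results in fixed-size pages.

    The underlying search only advances when the client asks for the
    next page, so a query like '*.txt' with thousands of hits never
    materializes more than one page at a time.
    """
    page: list[str] = []
    for item in results:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    if page:  # final, possibly short, page
        yield page
```

The client would then request "next 50" repeatedly, and the server holds (or reconstructs via a cursor) the iterator between requests.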
Opening this issue as a discussion point for search and brain dumping ground.
Doing search at a half-decent clip probably means keeping an index of all the files capable of being served, and we should re-use that functionality for the directory browser. Keeping it up to date, however, will be annoying. We'll probably need to hook into the filesystem events during runtime, and re-validate the old cache on each startup. If we want to provide a truly excellent user experience we may also want to be able to gracefully degrade and verify that folder listings are correct manually per-request on startup until indexing is finished. But that's a whole other problem.
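The startup re-validation step might be a plain diff of the cached index against a fresh walk. This is a sketch under assumptions (cache shape is just path → mtime; runtime updates via filesystem events like inotify/FSEvents are out of scope here):

```python
import os

def diff_cache(cache: dict[str, float], root: str):
    """Compare a cached {path: mtime} index against the on-disk tree.

    Returns (added, removed, changed) so the caller can patch the index
    instead of rebuilding it from scratch on every startup.
    """
    on_disk: dict[str, float] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            on_disk[path] = os.path.getmtime(path)
    added = set(on_disk) - set(cache)
    removed = set(cache) - set(on_disk)
    changed = {p for p in cache.keys() & on_disk.keys() if cache[p] != on_disk[p]}
    return added, removed, changed
```

The graceful-degradation idea would layer on top: serve directory listings straight from disk while this pass (and the initial index build) is still running.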
There are roughly two camps that I see wrt how to improve search beyond the braindead "does it match `/.*${search_query}.*/i`" approach: the fd/ivy/helm/fzf/IDE "let's get clever about how we do partial character matches across the query string" approach that, for example, strongly matches "FeedMeSeymore" when the query is "memor feed", and the more Google/Elasticsearch-y approach that tries to get more intelligent by acting at a higher level than individual characters.

Personal pet peeves to avoid:
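For reference, a toy version of the first camp, where every space-separated query token must appear as an in-order (not necessarily contiguous) subsequence of the candidate, case-insensitively. This is a sketch of the idea, not any particular library's actual scoring algorithm:

```python
def fuzzy_match(query: str, candidate: str) -> bool:
    """True if each whitespace token of `query` is a case-insensitive,
    in-order subsequence of `candidate` (no scoring, just match/no-match)."""
    hay = candidate.lower()
    for token in query.lower().split():
        pos = 0
        for ch in token:
            pos = hay.find(ch, pos)
            if pos == -1:
                return False  # ran out of haystack for this token
            pos += 1
    return True
```

So `fuzzy_match("memor feed", "FeedMeSeymore")` succeeds where the naive `/.*memor feed.*/i` regex fails outright; real implementations then rank matches by gap size, word boundaries, etc.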
Fun ideas:
Things to ponder: