Open probonopd opened 4 years ago
How can we get it to create (and use) an index on each writeable removable disk like Macs do with /.Spotlight-V100?
I recently came across this project: https://github.com/typesense/typesense
That might be what you are looking for to use in the global menu search?
Is it suitable to search the contents of random txt, docx, PDF, c++,... files?
Or can it deal only with structured data as https://typesense.org/docs/0.16.1/guide/#create-collection seems to suggest at a quick glance?
I didn't use it yet. As far as I'm aware, it's for structured data only (e.g. entries in the global menu).
entries in the global menu
We can already search those :-)
Investigate KDE baloo for file indexing and possibly metadata retrieval
https://community.kde.org/Baloo
Baloo is not an application, but a daemon to index files. Applications can use the Baloo framework to provide file search results.
Baloo focuses on providing a very small memory footprint along with extremely fast searching. It also supports storing additional file based metadata via extended attributes.
FreeBSD:/home/user% balooctl status
Baloo File Indexer is not running
Total files indexed: 0
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 12.00 KiB
Trying to enable it prints errors:
FreeBSD:/home/user% balooctl enable
Enabling and starting the File Indexer
FreeBSD:/home/user% QKqueueFileSystemWatcherEngine::addPaths: open: No such file or directory
virtual QStringList Solid::Backends::Hal::HalManager::allDevices() error: "org.freedesktop.DBus.Error.ServiceUnknown"
org.kde.solid.udisks2: Failed enumerating UDisks2 objects: "org.freedesktop.DBus.Error.ServiceUnknown"
"The name org.freedesktop.UDisks2 was not provided by any .service files"
org.kde.solid.udisks2: Failed enumerating UDisks2 objects: "org.freedesktop.DBus.Error.ServiceUnknown"
"The name org.freedesktop.UDisks2 was not provided by any .service files"
There seems to be a dependency on UDisks2, something we'd like to avoid?
Can this be ignored/avoided?
After that, it says Baloo File Indexer is running and a process /usr/local/bin/baloo_file is running:
FreeBSD:/home/user% balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 0
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 12.00 KiB
Is it doing something? System seems to become less responsive even though CPU usage is not high.
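One way to check whether the indexer is doing anything is to poll the counters in the `balooctl status` output shown above. A minimal sketch, assuming that output format stays as shown; the function name `baloo_indexed_count` is made up for illustration and is not part of Baloo:

```shell
# Hypothetical helper: extract the "Total files indexed" counter from
# `balooctl status` output read on stdin. Assumes the output format shown
# above; thousands separators such as "555,743" are stripped.
baloo_indexed_count() {
    sed -n 's/^Total files indexed: *//p' | tr -d ',' | head -n 1
}

# Usage on a system with Baloo installed (LANG=C keeps the labels in English):
#   env LANG=C balooctl status | baloo_indexed_count
```

Running this twice, a minute apart, and comparing the two numbers shows whether indexing is progressing or stuck.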
For what it's worth, I never had much joy/luck with Baloo on FreeBSD-CURRENT.
Long ago I disabled file search entirely:
– this morning I re-enabled both search, and indexing of content.
Right now, baloo would be my best bet. Imagine it nicely integrated into the helloSystem global menu search box.
What kind of issues did you experience?
This morning for example, within minutes or moments of me enabling the feature:
Dec 26 09:27:57 mowa219-gjp4-8570p kernel: pid 8762 (baloo_file), jid 0, uid 1002: exited on signal 6 (core dumped)
Confession: I observed the crashing for years, but never bothered to properly investigate or report it. I might begin to do so (in the FreeBSD area) over the Christmas break.
Postscript(s):
balooctl
but https://www.freebsd.org/cgi/man.cgi?query=balooctl finds nothing so I assume that things are significantly different on FreeBSD
Consider using Drill
Blocked by: https://github.com/yatima1460/Drill/issues/71
It is not full-text search and does not use an index though, and it is currently written in D although a version in C++ may be on the roadmap.
A PyQt GUI around its CLI gives:
Consider using albert
https://albertlauncher.github.io/
It is written in Qt, has a plugin architecture, and does support indexing.
For me it is crashing when I want to invoke it, possibly related to:
11:40:41 [WARN:default] DBus: Name is either invalid, null or not instanceof string
11:40:41 [WARN:default] DBus: CanRaise is either invalid, null or not instanceof bool
One would have to teach it .app bundles and .AppDir directories, and one would have to integrate it with the global menu bar.
Looks like baloo is working better nowadays, perhaps due to 12.2 rather than 12.1 and newer packages.
In any case, it looks promising!
sudo pkg install kf5-baloo
balooctl enable
balooctl status
# Wait until everything is indexed; does it index only $HOME by default?
baloosearch "FreeBSD Foundation"
Albert seems to have its own indexing (but it seems to be "only" file name indexing, not full text indexing - a sensible performance tradeoff?), and it's Qt based:
22:02:55 [DEBG:default] Serializing files…
22:02:58 [DEBG:default] Building inverted file index…
22:03:03 [INFO:default] Indexed 171954 files in 67196 directories.
22:11:02 [INFO:default] Start indexing files.
22:11:17 [DEBG:default] Serializing files…
22:11:20 [DEBG:default] Building inverted file index…
22:11:26 [INFO:default] Indexed 171954 files in 67196 directories.
Maybe we can use the code responsible for the file indexing and searching and put it into the existing search box in the Menu.
If we wanted to use Albert (rather than porting its Files plugin into our already-existing search in Menu) we would have to write a plugin for Application Bundles, taking code from
and putting it into
A neat idea is that Albert finds ssh connections and has text snippets.
Deepin Linux also comes with Global Search. Need to check it out.
Recent https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259679#c3 verified real-time indexing.
Hi @grahamperrin thanks for the hint.
Additional dependencies look reasonable:
New packages to be INSTALLED:
antiword: 0.37_4 [FreeBSD]
aspell: 0.60.8_1,1 [FreeBSD]
catdoc: 0.95 [FreeBSD]
chmlib: 0.40_1 [FreeBSD]
gsfonts: 8.11_8 [FreeBSD]
librevenge: 0.0.4_13 [FreeBSD]
libwpd010: 0.10.3_4 [FreeBSD]
p5-Image-ExifTool: 12.00 [FreeBSD]
pstotext: 1.9_6 [FreeBSD]
py38-mutagen: 1.42.0_2 [FreeBSD]
recoll: 1.27.3_15 [FreeBSD]
unrar: 6.02,6 [FreeBSD]
unrtf: 0.21.10 [FreeBSD]
xapian-core: 1.4.18,1 [FreeBSD]
Number of packages to be installed: 14
The process will require 52 MiB more space.
13 MiB to be downloaded.
I will try it out.
Indexing is running:
Pros
Cons?
/usr/local/bin/perl /usr/local/share/recoll/filters/rclimg
seems to depend on Perl, something we wanted to get rid of in helloSystem.
For real-time indexing: FreeBSD bug 260093 – deskutils/recoll remake X11MON an OPTIONS_DEFAULT
What does the X11MON option do, and do you think we'd need it so that Recoll would be suitable for helloSystem?
Recent bugs.freebsd.org/bugzilla/show_bug.cgi?id=259679#c3 verified real-time indexing.
If not built with X11MON, then you'll not get real-time indexing.
Compare what you have, with https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259679#c2.
If we could get it working properly, it looks like baloo would be ideal.
System performance comes to a crawl after
balooctl enable
but CPU usage is minimal. Is this I/O bound? Can we throttle its I/O usage?
System performance becomes normal again only after
balooctl disable
balooctl suspend
killall baloo_file # Why is this needed?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230726#c14 has the answer:
The system is not freezing, it runs into the vnode limit and there obtaining new vnodes is rate limited to 1 per second, which is arguably rather buggy and should be fixed.
In the meantime you can bump sysctl kern.maxvnodes
https://people.freebsd.org/~amdmi3/handbook/configtuning-kernel-limits.html
To see the current number of vnodes in use:
# sysctl vfs.numvnodes
vfs.numvnodes: 91349
To see the maximum vnodes:
# sysctl kern.maxvnodes
kern.maxvnodes: ...
In my tests,
sudo sysctl kern.maxvnodes=1000000
removed the baloo performance issue. Does this have negative side effects?
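To see how close the system is to the vnode limit while baloo is indexing, the two sysctl values quoted above can be compared. A small sketch; the helper name `vnode_usage_percent` is hypothetical, and the arithmetic is plain integer division:

```shell
# Hypothetical helper: integer percentage of the vnode limit in use.
# $1 = vnodes currently in use, $2 = vnode limit
vnode_usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# On FreeBSD the two values come from the sysctls quoted above:
#   vnode_usage_percent "$(sysctl -n vfs.numvnodes)" "$(sysctl -n kern.maxvnodes)"
# A value set with `sysctl kern.maxvnodes=...` is lost on reboot; adding the
# line `kern.maxvnodes=1000000` to /etc/sysctl.conf makes it persistent.
```

A percentage that stays near 100 while the system crawls would be consistent with the rate-limiting explanation in the FreeBSD bug report.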
Also had to increase the number of allowable open files, I did so by a factor of 10:
sudo sysctl kern.maxfiles=3221930
Without this, I ran into
FreeBSD% balooctl resume
File Indexer resumed
(process:3305): GLib-ERROR **: 09:20:45.190: Creating pipes for GWakeup: Too many open files
Does this have negative side effects?
Now baloo_file is taking up one CPU core while indexing but the system stays operational.
I wonder if we should throttle baloo_file to take at most 50% CPU...
Runs smoothly for a while, but then I get
FreeBSD% QProcessPrivate::createPipe: Cannot create pipe 0x480a2ca100 (Too many open files)
and it does not index anything anymore.
Is there a bug which causes baloo to open but never close files?
Am I hitting https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256269?
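Whether baloo_file really accumulates descriptors could be checked by counting its open files repeatedly while it indexes. A rough sketch, assuming the usual `procstat -f` column layout on FreeBSD (PID, COMM, FD, ...) and a process name without spaces; the helper name `count_open_fds` is made up for illustration:

```shell
# Hypothetical helper: count open file descriptors in `procstat -f` output
# read on stdin. Assumption: the third column is the descriptor, and only
# numeric values are real descriptors (entries such as "text" or "cwd" and
# the header line are skipped).
count_open_fds() {
    awk '$3 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Usage on FreeBSD while indexing is running:
#   procstat -f $(pgrep -x baloo_file) | count_open_fds
# System-wide totals for comparison:
#   sysctl kern.openfiles kern.maxfiles
```

A count that only ever grows between runs would point to a leak rather than to a limit that is merely too low.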
The answer is hopefully not "disable baloo" or "index fewer files", but to get it fixed? It should be fixed in a way that it can index any arbitrary number of files.
Stopped indexing, then removed old database with rm -rf ~/.local/share/baloo.
Trying with only basic indexing=true in ~/.config/baloofilerc to search only file names for now; will this improve it?
Yes, this succeeds:
FreeBSD% env LANG=C balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 555,743
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 214.31 MiB
So why does indexing the contents of files lead to too many open files, which in turn causes indexing to fail?
Seems like baloo doesn't have the best reputation among FreeBSD users... perhaps because FreeBSD and baloo are not yet properly "tuned for each other"?
https://forums.freebsd.org/threads/problems-with-baloo.80107/
Integrating the results of
baloosearch -l 100 helloSystem
into Menu:
Not too shabby...
To start the indexing, it must be enabled with balooctl enable. Possibly we will do this at ISO installation time in the future.
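Feeding `baloosearch` results into a menu comes down to running the command and keeping only the lines that are file paths. A minimal sketch of that filtering step; the function name `baloo_result_paths` is hypothetical, and it is an assumption that results are printed as one absolute path per line, possibly followed by other lines such as a timing summary:

```shell
# Hypothetical filter: keep only lines that look like absolute paths from
# `baloosearch` output on stdin and drop everything else. "|| true" avoids
# a failing exit status when there are no results at all.
baloo_result_paths() {
    grep '^/' || true
}

# Usage on a system with an enabled index:
#   baloosearch -l 100 helloSystem | baloo_result_paths
```

Each remaining line can then be turned into one menu entry.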
Seems like baloo doesn't have the best reputation among FreeBSD users
Relatively few issues are specific to FreeBSD. Via sysutils/kf5-baloo, in Bugzilla for FreeBSD:
In Bugzilla for KDE, for Baloo, Baloo file daemon, and balooctl:
Still seeing
FreeBSD% baloo_file
QKqueueFileSystemWatcherEngine::addPaths: open: No such file or directory
which might mean that we don't get new/changed files indexed immediately. Why?
https://www.freebsd.org/cgi/man.cgi?rtprio
To make depend while not disturbing other machine usage: idprio 31 make depend
So we might use idprio 31 baloo_file to make it run while not disturbing other machine usage?
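A small wrapper could start the indexer at idle priority where idprio(1) exists and fall back to a normal start elsewhere. This is only a sketch: the function name `run_idle` is made up, idprio may need elevated privileges depending on system configuration, and it affects CPU scheduling only, so it would not by itself address the vnode or open-file limits discussed earlier.

```shell
# Hypothetical wrapper: run a command at idle priority when idprio(1) is
# available (as on FreeBSD), otherwise run it unchanged.
run_idle() {
    if command -v idprio >/dev/null 2>&1; then
        idprio 31 "$@"
    else
        "$@"
    fi
}

# Usage:
#   run_idle baloo_file &
```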
Using the following components:
See https://wiki.samba.org/index.php/Spotlight_with_Elasticsearch_Backend for instructions. Also see https://wiki.freebsd.org/Elastic.
Then integrate it into the https://github.com/helloSystem/Menu.
tracker is not an option since it is Gnome, Xdg, D-Bus infested. Too many dependencies on unwelcome technologies.