helloSystem / hello

Desktop system for creators with a focus on simplicity, elegance, and usability. Based on FreeBSD. Less, but better!

Local full-text search #33

Open probonopd opened 3 years ago

probonopd commented 3 years ago

Using the following components:

See https://wiki.samba.org/index.php/Spotlight_with_Elasticsearch_Backend for instructions. Also see https://wiki.freebsd.org/Elastic.

Then integrate it into the https://github.com/helloSystem/Menu.

tracker is not an option since it is GNOME-, XDG-, and D-Bus-infested. Too many dependencies on unwelcome technologies.

probonopd commented 3 years ago

How can we get it to create (and use) an index on each writeable removable disk like Macs do with /.Spotlight-V100?

shilch commented 3 years ago

I recently came across this project: https://github.com/typesense/typesense
That might be what you are looking for to use in the global menu search?

probonopd commented 3 years ago

Is it suitable to search the contents of random txt, docx, PDF, c++,... files?

Or can it deal only with structured data as https://typesense.org/docs/0.16.1/guide/#create-collection seems to suggest at a quick glance?

shilch commented 3 years ago

I didn't use it yet. As far as I'm aware, it's for structured data only (e.g. entries in the global menu).

probonopd commented 3 years ago

entries in the global menu

We can already search those :-)

probonopd commented 3 years ago

Investigate KDE baloo for file indexing and possibly metadata retrieval

https://community.kde.org/Baloo

Baloo is not an application, but a daemon to index files. Applications can use the Baloo framework to provide file search results.

Baloo focuses on providing a very small memory footprint along with extremely fast searching. It also supports storing additional file-based metadata via extended attributes.

FreeBSD:/home/user% balooctl status
Baloo File Indexer is not running
Total files indexed: 0
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 12.00 KiB

Trying to enable it prints errors:

FreeBSD:/home/user% balooctl enable
Enabling and starting the File Indexer
FreeBSD:/home/user% QKqueueFileSystemWatcherEngine::addPaths: open: No such file or directory
virtual QStringList Solid::Backends::Hal::HalManager::allDevices()  error:  "org.freedesktop.DBus.Error.ServiceUnknown" 

org.kde.solid.udisks2: Failed enumerating UDisks2 objects: "org.freedesktop.DBus.Error.ServiceUnknown" 
 "The name org.freedesktop.UDisks2 was not provided by any .service files"
org.kde.solid.udisks2: Failed enumerating UDisks2 objects: "org.freedesktop.DBus.Error.ServiceUnknown" 
 "The name org.freedesktop.UDisks2 was not provided by any .service files"

There seems to be a dependency on UDisks2, something we'd like to avoid?

Can this be ignored/avoided?

After that, it says Baloo File Indexer is running and a process /usr/local/bin/baloo_file is running:

FreeBSD:/home/user% balooctl status 
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 0
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 12.00 KiB

Is it doing something? System seems to become less responsive even though CPU usage is not high.
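
One way to tell whether the indexer is making progress is to poll `balooctl status` and compare the counters over time. The following is a minimal sketch that only parses the fields shown in the transcripts above. The function name `parse_baloo_status` is hypothetical, and it assumes the output is in English (for example by running the command with `env LANG=C`).

```python
import re


def parse_baloo_status(text):
    """Turn `balooctl status` output into a dict of the fields seen in this thread."""
    # "is not running" does not contain this exact phrase, so the check is safe.
    status = {"running": "Baloo File Indexer is running" in text}
    patterns = {
        "state": r"Indexer state:\s*(.+)",
        "indexed": r"Total files indexed:\s*([\d,]+)",
        "pending": r"Files waiting for content indexing:\s*([\d,]+)",
        "failed": r"Files failed to index:\s*([\d,]+)",
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            continue  # field absent, e.g. no "Indexer state" when not running
        value = match.group(1).strip()
        # Counters may contain thousands separators such as "555,743".
        status[key] = value if key == "state" else int(value.replace(",", ""))
    return status


# Sample text copied from the status output above.
sample = """Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 0
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 12.00 KiB
"""

print(parse_baloo_status(sample))
```

If "Total files indexed" stays at 0 across several polls while the state is "Idle", the daemon is probably not indexing anything.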

https://community.kde.org/Baloo/Configuration

grahamperrin commented 3 years ago

For what it's worth, I never had much joy/luck with Baloo on FreeBSD-CURRENT.

Long ago I disabled file search entirely:

image

– this morning I re-enabled both search, and indexing of content.

probonopd commented 3 years ago

Right now, baloo would be my best bet. Imagine it nicely integrated into the helloSystem global menu search box.

What kind of issues did you experience?

grahamperrin commented 3 years ago

This morning for example, within minutes or moments of me enabling the feature:

Dec 26 09:27:57 mowa219-gjp4-8570p kernel: pid 8762 (baloo_file), jid 0, uid 1002: exited on signal 6 (core dumped)

Confession: I observed the crashing for years, but never bothered to properly investigate or report it. I might begin to do so (in the FreeBSD area) over the Christmas break.

Postscript(s):

probonopd commented 3 years ago

Consider using Drill

Blocked by: https://github.com/yatima1460/Drill/issues/71

It is not full-text search and does not use an index though, and it is currently written in D although a version in C++ may be on the roadmap.

A PyQt GUI around its CLI gives:

image

probonopd commented 3 years ago

Consider using albert

https://albertlauncher.github.io/

It is written in Qt, has a plugin architecture, and does support indexing.

For me it is crashing when I want to invoke it, possibly related to:

11:40:41 [WARN:default] DBus: Name is either invalid, null or not instanceof string
11:40:41 [WARN:default] DBus: CanRaise is either invalid, null or not instanceof bool

One would have to teach it .app bundles and .AppDir directories, and one would have to integrate it with the global menu bar.

probonopd commented 3 years ago

Looks like baloo is working better nowadays, perhaps due to 12.2 rather than 12.1 and newer packages.

In any case, it looks promising!

sudo pkg install kf5-baloo
balooctl enable
balooctl status
# Wait until everything is indexed; does it index only $HOME by default?
baloosearch "FreeBSD Foundation"

probonopd commented 2 years ago

Albert seems to have its own indexing (but it seems to be "only" file name indexing, not full text indexing - a sensible performance tradeoff?), and it's Qt based:

22:02:55 [DEBG:default] Serializing files…
22:02:58 [DEBG:default] Building inverted file index…
22:03:03 [INFO:default] Indexed 171954 files in 67196 directories.
22:11:02 [INFO:default] Start indexing files.
22:11:17 [DEBG:default] Serializing files…
22:11:20 [DEBG:default] Building inverted file index…
22:11:26 [INFO:default] Indexed 171954 files in 67196 directories.

Maybe we can use the code responsible for the file indexing and searching and put it into the existing search box in the Menu.


If we wanted to use Albert (rather than porting its Files plugin into our already-existing search in Menu) we would have to write a plugin for Application Bundles, taking code from

https://github.com/helloSystem/Menu/blob/aa2518c4b597b3fa77952ff55a4f638a5df13a60/src/appmenuwidget.cpp#L156-L263

and putting it into

https://github.com/albertlauncher/plugins/blob/ee55048e138028b4889d71e0574e85b2c4d69541/templateExtension/src/extension.cpp#L74


A neat idea is that Albert finds ssh connections and has text snippets.

probonopd commented 2 years ago

Deepin Linux also comes with Global Search. Need to check it out.

grahamperrin commented 2 years ago

deskutils/recoll

Recent https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259679#c3 verified real-time indexing.

probonopd commented 2 years ago

Recoll

Hi @grahamperrin thanks for the hint.

Additional dependencies look reasonable:

New packages to be INSTALLED:
        antiword: 0.37_4 [FreeBSD]
        aspell: 0.60.8_1,1 [FreeBSD]
        catdoc: 0.95 [FreeBSD]
        chmlib: 0.40_1 [FreeBSD]
        gsfonts: 8.11_8 [FreeBSD]
        librevenge: 0.0.4_13 [FreeBSD]
        libwpd010: 0.10.3_4 [FreeBSD]
        p5-Image-ExifTool: 12.00 [FreeBSD]
        pstotext: 1.9_6 [FreeBSD]
        py38-mutagen: 1.42.0_2 [FreeBSD]
        recoll: 1.27.3_15 [FreeBSD]
        unrar: 6.02,6 [FreeBSD]
        unrtf: 0.21.10 [FreeBSD]
        xapian-core: 1.4.18,1 [FreeBSD]

Number of packages to be installed: 14

The process will require 52 MiB more space.
13 MiB to be downloaded.

I will try it out.

Indexing is running:

image

Pros

Cons?

grahamperrin commented 2 years ago

For real-time indexing: FreeBSD bug 260093 – deskutils/recoll remake X11MON an OPTIONS_DEFAULT

probonopd commented 2 years ago

What does the X11MON option do, and do you think we'd need it so that Recoll would be suitable for helloSystem?

grahamperrin commented 2 years ago

Recent bugs.freebsd.org/bugzilla/show_bug.cgi?id=259679#c3 verified real-time indexing.

If not built with X11MON, then you'll not get real-time indexing.

Compare what you have, with https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259679#c2.

probonopd commented 1 year ago

If we could get it working properly, it looks like baloo would be ideal.

System performance comes to a crawl after

balooctl enable

but CPU usage is minimal. Is this I/O bound? Can we throttle its I/O usage?

System performance becomes normal again only after

balooctl disable
balooctl suspend
killall baloo_file # Why is this needed?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230726#c14 has the answer:

The system is not freezing, it runs into the vnode limit and there obtaining new vnodes is rate limited to 1 per second, which is arguably rather buggy and should be fixed.

In the meantime you can bump sysctl kern.maxvnodes

https://people.freebsd.org/~amdmi3/handbook/configtuning-kernel-limits.html

To see the current number of vnodes in use:

# sysctl vfs.numvnodes
vfs.numvnodes: 91349

To see the maximum vnodes:

# sysctl kern.maxvnodes
kern.maxvnodes: ...

In my tests,

sudo sysctl kern.maxvnodes=1000000

removed the baloo performance issue. Does this have negative side effects?
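
To judge how close the system is to the vnode limit, the two sysctl values above can be compared. This is a rough sketch, not a tuning recommendation. The helper names `parse_sysctl` and `vnode_usage` are hypothetical, and the sample pairs the `vfs.numvnodes` reading shown above with the `kern.maxvnodes` value set in this test.

```python
def parse_sysctl(text):
    """Parse `sysctl name ...` output lines of the form 'name: value' into ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip lines without a colon or with a non-numeric value.
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def vnode_usage(values):
    """Fraction of the vnode limit currently in use (0.0 to 1.0)."""
    return values["vfs.numvnodes"] / values["kern.maxvnodes"]


# Sample text in the format printed by `sysctl vfs.numvnodes kern.maxvnodes`.
sample = """vfs.numvnodes: 91349
kern.maxvnodes: 1000000
"""

values = parse_sysctl(sample)
print(f"vnodes in use: {vnode_usage(values):.1%}")
```

On a live system the text could come from `subprocess.run(["sysctl", "vfs.numvnodes", "kern.maxvnodes"], capture_output=True, text=True).stdout`. A ratio close to 1.0 while baloo_file is running would fit the explanation quoted from the bug report.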

I also had to increase the maximum number of open files; I did so by a factor of 10:

sudo sysctl kern.maxfiles=3221930

Without this, I ran into

FreeBSD% balooctl resume
File Indexer resumed

(process:3305): GLib-ERROR **: 09:20:45.190: Creating pipes for GWakeup: Too many open files

Does this have negative side effects?

Now baloo_file is taking up one CPU core while indexing but the system stays operational.

I wonder if we should throttle baloo_file to take at most 50% CPU...

Runs smoothly for a while, but then I get

FreeBSD% QProcessPrivate::createPipe: Cannot create pipe 0x480a2ca100 (Too many open files)

and it does not index anything anymore.

Is there a bug which causes baloo to open but never close files?

Am I hitting https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256269?

The answer is hopefully not "disable baloo" or "index fewer files", but to get it fixed? It should be fixed in a way that it can index any arbitrary number of files.

probonopd commented 1 year ago

Stopped indexing, then removed old database with rm -rf ~/.local/share/baloo.

Trying with

only basic indexing=true

in ~/.config/baloofilerc to index only file names for now; will this improve it? Yes, this succeeds:

FreeBSD% env LANG=C balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 555,743
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 214.31 MiB

So, why does indexing the contents of files lead to too many open files, which in turn causes the indexing to fail?
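
If this setting were applied automatically (for example at installation time), the `only basic indexing=true` line could be written with Python's `configparser`. This is only a sketch under assumptions: the thread does not say which section the key belongs to, so the `[General]` section used here is an assumption, and the function name `enable_basic_indexing` is hypothetical. `configparser` also drops comments and may reject KDE config files that have keys before the first section header.

```python
import configparser
import io


def enable_basic_indexing(rc_text, section="General"):
    """Return baloofilerc text with `only basic indexing=true` set.

    The section name is an ASSUMPTION; the thread only gives the key itself.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(rc_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "only basic indexing", "true")
    out = io.StringIO()
    # KDE config files use `key=value` without spaces around the delimiter.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Starting from an empty file yields just the one section and key.
print(enable_basic_indexing(""))
```

The result would be written back to ~/.config/baloofilerc while the indexer is stopped, as was done by hand above.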

probonopd commented 1 year ago

Seems like baloo doesn't have the best reputation among FreeBSD users... perhaps because FreeBSD and baloo are not yet properly "tuned for each other"?

https://forums.freebsd.org/threads/problems-with-baloo.80107/

probonopd commented 1 year ago

Integrating the results of

baloosearch -l 100 helloSystem

into Menu:

image

Not too shabby...
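
A minimal sketch of how such an integration could call the CLI. This is not the code used in Menu. The function names are hypothetical, and it assumes that each result is printed as one absolute path per line; any other line (such as a trailing timing summary) is ignored.

```python
import subprocess


def parse_baloosearch(output):
    """Keep only lines that look like absolute paths (ASSUMED output format)."""
    return [line for line in output.splitlines() if line.startswith("/")]


def search_files(query, limit=100, run=subprocess.run):
    """Run `baloosearch -l LIMIT QUERY` and return the matching paths.

    `run` is injectable so the function can be exercised without baloo installed.
    """
    result = run(
        ["baloosearch", "-l", str(limit), query],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_baloosearch(result.stdout)


# Parsing a made-up result list; the paths are purely illustrative.
example_output = "/home/user/Documents/notes.txt\n/home/user/README.md\n"
print(parse_baloosearch(example_output))
```

Passing the query as a single list element avoids shell quoting issues for multi-word searches such as "FreeBSD Foundation".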

probonopd commented 1 year ago

To start the indexing, it must be enabled with balooctl enable. Possibly we will do this at ISO installation time in the future.

grahamperrin commented 1 year ago

Seems like baloo doesn't have the best reputation among FreeBSD users

Relatively few issues are specific to FreeBSD. Via sysutils/kf5-baloo, in Bugzilla for FreeBSD:

In Bugzilla for KDE, for Baloo, Baloo file daemon, and balooctl:

probonopd commented 1 year ago

Still seeing

FreeBSD% baloo_file
QKqueueFileSystemWatcherEngine::addPaths: open: No such file or directory

which might mean that we don't get new/changed files indexed immediately. Why?

probonopd commented 1 year ago

https://www.freebsd.org/cgi/man.cgi?rtprio

To make depend while not disturbing other machine usage: idprio 31 make depend

So we might use idprio 31 baloo_file to make it run while not disturbing other machine usage?
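
A small sketch of how a launcher might wrap the indexer in `idprio`. The helper names are hypothetical. It assumes baloo_file is started by hand rather than through `balooctl enable`, and that the calling user is allowed to use `idprio`, which may require elevated privileges depending on system configuration. Note that idle priority affects CPU scheduling only, so it would probably not help with the vnode and open-file limits discussed earlier.

```python
import subprocess


def idle_priority_command(argv, priority=31):
    """Prefix a command with `idprio PRIORITY`; rtprio(1) accepts 0 to 31."""
    if not 0 <= priority <= 31:
        raise ValueError("idprio priority must be between 0 and 31")
    return ["idprio", str(priority), *argv]


def start_indexer_idle():
    """Start baloo_file at the lowest idle priority (not called in this sketch)."""
    return subprocess.Popen(idle_priority_command(["baloo_file"]))


print(idle_priority_command(["baloo_file"]))
```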