traverseda / iiab-searchServices

Search services I'm writing for my personal data-archive/internet-in-a-box
GNU Affero General Public License v3.0

Early pre-release

WIP, expect breaking changes.

Screenshots: home page and results page.

Simple website search using Flask and SQLite's full-text search. Originally I was going to use OpenSemanticSearch, but I found it far too difficult to get working on Armbian, so I settled on simple text search instead.

This is intended to be the search engine for my personal archive, but it might be a good fit for internet-in-a-box as well.

Goals

We use Whoosh as our search engine and Huey with the SQLite backend as a task queue, although it could also use Redis for distributed crawling.
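To make that concrete, here is a minimal sketch of how a Whoosh index and a Huey task on the SQLite backend can fit together. The schema, directory names, and task names below are invented for illustration; they are not the project's actual code.

```python
import os
from huey import SqliteHuey
from whoosh.fields import Schema, TEXT, ID
from whoosh.index import create_in, open_dir, exists_in
from whoosh.qparser import QueryParser

# Huey task queue backed by a local SQLite file (no Redis required).
huey = SqliteHuey(filename="tasks.db")

# Hypothetical index layout: one document per crawled URL.
schema = Schema(url=ID(stored=True, unique=True), content=TEXT)
os.makedirs("indexdir", exist_ok=True)
ix = open_dir("indexdir") if exists_in("indexdir") else create_in("indexdir", schema)

@huey.task()
def index_page(url, text):
    # Runs in the background worker; update_document replaces any
    # existing entry that has the same unique url.
    writer = ix.writer()
    writer.update_document(url=url, content=text)
    writer.commit()

def search(term):
    # Query the index from the web process.
    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse(term)
        return [hit["url"] for hit in searcher.search(query)]
```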

There are a few things that are notably absent, namely:

Alternatives

There are some more mature alternatives for intranet search that you might want to investigate. For the most part these will have better performance, but they also have higher system requirements.

Technology

We use textract to extract text from more complicated data formats; you can see how that works here.
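For reference, textract exposes a single process() call that dispatches on the file extension and returns the extracted text as bytes. The file path below is just an example:

```python
import textract

# Extract text from a PDF; textract shells out to the appropriate
# extractor (pdftotext, antiword, etc.) based on the extension.
text = textract.process("/data/archive/report.pdf").decode("utf-8")
print(text[:200])
```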

Usage

Quickstart

```bash
pip3 install --user pipx
python3 -m pipx ensurepath
pipx install git+https://github.com/traverseda/iiab-searchServices.git --force
lcars-tasks &   # Run tasks in the background
lcars-server &  # Run web server in the background
lcars-cli index http://someurl
```

Development

I will try to be pretty responsive, so if you have any questions feel free to open an issue; the issue tracker isn't just for bugs and feature requests.

If you're writing a new app you can integrate with the API to update indexes. Otherwise you probably want to use the command-line interface together with tools like cron (timers) and incron (run a command when files change).
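As a sketch of that approach, the entries below re-use the lcars-cli index command from the quickstart. The watched directory is hypothetical, and whether lcars-cli accepts file:// URLs is an assumption on my part:

```
# crontab -e: re-index a site every night at 03:00
0 3 * * * lcars-cli index http://someurl

# incrontab -e: re-index a file whenever it is written in the watched directory
# ($@ expands to the watched path, $# to the file name)
/data/archive IN_CLOSE_WRITE lcars-cli index file://$@/$#
```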

Developing

ToDo: documentation on how to extend this with custom data extractors
ToDo: allow removing URLs that no longer exist (and all their descendants)