Search-a-licious

🍊🔎 A pluggable search service for large collections of objects (like Open Food Facts): https://search.openfoodfacts.org

License: GNU Affero General Public License v3.0

NOTE: This is a prototype that is evolving rapidly to become more generic, more robust, and more feature-complete.

This API is currently in development. Read the Search-a-licious roadmap architecture notes to understand where we are headed.

Organization

This repository contains a Lit/JS frontend and a Python (FastAPI) backend (covered by this README).

Backend

The main file is api.py, and the schema is in models/product.py.

A CLI is available to perform common tasks.
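
For example, assuming the CLI supports the conventional --help flag, you can list the available commands with the same entry point used later in this README:

docker compose run --rm api python -m app --help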

Running the project on your machine

Note: the Makefile will align the container user id with your own uid for a smooth editing experience.

Before running the services, you need to make sure that your system mmap count is high enough for Elasticsearch to run. You can do this by running:

sudo sysctl -w vm.max_map_count=262144

Then build the services with:

make build

Start docker:

docker compose up -d

[!NOTE] You may encounter a permission error if your user is not part of the docker group, in which case you should either add it (see below) or modify the Makefile to prefix all docker and docker compose commands with sudo.

[!NOTE] The update container may crash because it is not connected to any Redis instance; this is expected when running locally without one.
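
To add your user to the docker group (the standard Docker post-installation step; log out and back in afterwards):

sudo usermod -aG docker $USER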

Docker spins up several services; those referenced later in this README include the search API (http://127.0.0.1:8000), the Elasticsearch cluster (http://127.0.0.1:9200), elasticvue (http://127.0.0.1:8080), and the update container.

You will then need to import from a JSONL dump (see instructions below).

Development

Pre-requisites

Installing Docker
Installing Direnv

For Linux and macOS users, you can follow our tutorial to install direnv.[^winEnvrc]

Get your user id and group id by running id -u and id -g in your terminal. Add a .envrc file at the root of the project with the following content:

export USER_GID=<your_user_gid>
export USER_UID=<your_user_uid>

export CONFIG_PATH=data/config/openfoodfacts.yml
export OFF_API_URL=https://world.openfoodfacts.org
export ALLOWED_ORIGINS='http://localhost,http://127.0.0.1,https://*.openfoodfacts.org,https://*.openfoodfacts.net' 
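
If you use direnv, authorize the new file so it is loaded automatically:

direnv allow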

[^winEnvrc]: For Windows users, the .envrc is only taken into account by the make commands.

Installing Pre-commit

You can follow this tutorial to install pre-commit on your machine.

Configuring the mmap count

Be sure that your system mmap count is high enough for Elasticsearch to run. You can do this by running:

sudo sysctl -w vm.max_map_count=262144

To make the change permanent, add the line vm.max_map_count=262144 to /etc/sysctl.conf and run sudo sysctl -p to apply it. Without this, the value will be reset to its default after a reboot.
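
For example, assuming your distribution reads /etc/sysctl.conf:

echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p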

Running your local instance using Docker

Now you can run the project with docker compose up. Then, in another shell, compile the frontend by running make tsc_watch. Keep it running for the next installation steps and whenever you run the project.

Exploring Elasticsearch data

To look into the data, you may use elasticvue: go to http://127.0.0.1:8080/ and connect to the http://127.0.0.1:9200 cluster, named docker-cluster (unless you changed the env variables).

Importing data into your development environment

See "Running the full import" below to load data from a JSONL dump, and make import-taxonomies for taxonomies.

Pages

Now you can go to http://127.0.0.1:8000 to reach the search API (the port used by the curl example below).

Pre-Commit

This repo uses pre-commit to enforce code styling, etc. To use it:

pre-commit install

To run the checks without committing:

pre-commit run
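
To run the checks against every file in the repository rather than only staged changes:

pre-commit run --all-files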

Debugging the backend app

To debug the backend app:
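
One approach, offered as a sketch rather than the project's documented workflow, is Python's built-in debugger, assuming the api compose service used elsewhere in this README:

# add a breakpoint() call in the code you want to inspect, then restart
# the API in the foreground with its ports published and a TTY attached:
docker compose stop api
docker compose run --rm --service-ports api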

Running the full import (45-60 min)

To import data from the JSONL export, download the dataset into the data folder, then run:

make import-dataset filepath='products.jsonl.gz'

If you get errors, try adding more RAM (12 GB works well if you have it to spare), or slow down the indexing process by setting num_processes to 1 in the command above.
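
For example, with the flag spelling as an assumption (the README only says to set num_processes to 1; check the CLI help for the exact name):

# hypothetical flag name "--num-processes"; verify with the project CLI help
make import-dataset filepath='products.jsonl.gz' args="--num-processes 1"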

Typical import time is 45-60 minutes.

If you want to skip updates (e.g. because you don't have Redis installed), use:

make import-dataset filepath='products.jsonl.gz' args="--skip-updates"

You should also import taxonomies:

make import-taxonomies

Using sort scripts

In your index configuration, you can declare scripts to be used for personalized sorting.

For example:

    scripts:
      personal_score:
        # see https://www.elastic.co/guide/en/elasticsearch/painless/8.14/index.html
        lang: painless
        # the script source, here a trivial example
        source: |-
          doc[params["preferred_field"]].size > 0 ? doc[params["preferred_field"]].value : (doc[params["secondary_field"]].size > 0 ? doc[params["secondary_field"]].value : 0)
        # gives an example of parameters
        params:
          preferred_field: "field1"
          secondary_field: "field2"
        # additional non-editable parameters; this can be easier than declaring constants in the script
        static_params:
          param1: "foo"

You then have to import this script into your Elasticsearch instance by running:

docker compose run --rm api python -m app sync-scripts

You can now use it with the POST API:

curl -X POST http://127.0.0.1:8000/search \
  -H "Content-type: application/json" \
  -d '{"q": "", "sort_by": "personal_score", "sort_params": {"preferred_field": "nova_group", "secondary_field": "last_modified_t"}}

Or you can use it inside the sort web component:

  <searchalicious-sort auto-refresh>
    <searchalicious-sort-script script="personal_score" parameters='{"preferred_field": "nova_group", "secondary_field": "last_modified_t"}'>
      Personal preferences
    </searchalicious-sort-script>
  </searchalicious-sort>

Even better, the parameters might be retrieved from local storage.

Thank you to our sponsors!

This project has received financial support from the NGI Search (Next Generation Internet) program, funded by the 🇪🇺 European Commission. Thank you for supporting Open Source, Open Data, and the Commons.

[NGI Search logo] [European flag]