NOTE: This is a prototype that is being heavily evolved to be more generic, more robust, and to offer much more functionality.
This API is currently in development. Read the Search-a-licious roadmap architecture notes to understand where we are headed.
This repository contains a Lit/JS frontend and a Python (FastAPI) backend (covered by the current README).
The main file is `api.py`, and the schema is in `models/product.py`.
A CLI is available to perform common tasks.
Note: the Makefile will align the user id with your own uid for a smooth editing experience.
Before running the services, you need to make sure that your system mmap count is high enough for Elasticsearch to run. You can do this by running:
sudo sysctl -w vm.max_map_count=262144
Then build the services with:
make build
Start docker:
docker compose up -d
[!NOTE] You may encounter a permission error if your user is not part of the `docker` group, in which case you should either add it or modify the Makefile to prefix all docker and docker compose commands with `sudo`. The update container crashes because we are not connected to any Redis.
Docker spins up the services.
You will then need to import data from a JSONL dump (see instructions below).
For Linux and macOS users, you can follow our tutorial to install direnv.[^winEnvrc]
Get your user id and group id by running `id -u` and `id -g` in your terminal.
Add a `.envrc` file at the root of the project with the following content:
export USER_GID=<your_user_gid>
export USER_UID=<your_user_uid>
export CONFIG_PATH=data/config/openfoodfacts.yml
export OFF_API_URL=https://world.openfoodfacts.org
export ALLOWED_ORIGINS='http://localhost,http://127.0.0.1,https://*.openfoodfacts.org,https://*.openfoodfacts.net'
[^winEnvrc]: For Windows users, the `.envrc` file is only taken into account by the `make` commands.
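Since the two ids come straight from `id -u` and `id -g`, the file can be generated in one go. A convenience sketch, not an official setup step (the other values mirror the example above; adjust them if yours differ):

```shell
# Write .envrc with your own uid/gid substituted in.
# The unquoted EOF lets $(id -g) and $(id -u) expand at creation time.
cat > .envrc <<EOF
export USER_GID=$(id -g)
export USER_UID=$(id -u)
export CONFIG_PATH=data/config/openfoodfacts.yml
export OFF_API_URL=https://world.openfoodfacts.org
export ALLOWED_ORIGINS='http://localhost,http://127.0.0.1,https://*.openfoodfacts.org,https://*.openfoodfacts.net'
EOF
```

After creating the file, remember to let direnv load it (`direnv allow`).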
You can follow this tutorial to install pre-commit on your machine.
Be sure that your system mmap count is high enough for Elasticsearch to run. You can do this by running:
sudo sysctl -w vm.max_map_count=262144
To make the change permanent, add the line `vm.max_map_count=262144` to the `/etc/sysctl.conf` file and run `sudo sysctl -p` to apply the changes.
This ensures that the modified value of `vm.max_map_count` is retained even after a system reboot. Without this step, the value is reset to its default after a reboot.
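The resulting entry in `/etc/sysctl.conf` would look like this (keep any existing lines in that file intact):

```
# Elasticsearch needs a high mmap count; persist it across reboots
vm.max_map_count=262144
```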
Now you can run the project with Docker: `docker compose up`.
After that, run the following command in another shell to compile the project: `make tsc_watch`.
Do this before the next installation steps and whenever you run the project.
make import-taxonomies
# get some sample data
curl https://world.openfoodfacts.org/data/exports/products.random-modulo-10000.jsonl.gz --output data/products.random-modulo-10000.jsonl.gz
gzip -d data/products.random-modulo-10000.jsonl.gz
# we skip updates because we are not connected to any redis
make import-dataset filepath='products.random-modulo-10000.jsonl' args='--skip-updates'
Now you can go to:
To look into the data, you may use elasticvue: go to http://127.0.0.1:8080/ and connect to the cluster at http://127.0.0.1:9200 (cluster name: `docker-cluster`), unless you changed the env variables.
This repo uses pre-commit to enforce code styling, etc. To use it:
pre-commit install
To run tests without committing:
pre-commit run
To debug the backend app:
docker compose stop api
docker compose run --rm --use-aliases api uvicorn app.api:app --proxy-headers --host 0.0.0.0 --port 8000 --reload
To import data from the JSONL export, download the dataset into the `data` folder, then run:
make import-dataset filepath='products.jsonl.gz'
If you get errors, try adding more RAM (12GB works well if you have that to spare), or slow down the indexing process by setting `num_processes` to 1 in the command above.
Typical import time is 45-60 minutes.
If you want to skip updates (e.g. because you don't have Redis installed),
use `make import-dataset filepath='products.jsonl.gz' args="--skip-updates"`.
You should also import taxonomies:
make import-taxonomies
In your index configuration, you can add scripts to be used for personalized sorting.
For example:

scripts:
  personal_score:
    # see https://www.elastic.co/guide/en/elasticsearch/painless/8.14/index.html
    lang: painless
    # the script source, here a trivial example
    source: |-
      doc[params["preferred_field"]].size() > 0 ? doc[params["preferred_field"]].value : (doc[params["secondary_field"]].size() > 0 ? doc[params["secondary_field"]].value : 0)
    # an example of parameters
    params:
      preferred_field: "field1"
      secondary_field: "field2"
    # additional non-editable parameters; can be easier than declaring constants in the script
    static_params:
      param1: "foo"
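For clarity, the fallback logic of that Painless one-liner, sketched in Python (an illustration of the semantics, not project code; the field names are just the example parameters):

```python
def personal_score(doc: dict, preferred_field: str, secondary_field: str):
    """Mirror the example script: use the preferred field when the document
    has a value for it, fall back to the secondary field, else 0."""
    if doc.get(preferred_field) is not None:
        return doc[preferred_field]
    if doc.get(secondary_field) is not None:
        return doc[secondary_field]
    return 0

# A document with the preferred field present sorts by it directly
print(personal_score({"nova_group": 3}, "nova_group", "last_modified_t"))  # → 3
```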
You then have to import this script into your Elasticsearch instance by running:
docker compose run --rm api python -m app sync-scripts
You can now use it with the POST API:
curl -X POST http://127.0.0.1:8000/search \
-H "Content-type: application/json" \
-d '{"q": "", "sort_by": "personal_score", "sort_params": {"preferred_field": "nova_group", "secondary_field": "last_modified_t"}}'
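The same request, sketched in Python with only the standard library (assumes the API is reachable on 127.0.0.1:8000 as in the curl example):

```python
import json
import urllib.request

# Build the same search request body as the curl example
payload = {
    "q": "",
    "sort_by": "personal_score",
    "sort_params": {
        "preferred_field": "nova_group",
        "secondary_field": "last_modified_t",
    },
}
req = urllib.request.Request(
    "http://127.0.0.1:8000/search",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment once the stack is running:
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)
```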
Or you can use it inside the sort web-component:
<searchalicious-sort auto-refresh>
<searchalicious-sort-script script="personal_score" parameters='{"preferred_field": "nova_group", "secondary_field": "last_modified_t"}'>
Personal preferences
</searchalicious-sort-script>
</searchalicious-sort>
Even better, the parameters might be retrieved from local storage.
This project has received financial support from the NGI Search (New Generation Internet) program, funded by the 🇪🇺 European Commission. Thank you for supporting Open Source, Open Data, and the Commons.