impresso-user-admin

A basic Django application to manage user-related information stored in Impresso's Master DB. We use pipenv together with Docker for development. Please look at the relevant sections in the documentation.

Development

Take the time to explore the .example.env file and the related ./impresso/settings.py to understand the settings that can be configured via environment variables for your specific environment. We have configured dotenv in ./impresso/base.py to allow the loading of different .env files. For example, you can use .env or .dev.env for development, and .prod.env to test production settings.

# our .dev.env file, that connects to the local redis instance
REDIS_HOST=localhost:6379
IMPRESSO_DB_HOST=localhost
IMPRESSO_DB_PORT=3306
# Then don't forget to fill in all Solr-related settings according to your impresso configuration
IMPRESSO_SOLR_URL=http://localhost:8983/solr/impresso
IMPRESSO_SOLR_USER=your-user-reader-only
IMPRESSO_SOLR_PASSWORD=your-user-reader-only-password
IMPRESSO_SOLR_USER_WRITE=your-user-write-allowed
IMPRESSO_SOLR_PASSWORD_WRITE=your-user-write-allowed-password
IMPRESSO_SOLR_PASSAGES_URL=http://localhost:8983/solr/impresso-tr-passages
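Before starting the services, it can help to check that none of these variables was left empty. The following is a minimal sketch, not part of the project: the helper name missing_settings is hypothetical, and treating every variable from the sample file as required is an assumption.

```python
import os

# Variable names copied from the sample .dev.env above; treating all of
# them as required is an assumption made for this sketch.
REQUIRED_SETTINGS = (
    "REDIS_HOST",
    "IMPRESSO_DB_HOST",
    "IMPRESSO_DB_PORT",
    "IMPRESSO_SOLR_URL",
    "IMPRESSO_SOLR_USER",
    "IMPRESSO_SOLR_PASSWORD",
    "IMPRESSO_SOLR_USER_WRITE",
    "IMPRESSO_SOLR_PASSWORD_WRITE",
    "IMPRESSO_SOLR_PASSAGES_URL",
)


def missing_settings(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are set.")
```

The function accepts any mapping, so it can be applied to a parsed dotenv file as well as to the process environment.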

To start the Django admin, you need to have Redis and MySQL running. You can start them by running the command docker compose up. Please note that in our YAML file, the ports for Redis and MySQL are exposed to facilitate local development and testing.

docker compose --env-file .dev.env up -d

Then you can start the development server, e.g. with pipenv and the .dev.env file:

ENV=dev pipenv run ./manage.py runserver

or with Makefile:

ENV=dev make run-dev

To start the Celery worker in development with pipenv, run in a new terminal:

ENV=dev pipenv run celery -A impresso worker -l info

Of course, you can also use a generic .env file for development. In this case you don't need to specify the ENV variable:

docker compose up -d
pipenv run ./manage.py runserver
# and in another terminal, to start the celery worker
pipenv run celery -A impresso worker -l info

setup with pyenv + pipenv

Follow the official instructions to install pyenv. The motivation for this choice is explained in the Hackernoon article "Why you should use pyenv + Pipenv for your Python projects"; more details on pyenv can be found in "Managing Multiple Python Versions with pyenv".

eval "$(pyenv init -)"
cd /path/to/impresso-user-admin/
pyenv version

The last command prints the version of the local Python. If it doesn't match the version specified in the Pipfile, install the required one with pyenv install:

pyenv install 3.12.4

Use pip to install Pipenv:

python -m pip install pipenv

Then run

pipenv --python 3.12.4 install

to create the virtualenv and install the dependencies. Activate it with pipenv shell; once in the shell, you can leave it with the exit command and reactivate it later with pipenv shell.

configure: setup dotenv files

Django settings.py is enriched via dotenv files, special and simple configuration files. We use a dotenv file to store sensitive settings and to store settings for a specific environment ("development" or "production"). A dotenv file is parsed when we set its prefix in the ENV environment variable, that is, .dev.env is used when we have ENV=dev:

ENV=dev pipenv run ./manage.py runserver

This command runs the development server, enriching the settings file with the configuration stored in the .dev.env file. Please use the .example.env file as a starting point to generate a specific environment configuration (e.g. prod or sandbox).
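The mapping from ENV to a dotenv file can be sketched as follows. This is an illustrative, standard-library stand-in and not the code in the project's settings: the function names dotenv_filename, parse_dotenv and load_settings are hypothetical, and the parser handles only plain KEY=VALUE lines, comments and simple surrounding quotes.

```python
from pathlib import Path


def dotenv_filename(env_name):
    """Map the value of ENV to a file name: 'dev' -> '.dev.env', '' -> '.env'."""
    return f".{env_name}.env" if env_name else ".env"


def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding quotes, if any (simplified handling).
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_name, base_dir="."):
    """Read the dotenv file selected by env_name; return {} if it is absent."""
    path = Path(base_dir) / dotenv_filename(env_name)
    if not path.exists():
        return {}
    return parse_dotenv(path.read_text(encoding="utf-8"))
```

With ENV=dev this sketch would read .dev.env, and with ENV unset it would fall back to the generic .env file.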

If needed (that is for local development), run:

ENV=dev pipenv run ./manage.py migrate

Useful commands

Create a new admin user in the database

ENV=dev pipenv run ./manage.py createsuperuser

Create multiple users at once, with randomly generated passwords.

ENV=dev pipenv run ./manage.py createaccount guestA@uni.lu guestB@uni.lu

Index a collection stored in the db using its collection id:

ENV=dev ./manage.py synccollection test-abcd

Export the results of a query as CSV (the first argument is the user_id, the second is the Solr query):

ENV=dev ./manage.py exportqueryascsv 1 "content_txt_fr:\"premier ministre portugais\""
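The escaped quotes in the command above are needed because the Solr phrase query itself contains double quotes. When the command is driven from a script, the query string can be built programmatically instead. This is a minimal sketch: the helper solr_phrase_query is hypothetical and only escapes backslashes and double quotes inside the phrase.

```python
def solr_phrase_query(field, phrase):
    """Build field:"phrase", escaping backslashes and double quotes."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


if __name__ == "__main__":
    print(solr_phrase_query("content_txt_fr", "premier ministre portugais"))
```

Passing the resulting string as a single argument (for instance as one element of the argument list given to subprocess.run) avoids the extra layer of shell escaping.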

Create (or get) a collection:

ENV=dev pipenv run ./manage.py createcollection "name of the collection" my-username

Then once you get the collection id, usually a concatenation of the creator profile uid and of the slugified version of the desired name, you can add query results to the collection:

ENV=dev pipenv run python ./manage.py addtocollectionfromquery local-user_name-of-the-collection "content_txt_fr:\"premier ministre portugais\""
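The shape of such a collection id can be illustrated with a short sketch. Everything here is an assumption for illustration: the slugify and collection_id helpers are hypothetical, the project may slugify differently, and the "_" separator was chosen only because it reproduces the id used in the command above if the creator uid is taken to be local-user. The createcollection command remains the authoritative source of the id.

```python
import re
import unicodedata


def slugify(name):
    """ASCII-fold, lowercase, and join alphanumeric runs with hyphens."""
    folded = unicodedata.normalize("NFKD", name)
    ascii_name = folded.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def collection_id(creator_uid, name, separator="_"):
    """Concatenate the creator uid and the slugified name (separator assumed)."""
    return f"{creator_uid}{separator}{slugify(name)}"


if __name__ == "__main__":
    print(collection_id("local-user", "name of the collection"))
```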

Index a collection from a list of tr-passages ids resulting from a solr query:

ENV=dev pipenv run python ./manage.py addtocollectionfromtrpassagesquery local-dg-abcde "cluster_id_s:tr-nobp-all-v01-c8590083914"

Stop a specific job from command line:

ENV=dev pipenv run python ./manage.py stopjob 1234

Use in production

Please check the included Dockerfile to generate your own docker image or use the docker image available on impresso dockerhub.

Test image locally:

make run

Note on collection synchronisation between indices

Collections are simple identifiers assigned to a set of newspaper articles and stored in the search index. However, other indices (e.g. tr_passages) can be linked to a collection to allow cross-index search. Creating a collection is a long-running task because it uses a Solr search query to filter the content items and a Solr update request to add the collection tag to the various indices. Every search request is limited to settings.IMPRESSO_SOLR_EXEC_LIMIT rows (100 by default). The number of loops is limited by the user's max_allowed_loops parameter in the database and in general cannot be higher than settings.IMPRESSO_SOLR_MAX_LOOPS (100 is recommended, for a default maximum of 100*100 = 10,000 rows). Set both parameters in the .env file accordingly.
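The interaction between the two limits can be worked through with a small sketch. The function names are hypothetical and the code does not query Solr; it only shows the arithmetic, assuming the loop count is the smaller of the user's max_allowed_loops and IMPRESSO_SOLR_MAX_LOOPS, with the defaults quoted above.

```python
def effective_loops(user_max_allowed_loops, solr_max_loops=100):
    """A user can never exceed the global IMPRESSO_SOLR_MAX_LOOPS cap."""
    return min(user_max_allowed_loops, solr_max_loops)


def max_rows(user_max_allowed_loops, exec_limit=100, solr_max_loops=100):
    """Upper bound on rows processed: rows per request times allowed loops."""
    return exec_limit * effective_loops(user_max_allowed_loops, solr_max_loops)


def page_offsets(total_hits, user_max_allowed_loops, exec_limit=100,
                 solr_max_loops=100):
    """Start offsets of the search requests, capped by the loop limit."""
    capped = min(total_hits,
                 max_rows(user_max_allowed_loops, exec_limit, solr_max_loops))
    return list(range(0, capped, exec_limit))


if __name__ == "__main__":
    print(max_rows(100))           # both limits at 100
    print(page_offsets(250, 100))  # a query with 250 hits
```

Under these assumptions, a query with 250 hits needs three requests (offsets 0, 100 and 200), while a query with more hits than the cap is truncated at the row limit.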

The task of creating a collection is delegated to the Celery task manager, and a Job instance stored in the database is assigned to the task so that its progress can be tracked. The task is executed asynchronously. In future releases, the user will be notified via email when the task is completed (still to do).

Project

The 'impresso - Media Monitoring of the Past' project is funded by the Swiss National Science Foundation (SNSF) under grant number CRSII5_173719 (Sinergia program). The project aims at developing tools to process and explore large-scale collections of historical newspapers, and at studying the impact of this new tooling on historical research practices. More information at https://impresso-project.ch.

License

Copyright (C) 2020 The impresso team. Contributors to this program include: Daniele Guido, Roman Kalyakin. This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU Affero General Public License for more details.