inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0
59 stars 69 forks source link

global: use database for querying records instead of ElasticSearch #1857

Open jmartinm opened 7 years ago

jmartinm commented 7 years ago

In situations where some Python code needs to retrieve some records, modify them and then save them in the database we should directly use PostgreSQL capability to query the json column in records_metadata to ensure consistency, instead of first querying ES and then retrieving the records from the database.

Since the json column is defined as json (and not jsonb) we cannot make use of some operators but the ones available for json might be enough (see https://www.postgresql.org/docs/9.5/static/functions-json.html)

An abstraction should be created with an API to retrieve records from db.

jmartinm commented 7 years ago

@jacquerie can you add some pointers to parts of the code that could benefit from this change?

jacquerie commented 7 years ago

can you add some pointers to parts of the code that could benefit from this change?

For example, https://github.com/inspirehep/inspire-next/blob/2e148db2b2469f62ddc614c7625eeb1db48b8528/inspirehep/modules/records/tasks.py#L87-L129

In general, every time we import current_search_client we are doing some query so general that it can't be expressed by one of the Search classes, therefore we should be suspicious.

jacquerie commented 7 years ago

BTW we don't necessarily need to drop to Postgres' SQL dialect for this, SQLAlchemy has an abstraction on top of it (it even supports path traversal): http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#sqlalchemy.dialects.postgresql.JSON

kaplun commented 7 years ago

For reference: the version of PostgreSQL that is currently available at CERN is 9.4.5 so we need to check which JSON operators are each time available low-level.

I.e.: https://www.postgresql.org/docs/9.4/static/functions-json.html and https://www.postgresql.org/docs/9.4/static/datatype-json.html

StellaCh commented 7 years ago

has anything been done towards this @jmartinm ?