libAtoms / abcd


Add OpenSearch implementation #107

Open ElliottKasoar opened 10 months ago

ElliottKasoar commented 10 months ago

OpenSearchDatabase

The focus of this PR is the implementation of the OpenSearchDatabase class in abcd/backends/atoms_opensearch.py, designed to mirror the MongoDatabase class in abcd/backends/atoms_pymongo.py. Where possible, functions should behave equivalently between the two classes, although in at least one case (OpenSearchDatabase.property), a more efficient alternative is provided (OpenSearchDatabase.count_property).

While it would be possible to use OpenSearch in combination with MongoDB (both generally, and as a relatively straightforward extension of this implementation), it seems to make more sense to use OpenSearch as the database itself, as the efficiency of OpenSearch queries comes from processing at ingestion. Once ingested into OpenSearch, the data is stored as JSON documents, so also storing it in MongoDB would duplicate most, if not all, of the data.

Unit tests have also been written, both in mock form, similar to those currently written for MongoDB, and as a more complete set of new tests designed to connect to a live containerised database through GitHub Actions.

Properties

A new class in abcd/backends/atoms_properties.py is designed to read in extra information from a CSV file, as well as infer units and the relevant structure files via a template. Unit tests for this class have also been written.

Query parsing

OpenSearch queries can be relatively complex to construct, so this PR proposes the use of Luqum, which allows queries to be written using the Lucene query syntax and parsed into an Elastic/OpenSearch string query.
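As a rough illustration of why Lucene-style strings are convenient, the sketch below wraps one in an OpenSearch `query_string` search body using only the standard library. It does not use Luqum and is not the code in this PR; the helper name `lucene_to_query_body` and the field names in the example query are hypothetical.

```python
import json


def lucene_to_query_body(lucene_query: str, default_operator: str = "AND") -> dict:
    """Wrap a Lucene-syntax string in an OpenSearch `query_string` search body.

    Hypothetical helper for illustration only; not part of abcd or Luqum.
    """
    if not lucene_query.strip():
        # Assumption: an empty query string should match every document.
        return {"query": {"match_all": {}}}
    return {
        "query": {
            "query_string": {
                "query": lucene_query,
                "default_operator": default_operator,
            }
        }
    }


# Field names below are made up for the example.
body = lucene_to_query_body("elements:Fe AND n_atoms:[1 TO 10]")
print(json.dumps(body, indent=2))
```

The resulting dictionary is the kind of body that would be passed to a search request. Parsing the string into a query tree first, as Luqum does, additionally allows validation and rewriting before the query reaches the database.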

Parsing to enable extra information to be added in abcd/parsers/extras.py is largely unchanged, although I extended it slightly to allow expressions in the form of Lucene queries (e.g. key:value).
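A minimal sketch of what accepting both `key=value` and `key:value` expressions could look like is given below. It is a standalone illustration, not the parser in abcd/parsers/extras.py: the names `parse_extras` and `_convert`, the treatment of bare keys as boolean flags, and the numeric conversion are all assumptions.

```python
import shlex


def _convert(value: str):
    """Convert a string to int or float when possible, else return it unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_extras(text: str) -> dict:
    """Parse 'key=value' or 'key:value' tokens into a dict (hypothetical helper)."""
    extras = {}
    # shlex.split keeps quoted values such as user:"Jane Doe" in one token.
    for token in shlex.split(text):
        # Split on whichever separator ('=' or ':') appears first in the token.
        positions = [i for i in (token.find("="), token.find(":")) if i != -1]
        if not positions:
            # Assumption: a bare key is stored as a boolean flag.
            extras[token] = True
            continue
        cut = min(positions)
        extras[token[:cut]] = _convert(token[cut + 1:])
    return extras


result = parse_extras('user:"Jane Doe" energy=-1.5 n_atoms:4 relaxed')
print(result)
```

Splitting on the first separator only means a value may itself contain `:` or `=`, for example a timestamp.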

Misc

Note: The initial commits are required for the later OpenSearch commits, but were written on a separate branch. They focus on implementing poetry for package installation and dependency management, and GitHub Actions for unit testing, as well as fixes to query parsing and pymongo for newer versions of the packages. A separate PR could, therefore, be made for these non-OpenSearch changes, if desired. More general changes to legacy code due to the use of flake8 and black could also be separated out, but would be more work to untangle.

To do

Remaining work is documented in more detail here. Of the features that already exist for MongoDB, testing integration with the GUI is perhaps the most significant one still to be worked on. However, a number of new features will also be required for PSDI, including integration with AiiDA and external databases, storage of potentials, and new metadata.

ElliottKasoar commented 3 weeks ago

Note: further changes to be added following merge of https://github.com/ElliottKasoar/abcd/pull/31

As discussed with @stenczelt, ideally this will be split into 2-3 PRs (CI + OpenSearch)

codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 78.87029% with 101 lines in your changes missing coverage. Please review.

Please upload report for BASE (master@25a79ff).

| Files | Patch % | Lines |
|---|---|---|
| abcd/backends/atoms_opensearch.py | 87.94% | 34 Missing :warning: |
| abcd/backends/utils.py | 39.62% | 32 Missing :warning: |
| abcd/frontends/commandline/commands.py | 48.14% | 14 Missing :warning: |
| abcd/backends/atoms_pymongo.py | 52.38% | 10 Missing :warning: |
| abcd/backends/atoms_properties.py | 89.85% | 7 Missing :warning: |
| abcd/frontends/commandline/decorators.py | 75.00% | 3 Missing :warning: |
| abcd/model.py | 66.66% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             master     #107   +/-   ##
=========================================
  Coverage          ?   59.29%
=========================================
  Files             ?       25
  Lines             ?     1646
  Branches          ?        0
=========================================
  Hits              ?      976
  Misses            ?      670
  Partials          ?        0
```