Open ElliottKasoar opened 10 months ago
Note: further changes to be added following merge of https://github.com/ElliottKasoar/abcd/pull/31
As discussed with @stenczelt, ideally this will be split into 2-3 PRs (CI + OpenSearch)
Attention: Patch coverage is 78.87029%, with 101 lines in your changes missing coverage. Please review.
Please upload report for BASE (master@25a79ff).
OpenSearchDatabase
The focus of this PR is the implementation of the `OpenSearchDatabase` class in `abcd/backends/atoms_opensearch.py`, designed to mirror the `MongoDatabase` class in `abcd/backends/atoms_pymongo.py`. Where possible, functions should behave equivalently between the two classes, although in at least one case (`OpenSearchDatabase.property`), a more efficient alternative is provided (`OpenSearchDatabase.count_property`).
While it would be possible to use OpenSearch in combination with MongoDB (both generally, and as a relatively straightforward extension of this implementation), it seems to make more sense to use OpenSearch as the database itself, as the efficiency of OpenSearch queries comes from processing on ingestion. Once ingested into OpenSearch, the data is stored as JSON documents, so also storing it in MongoDB would require duplicating most, if not all, of the data.
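
To illustrate why a dedicated counting method such as `OpenSearchDatabase.count_property` can be cheaper than fetching every value and counting client-side, here is a minimal sketch of a server-side count using an OpenSearch `terms` aggregation. This is an assumption about the general approach, not the code in this PR: the helper names `build_count_body` and `counts_from_response`, the aggregation label `counts`, and the bucket limit are all hypothetical.

```python
from __future__ import annotations

from typing import Any


def build_count_body(prop: str, max_buckets: int = 10_000) -> dict[str, Any]:
    """Build a search body that asks OpenSearch to count each value of `prop`.

    Hypothetical helper: assumes `prop` is mapped as a type that supports
    `terms` aggregations (e.g. keyword or numeric).
    """
    return {
        "size": 0,  # return no documents, only the aggregation result
        "aggs": {
            "counts": {
                "terms": {"field": prop, "size": max_buckets},
            }
        },
    }


def counts_from_response(response: dict[str, Any]) -> dict[Any, int]:
    """Flatten the aggregation buckets into a {value: count} mapping."""
    buckets = response["aggregations"]["counts"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

With the `opensearch-py` client, the body could be passed as `client.search(index=..., body=build_count_body("n_atoms"))`, so only the bucket counts travel over the network rather than one value per document.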
Unit tests have also been written, both in mock form, similar to those currently written for MongoDB, and as a more complete set of new tests designed to connect to a live containerised database through GitHub Actions.
Properties
A new class in `abcd/backends/atoms_properties.py` is designed to read in extra information from a CSV file, as well as infer units and the relevant structure files via a template. Unit tests for this class have also been written.
Query parsing
OpenSearch queries can be relatively complex to construct, so this PR proposes the use of Luqum, which allows queries to be written using the Lucene Query DSL, and parsed into an Elastic/OpenSearch string query.
The parsing in `abcd/parsers/extras.py`, which enables extra information to be added, is largely unchanged, although I extended it slightly to allow expressions in the form of Lucene queries (e.g. `key:value`).
Misc
Note: The initial commits are required for the later OpenSearch commits, but were written on a separate branch, as they focus on implementing poetry for package installation and dependency management and GitHub Actions for unit testing, as well as fixes to query parsing and pymongo for newer versions of the packages. A separate PR could, therefore, be made for these non-OpenSearch changes, if desired. More general changes to legacy code due to the use of `flake8` and `black` could also be separated out, but would be more work to untangle.
To do
Remaining work is documented in more detail here. Of this, testing integration with the GUI is perhaps the most significant item, as it is a feature that already exists for MongoDB. However, a number of new features will also be required for PSDI, including integration with AiiDA and external databases, storage of potentials, and new metadata.