Open strogonoff opened 3 years ago
Currently this notably does not cover the processes that fetch external datasets from their respective third-party locations into easier-to-access forms, such as GitHub repositories.
After that, converting those datasets from heterogeneous formats into consistent Relaton structures (and storing them in ES for search and PGSQL for querying by reference) is taken care of by the indexer, which will include pluggable adapter modules to fetch and parse each dataset.
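To make the "pluggable adapter" idea concrete, here is a rough sketch of how the indexer could discover per-dataset adapters. All names (`DatasetAdapter`, `register`, `index_dataset`, the `example-dataset` ID) are hypothetical, and the output dict is a simplified stand-in, not the actual Relaton schema; storage in ES/PGSQL is left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Type


class DatasetAdapter(ABC):
    """Hypothetical base class: one subclass per external dataset."""

    @abstractmethod
    def fetch(self) -> Iterable[dict]:
        """Yield raw records in the dataset's native shape."""

    @abstractmethod
    def to_relaton(self, raw: dict) -> dict:
        """Convert one raw record into a (simplified) Relaton-like dict."""


# Registry mapping dataset IDs to adapter classes (illustrative only).
ADAPTERS: Dict[str, Type[DatasetAdapter]] = {}


def register(dataset_id: str):
    """Class decorator that plugs an adapter into the registry."""
    def decorator(cls: Type[DatasetAdapter]) -> Type[DatasetAdapter]:
        ADAPTERS[dataset_id] = cls
        return cls
    return decorator


@register("example-dataset")
class ExampleAdapter(DatasetAdapter):
    def fetch(self) -> Iterable[dict]:
        # A real adapter would read from, e.g., a checked-out Git repository.
        yield {"name": "RFC 1234", "heading": "An example"}

    def to_relaton(self, raw: dict) -> dict:
        return {
            "docid": [{"id": raw["name"]}],
            "title": [{"content": raw["heading"]}],
        }


def index_dataset(dataset_id: str) -> List[dict]:
    """Look up the adapter, then fetch and convert every record."""
    adapter = ADAPTERS[dataset_id]()
    return [adapter.to_relaton(raw) for raw in adapter.fetch()]
```

With this shape, supporting a new dataset means adding one decorated class; the indexer core does not need to change.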
An open issue is that requests to describe a DOI standard will require an extra network trip to the DOI endpoint, meaning we can time out through no fault of our own if that trip takes too long. Furthermore, we should implement throttling on our side and proactively time out in some cases to avoid unintentionally DoSing the DOI endpoint.
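A minimal sketch of what throttling plus a proactive timeout could look like on our side. This is only one possible approach (a fixed minimum interval between upstream calls); `ThrottledClient`, `Throttled`, `UpstreamTimeout` and the injected `fetch` callable are all hypothetical names, and the actual HTTP call is deliberately left to the caller.

```python
import time
from typing import Callable, Optional


class Throttled(Exception):
    """Raised when we refuse to call upstream to stay under our own rate limit."""


class UpstreamTimeout(Exception):
    """Raised when the upstream lookup did not answer within our deadline."""


class ThrottledClient:
    def __init__(
        self,
        fetch: Callable[[str, float], dict],
        min_interval: float = 1.0,
        timeout: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # `fetch(doi, timeout)` is assumed to perform the network request
        # and raise TimeoutError if it exceeds `timeout` seconds.
        self._fetch = fetch
        self._min_interval = min_interval
        self._timeout = timeout
        self._clock = clock
        self._last_call: Optional[float] = None

    def get(self, doi: str) -> dict:
        now = self._clock()
        too_soon = (
            self._last_call is not None
            and now - self._last_call < self._min_interval
        )
        if too_soon:
            # Fail fast instead of piling requests onto the endpoint.
            raise Throttled(doi)
        # Record the attempt before calling, so failed calls count too.
        self._last_call = now
        try:
            return self._fetch(doi, self._timeout)
        except TimeoutError as exc:
            raise UpstreamTimeout(doi) from exc
```

The view handling the request could then map `Throttled` and `UpstreamTimeout` to distinct error responses, so API consumers can tell "we are rate limiting" apart from "the DOI endpoint is slow".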
Here is a diagram I have arrived at after multiple discussions with Ronald to clarify the exact use cases and the kinds of datasets we deal with (pardon the hand-drawn look):
Adding it here for reference, to ensure we are on the same page.
This implies we will have two Django project codebases: