kermitt2 / biblio-glutton

A high performance bibliographic information service: https://biblio-glutton.readthedocs.io

Use Openalex over Crossref? #77

Open karatekaneen opened 1 year ago

karatekaneen commented 1 year ago

We've started looking into OpenAlex, which aggregates multiple data sources (including Crossref), and I was wondering what it would take to switch to that dataset instead of the one from Crossref?

Regarding the indexing, it should be relatively simple to rewrite the logic so that it reconstructs the same format as the one being used today.
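To make the idea concrete, here is a minimal sketch of such a conversion, assuming the target is the Crossref JSON shape (`DOI`, `title`, `author`, `container-title`, `issued`). The function name `openalex_to_crossref` is hypothetical and not part of biblio-glutton, only a handful of fields are mapped, and the name splitting is a naive assumption that will be wrong for many authors.

```python
from typing import Any

DOI_URL_PREFIX = "https://doi.org/"


def openalex_to_crossref(work: dict[str, Any]) -> dict[str, Any]:
    """Map a small subset of an OpenAlex work onto Crossref-style fields.

    Hypothetical helper for illustration only; not biblio-glutton code.
    """
    # OpenAlex exposes the DOI as a URL, Crossref as a bare identifier.
    doi = work.get("doi") or ""
    if doi.lower().startswith(DOI_URL_PREFIX):
        doi = doi[len(DOI_URL_PREFIX):]

    authors = []
    for authorship in work.get("authorships") or []:
        name = ((authorship.get("author") or {}).get("display_name") or "").strip()
        if not name:
            continue
        # Assumption: the last token is the family name (naive, often wrong).
        given, _, family = name.rpartition(" ")
        author = {"family": family}
        if given:
            author["given"] = given
        authors.append(author)

    record: dict[str, Any] = {"DOI": doi, "author": authors}

    if work.get("title"):
        record["title"] = [work["title"]]  # Crossref titles are lists

    source = (work.get("primary_location") or {}).get("source") or {}
    if source.get("display_name"):
        record["container-title"] = [source["display_name"]]

    if work.get("publication_year"):
        record["issued"] = {"date-parts": [[work["publication_year"]]]}

    return record


if __name__ == "__main__":
    sample = {
        "doi": "https://doi.org/10.1234/example",
        "title": "An example work",
        "publication_year": 2020,
        "authorships": [{"author": {"display_name": "Jane Doe"}}],
        "primary_location": {"source": {"display_name": "Example Journal"}},
    }
    print(openalex_to_crossref(sample))
```

A real converter would also need to cover the remaining fields that the current indexing and matching rely on, which this sketch does not attempt.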

How about the internal db, which (I assume) contains all the data that's being returned? Would it work to keep the same key but change the content completely, or would something else have to change as well?

Edit:

Forgot to add link: https://openalex.org/about

kermitt2 commented 1 year ago

Hi @karatekaneen

Thanks for the issue.

I think only the key would need to change. Currently both the Elasticsearch index and the LMDB storage use the DOI as primary index, so the two are kept in sync via the DOI. For the next version, the plan is to use an independent key shared by the search index and the internal storage (although I haven't had much time to work on it for the past few months), because there are several million PubMed entries to index in addition to Crossref (I wrote a converter from PubMed/Medline bibliographical entries into the Crossref format). So this is very similar.

Just as a remark, I don't plan to aggregate bibliographical records, only to provision all the records in the same format and keep them separate to avoid aggregation errors (it will be up to the service user to rely on a single source of authority or to choose their own aggregation logic). With OpenAlex, however, aggregation would be built in.
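As a rough illustration of the independent-key idea and of keeping records from different sources separate, here is a minimal sketch. It is not the planned implementation: `record_key` and `RecordStore` are hypothetical names, the key derivation (a SHA-1 of source plus source identifier) is an assumption, and plain dictionaries stand in for the LMDB storage and the search index.

```python
import hashlib
from collections import defaultdict
from typing import Any


def record_key(source: str, source_id: str) -> str:
    """Derive a key that does not require the record to have a DOI."""
    return hashlib.sha1(f"{source}:{source_id}".encode("utf-8")).hexdigest()


class RecordStore:
    """Toy stand-in for the storage/index pair, sharing one independent key."""

    def __init__(self) -> None:
        self.storage: dict[str, dict[str, Any]] = {}  # stand-in for LMDB
        self.doi_index: dict[str, list[str]] = defaultdict(list)  # stand-in for the search index

    def add(self, source: str, source_id: str, record: dict[str, Any]) -> str:
        key = record_key(source, source_id)
        self.storage[key] = {"source": source, **record}
        doi = record.get("DOI")
        # Records without a DOI are still stored; they are just not DOI-searchable.
        if doi and key not in self.doi_index[doi.lower()]:
            self.doi_index[doi.lower()].append(key)
        return key

    def by_doi(self, doi: str) -> list[dict[str, Any]]:
        """Return every record sharing this DOI, one per source, unmerged."""
        return [self.storage[k] for k in self.doi_index.get(doi.lower(), [])]


if __name__ == "__main__":
    store = RecordStore()
    store.add("crossref", "10.1234/example", {"DOI": "10.1234/example", "title": ["A"]})
    store.add("pubmed", "12345678", {"DOI": "10.1234/example", "title": ["A."]})
    store.add("pubmed", "87654321", {"title": ["No DOI here"]})
    for rec in store.by_doi("10.1234/EXAMPLE"):
        print(rec["source"], rec["title"])
```

In this sketch a DOI lookup returns one record per source, so the caller decides which source to trust or how to merge them.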