nestauk / old_nesta_daps

[archived]
MIT License

[326 precursor] Factored in utils from 326_rename #329

Closed: jaklinger closed this 4 years ago

jaklinger commented 4 years ago

Developed to facilitate #326.

All of the changes here are required for 326_nih, which concerns the (re)collection of NIH data, with various improvements to data quality and pipeline running time, while factoring NIH out of the health-mosaic project for use in eurito.

A few utils for:

Additionally:

A full set of tests has been added to cover the new features. Beyond that, you'll have to take my word that I've run the arxiv, crunchbase, cordis, gtr and patstat collections in dev mode and the new features haven't killed anything (this is why we need end-to-end tests in DAPS2...). These runs uncovered the need for a small number of very minor updates, applied before data is inserted into the database, in some of the oldest parts of the pipelines (crunchbase, arxiv, gtr).