Extend database schemas

jpwahle commented 2 years ago

Is your feature request related to a problem? Please describe.
Currently, everything is stored in the paper collection, while the other schemas introduced in 2c59cbad1c66d4bf93a2a1b5123bbd9f663a49b3 are not used. Because aggregation and grouping in particular are expensive, we want to avoid these steps by using the separate collections from now on.

Describe the solution you'd like
Each dashboard that requires aggregation, grouping, etc. should have a separate collection (e.g., authors, venues). MongoDB should also write data to the currently unused collections and map back to the paper objects. For fast filtering, each collection should contain the key filter fields (e.g., year, inCitationsCount, ...). The solution should be backward compatible, so the paper collection should remain the same.
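A minimal sketch of the idea, assuming plain Python dictionaries stand in for MongoDB documents: author entries are precomputed when papers are written, one document per (author, year), carrying the filter fields year and inCitationsCount plus a paperIds list that points back into the paper collection. The field names authors, paperCount, and paperIds and both function names are hypothetical, not the project's actual schema.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def build_author_collection(papers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Precompute one hypothetical author document per (author, year) at write time."""
    buckets: Dict[Tuple[str, int], Dict[str, Any]] = {}
    for paper in papers:
        for author in paper["authors"]:
            doc = buckets.setdefault(
                (author, paper["year"]),
                {
                    "author": author,
                    "year": paper["year"],  # key filter field
                    "paperCount": 0,
                    "inCitationsCount": 0,  # key filter field, summed per bucket
                    "paperIds": [],  # maps back to the paper collection
                },
            )
            doc["paperCount"] += 1
            doc["inCitationsCount"] += paper["inCitationsCount"]
            doc["paperIds"].append(paper["_id"])
    return list(buckets.values())


def author_dashboard(
    author_docs: List[Dict[str, Any]], year_from: int, year_to: int
) -> Dict[str, Dict[str, int]]:
    """Filter the precomputed documents by year and sum the few matching buckets."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"paperCount": 0, "inCitationsCount": 0}
    )
    for doc in author_docs:
        if year_from <= doc["year"] <= year_to:
            entry = totals[doc["author"]]
            entry["paperCount"] += doc["paperCount"]
            entry["inCitationsCount"] += doc["inCitationsCount"]
    return dict(totals)
```

With this layout, a dashboard query filtered by year only sums a handful of precomputed buckets per author instead of grouping the entire paper collection, and the paper collection itself is left unchanged, which keeps the change backward compatible. Bucketing by year is just one possible design choice; filtering on other fields would need their own buckets or a different layout.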

jpwahle commented 2 years ago

One suggestion here is to switch to a MySQL / PostgreSQL database.

Pros:

Potentially much faster
Can be hosted by GWDG

Cons:

We would have to change all schemas
The data would have to be normalized
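To make the normalization point concrete, here is a minimal sketch of what a normalized layout could look like, using Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL. The table and column names are hypothetical: authors are stored once, a paper_author join table replaces the embedded author list, and the grouping that is expensive in the paper collection becomes a JOIN plus GROUP BY handled by the database engine.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical normalized schema; sqlite3 stands in for MySQL/PostgreSQL here.
SCHEMA = """
CREATE TABLE paper (
    id TEXT PRIMARY KEY,
    year INTEGER NOT NULL,
    in_citations_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE author (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- One row per (paper, author) pair instead of an embedded author array.
CREATE TABLE paper_author (
    paper_id TEXT NOT NULL REFERENCES paper(id),
    author_id INTEGER NOT NULL REFERENCES author(id),
    PRIMARY KEY (paper_id, author_id)
);
CREATE INDEX idx_paper_year ON paper(year);
"""


def make_db(papers: List[Dict[str, Any]]) -> sqlite3.Connection:
    """Create an in-memory database and split paper documents into the tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for p in papers:
        conn.execute(
            "INSERT INTO paper (id, year, in_citations_count) VALUES (?, ?, ?)",
            (p["_id"], p["year"], p["inCitationsCount"]),
        )
        for name in p["authors"]:
            # Store each author once (SQLite syntax; PostgreSQL uses ON CONFLICT DO NOTHING).
            conn.execute("INSERT OR IGNORE INTO author (name) VALUES (?)", (name,))
            (author_id,) = conn.execute(
                "SELECT id FROM author WHERE name = ?", (name,)
            ).fetchone()
            conn.execute(
                "INSERT INTO paper_author (paper_id, author_id) VALUES (?, ?)",
                (p["_id"], author_id),
            )
    conn.commit()
    return conn


def author_stats(
    conn: sqlite3.Connection, year_from: int, year_to: int
) -> List[Tuple[str, int, int]]:
    """Per-author paper and citation counts, grouped by the database engine."""
    return conn.execute(
        """
        SELECT a.name, COUNT(*), SUM(p.in_citations_count)
        FROM paper AS p
        JOIN paper_author AS pa ON pa.paper_id = p.id
        JOIN author AS a ON a.id = pa.author_id
        WHERE p.year BETWEEN ? AND ?
        GROUP BY a.name
        ORDER BY a.name
        """,
        (year_from, year_to),
    ).fetchall()
```

The sketch also shows the cost listed under Cons: every paper document has to be split across several tables, and each dashboard query has to be rewritten against the new schema.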

jpwahle commented 1 year ago

We should also consider adding more data from FatCat and Internet Archive Scholar, which export everything in PostgreSQL.

jpwahle / cs-insights-backend

Extend database schemas #90