jpwahle / cs-insights-backend

API server of the cs-insights project. It is the main component for storing data and for accessing an external data analysis endpoint. It stores everything in a MongoDB instance and queries the cs-insights-prediction-endpoint for machine learning results.
https://jpwahle.github.io/cs-insights-backend/
MIT License

Extend database schemas #90

Open jpwahle opened 2 years ago

jpwahle commented 2 years ago

Is your feature request related to a problem? Please describe.
Currently, everything is stored in the paper collection, while the other schemas introduced in 2c59cbad1c66d4bf93a2a1b5123bbd9f663a49b3 are unused. Because the aggregate and group steps in particular are expensive, we want to avoid them by using the separate collections.

Describe the solution you'd like
Each dashboard that requires aggregation, grouping, etc. should have a separate collection (e.g., authors, venues). Data should also be written to the currently unused collections, with each entry mapping back to the paper objects. For fast filtering, each collection should contain the key filter elements (e.g., year, inCitationsCount, ...). The solution should be backward compatible, so the paper collection should remain unchanged.
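As a rough illustration of the idea, here is a minimal in-memory sketch of deriving author documents from paper documents. The `Paper` and `AuthorDoc` shapes and the `buildAuthorDocs` helper are hypothetical and simplified, not the project's actual schemas. Storing the distinct years and the summed `inCitationsCount` per author is just one possible choice of filter keys.

```typescript
// Hypothetical, simplified shapes; the real schemas in the project may differ.
interface Paper {
  id: string;
  title: string;
  year: number;
  inCitationsCount: number;
  authors: string[];
}

interface AuthorDoc {
  name: string;
  paperIds: string[];       // back-references into the unchanged paper collection
  years: number[];          // distinct publication years, sorted ascending
  inCitationsCount: number; // sum over all papers of this author
}

// Builds one document per author so that dashboards can read the result
// directly instead of grouping the paper collection on every request.
function buildAuthorDocs(papers: Paper[]): AuthorDoc[] {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  const byName: Record<string, AuthorDoc> = Object.create(null);
  for (const paper of papers) {
    for (const name of paper.authors) {
      if (byName[name] === undefined) {
        byName[name] = { name: name, paperIds: [], years: [], inCitationsCount: 0 };
      }
      const doc = byName[name];
      doc.paperIds.push(paper.id);
      if (doc.years.indexOf(paper.year) === -1) {
        doc.years.push(paper.year);
      }
      doc.inCitationsCount += paper.inCitationsCount;
    }
  }
  return Object.keys(byName).map(function (name) {
    const doc = byName[name];
    doc.years.sort(function (a, b) { return a - b; });
    return doc;
  });
}
```

The paper objects are only read, never modified, which keeps the paper collection backward compatible. In a real setup the resulting documents would be written to the authors collection whenever papers are inserted or updated.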

jpwahle commented 2 years ago

One suggestion here is to switch to a MySQL / PostgreSQL database.

Pros:

Cons:

jpwahle commented 1 year ago

We should also think about adding more data from FatCat and Internet Archive Scholar, which export everything in PostgreSQL.