ourresearch / journalsdb

Open database of scholarly journals
https://journalsdb.org
MIT License

How often is data updated? #41

Closed by sckott 2 years ago

sckott commented 2 years ago

Hi @caseydm - Someone asked about how often data is updated in Unsub, and I realized I don't know.

So, do you know how often certain fields in a response from https://api.journalsdb.org/journals/{issn} are updated? For example, subscription prices and APC prices. Is that a once-a-year thing? I assume it can't be automated since you have to pull from various publisher websites and the spreadsheets they host. Or are APC and subscription prices loaded once and then not updated?

I'm guessing many of the fields in the response are updated on a rolling basis via querying Crossref's API. Yeah?

caseydm commented 2 years ago

Hi @sckott. You're right - the major parts are not automated, such as journal metadata, apc price, and subscription price. We have some good code in place to update pricing, considering the publishers do not change their spreadsheet structure very much. But we will need to set aside some time and go through that by downloading the current spreadsheets, running the scripts, etc.

Most of the other data is updated once per day: the ISSNs are pulled from issn.org, and the Crossref APIs are called. The retraction data is updated about once per week, when the latest data set comes out. The DOI stats are pulled from Richard once per day. That's about it!

sckott commented 2 years ago

Thanks for clarifying!

> But we will need to set aside some time and go through that by downloading the current spreadsheets, running the scripts, etc.

Can you expand on that a bit?

caseydm commented 2 years ago

Sure! For APC pricing, for example, we would download the latest spreadsheet from Elsevier here. We would need to run the current script locally to make sure the data imports properly, then upload and actually run it against the production database. We have to do that same process with subscription pricing too, for all of the top 5 publishers.

I doubt the publisher spreadsheets have changed much, so they will likely run fine. But we would still need to look everything over and do it carefully.
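
To make the "run it locally first" step concrete, here is a minimal sketch of a dry run that checks a downloaded price sheet before anything touches production. It is not the actual JournalsDB import script: the column names (`issn`, `apc_price`, `currency`) and the function `dry_run_import` are hypothetical, and it assumes the spreadsheet has been exported to CSV.

```python
import csv
import io

# Hypothetical column names; the real publisher spreadsheets differ.
EXPECTED_COLUMNS = {"issn", "apc_price", "currency"}


def dry_run_import(csv_text):
    """Parse a price sheet without writing anything; return (rows, errors)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    missing = EXPECTED_COLUMNS - header
    if missing:
        # The sheet structure changed, so stop before importing anything.
        return [], ["missing columns: " + ", ".join(sorted(missing))]

    rows, errors = [], []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            price = float(row["apc_price"])
        except (TypeError, ValueError):
            errors.append(f"line {line_number}: bad price {row['apc_price']!r}")
            continue
        rows.append(
            {
                "issn": (row["issn"] or "").strip(),
                "price": price,
                "currency": (row["currency"] or "").strip(),
            }
        )
    return rows, errors
```

If `errors` comes back empty and the row count looks right, the same file could then go through the real script against production.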

sckott commented 2 years ago

Thanks! I agree that those spreadsheets are unlikely to be updated often.

I do think it's a good idea to notify Unsub users when changes happen on our side, e.g., when a journal drops out because it changed to gold OA or changed publishers, or when APC or subscription prices change. We don't really have the infrastructure set up for that, but it remains a good idea in theory.

caseydm commented 2 years ago

Yes, makes sense to me. Next time we update pricing we should add a current_as_of field to the pricing section so someone can tell whether it changed recently.
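
As a rough sketch of how a client might use that field: only the name `current_as_of` comes from this thread. The shape of the pricing section, the ISO date format, the sample values, and the helper `changed_recently` are all assumptions for illustration.

```python
from datetime import date, timedelta

# Hypothetical pricing section with made-up values.
pricing = {
    "apc_prices": [{"price": 2500, "currency": "USD", "year": 2021}],
    "current_as_of": "2021-06-01",  # assumed to be an ISO 8601 date string
}


def changed_recently(pricing_section, today, window_days=90):
    """Return True if pricing was refreshed within the last window_days."""
    as_of = date.fromisoformat(pricing_section["current_as_of"])
    return timedelta(0) <= today - as_of <= timedelta(days=window_days)
```

A consumer such as Unsub could compare `current_as_of` with the date of its last sync and flag journals whose prices were refreshed since then.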

sckott commented 2 years ago

current_as_of sounds great