sckott closed this 2 years ago
Hi @sckott. You're right - the major parts are not automated, such as journal metadata, APC price, and subscription price. We have some good code in place to update pricing, provided the publishers do not change their spreadsheet structure very much. But we will need to set aside some time and go through that by downloading the current spreadsheets, running the scripts, etc.
Most of the other data is updated once per day: the ISSNs are pulled from issn.org and the Crossref APIs are called. The retraction data is updated about once per week, when the latest data set comes out. The DOI stats are pulled from Richard once per day. That's about it!
Thanks for clarifying!
> But we will need to set aside some time and go through that by downloading the current spreadsheets, running the scripts, etc.
Can you expand on that a bit?
Sure! So for APC pricing, for example, we would go download the latest spreadsheet at Elsevier here. We would need to run the import locally with the current script to make sure the data is importing properly, then upload it and actually run it against the production database. But we have to do that same process with subscription pricing too, for all of the top 5 publishers.
I doubt the publisher spreadsheets have changed much, so they will likely run fine. But we would still need to look everything over and do it carefully.
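For anyone following along, the "run it locally first" step could look roughly like the sketch below. This is not the actual import script: the column names (`issn`, `price`, `currency`) and the function `dry_run_import` are made up for illustration, and real publisher spreadsheets will have their own layouts. The idea is only to parse the file, check that the structure is still what the script expects, and collect errors without touching any database.

```python
import csv
import io
import re

# Hypothetical column names; real publisher spreadsheets will differ.
REQUIRED_COLUMNS = ("issn", "price", "currency")
ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dX]$")


def dry_run_import(csv_text):
    """Parse pricing rows without writing anything to a database.

    Returns (valid_rows, errors) so the result can be reviewed before
    the real import is run against production.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        # A changed spreadsheet structure is the main thing to catch early.
        return [], ["missing columns: " + ", ".join(missing)]

    valid_rows, errors = [], []
    for line_number, row in enumerate(reader, start=2):  # header is line 1
        issn = (row["issn"] or "").strip().upper()
        if not ISSN_PATTERN.match(issn):
            errors.append(f"line {line_number}: bad ISSN {row['issn']!r}")
            continue
        try:
            price = float(row["price"])
        except (TypeError, ValueError):
            errors.append(f"line {line_number}: bad price {row['price']!r}")
            continue
        currency = (row["currency"] or "").strip()
        valid_rows.append({"issn": issn, "price": price, "currency": currency})
    return valid_rows, errors
```

If `errors` comes back empty for the downloaded file, that is a reasonable signal the spreadsheet structure has not changed, though the rows would still deserve a manual look before the production run.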
Thanks! I agree that those spreadsheets are unlikely to be updated often.
I do think it's a good idea to notify Unsub users when changes happen on our side, e.g., when a journal drops out because it changes to gold OA, changes publishers, etc., or when APC or subscription prices change. We don't really have the infrastructure set up for that, but it remains a good idea in theory.
Yes, makes sense to me. Next time we update pricing we should add a current_as_of field to the pricing section so someone can know if it changed recently.
current_as_of sounds great
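A rough sketch of what a pricing entry with current_as_of could look like, just to make the idea concrete. Only the `current_as_of` name comes from this thread; the `ApcPrice` class, its other fields, and the `changed_recently` helper are hypothetical and not taken from the journalsdb code.

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class ApcPrice:
    # Every field name except current_as_of is hypothetical.
    price: float
    currency: str
    current_as_of: date  # date the pricing data was last imported

    def to_dict(self):
        """Return a JSON-friendly dict with the date as an ISO 8601 string."""
        data = asdict(self)
        data["current_as_of"] = self.current_as_of.isoformat()
        return data


def changed_recently(entry, today, days=90):
    """True if the entry was refreshed within the last `days` days."""
    return (today - entry.current_as_of).days <= days
```

A client could then compare `current_as_of` against its own last sync date to decide whether pricing needs to be re-fetched, which would partly stand in for the notification infrastructure we do not have yet.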
Hi @caseydm - Someone asked about how often data is updated in Unsub, and I realized I don't know.
So, do you know how often certain fields in a response from
https://api.journalsdb.org/journals/{issn}
are updated? For example, subscription prices and APC prices. Is that a once-a-year thing? I assume it can't be automated since you have to pull from various websites, spreadsheets on those websites, etc. Or are APC and sub prices done once, and then not updated? I'm guessing many of the fields in the response are updated on a rolling basis via querying Crossref's API. Yeah?