Closed sckott closed 2 years ago
I'm wondering if part of the answer is that there has been a focus on Elsevier, meaning some of the other publishers' metadata will be less complete?
Maybe the content of dois_by_issued_year is just what Crossref has?
The status field is kind of an incomplete feature. The way it is calculated is by checking the crossref API to see if something was published in the last six months. It calls a URL like this:
A script runs once a day and checks all of the journals with a status of 'unknown' to see if something was published recently. The reason I say it is an incomplete feature is that the date range does not work for all journals. Some may publish more than 6 months apart and still be considered 'publishing'. Plus, right now the status check does not change a status once it is set to 'publishing'. However, there are a lot of journals that are marked as 'ceased' due to no longer publishing and those are accurate. If you have some ideas on a better date range to use, such as a year, then let me know because I could adjust this.
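To make the check described above concrete, here is a minimal sketch. The URL the script really calls is not shown in this thread, so the form below is an assumption: it uses Crossref's public `/journals/{issn}/works` endpoint with the `from-pub-date` filter. The function names `recent_works_url` and `next_status` and the `window_days` parameter are hypothetical, not taken from journalsdb.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org"


def recent_works_url(issn: str, today: date, window_days: int = 183) -> str:
    """Build a Crossref query for works published within the window.

    Assumption: the real script may build its URL differently.
    """
    since = today - timedelta(days=window_days)
    query = urlencode(
        {"filter": f"from-pub-date:{since.isoformat()}", "rows": 0},
        safe=":",  # keep the filter readable instead of encoding ':' as %3A
    )
    return f"{CROSSREF_API}/journals/{issn}/works?{query}"


def next_status(current_status: str, recent_work_count: int) -> str:
    """Apply the rule from the thread: only 'unknown' journals are re-checked.

    'publishing' is never changed once set, and 'ceased' is set by hand or
    by publisher imports, so both pass through untouched.
    """
    if current_status != "unknown":
        return current_status
    return "publishing" if recent_work_count > 0 else "unknown"
```

Widening the date range to a year would just mean passing `window_days=365`; it would not address the fact that a 'publishing' status is never revisited.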
As for dois_by_issued_year, I believe Richard pulls that from crossref. But I would have to ask him to be sure. An xplenty job updates that data in journalsdb every day.
For reference, I checked the database and out of 99k journals there are 46k marked as unknown, 52k publishing, 310 ceased.
Thanks @caseydm
"If you have some ideas on a better date range to use, such as a year, then let me know because I could adjust this."
I don't have any concrete ideas. Just questions at this point. I'll think about this more.
That's good to know about "ceased". I hadn't seen any of those yet.
"As for dois_by_issued_year, I believe Richard pulls that from crossref."
I looked in Xplenty but couldn't sort out what was going on in the job that seemed closest to what generates this. Maybe you could ask him?
@richard-orr can you shed some light on how dois_by_issued_year is calculated? Is it counting DOIs in crossref?
I haven't followed this all the way to dois_by_issued_year, but the DOI counts by year that are imported here are from the Unpaywall DB: https://github.com/ourresearch/journalsdb/blob/e09e5ad4f45000c9cc7f1ec74bf092586246fc3c/ingest/open_access.py#L20
It's not based on an API that enumerates them by journal - I'm taking the full list of Crossref DOIs, trying to match their ISSNs to an ISSN-L in journals DB, then counting the number of DOIs by year and ISSN-L. It could go wrong because of a lack of a published date or ISSN in Crossref or an ISSN-L mapping in journalsdb.
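A rough sketch of that counting step, to show where records can drop out. This is not the actual ingest code: the record shape (`issns`, `year`) and the name `count_dois_by_year` are assumptions made up for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_dois_by_year(
    works: Iterable[Mapping], issn_to_issnl: Mapping[str, str]
) -> Counter:
    """Count DOIs per (ISSN-L, year).

    Each work is assumed to look like {"issns": [...], "year": 2020}.
    A work is silently skipped when it has no published year, or when
    none of its ISSNs maps to an ISSN-L. Those are the gaps that can
    make the counts incomplete.
    """
    counts: Counter = Counter()
    for work in works:
        year = work.get("year")
        if year is None:
            continue  # no published date in Crossref
        issn_l = next(
            (issn_to_issnl[i] for i in work.get("issns", []) if i in issn_to_issnl),
            None,
        )
        if issn_l is None:
            continue  # no ISSN on the record, or no ISSN-L mapping
        counts[(issn_l, year)] += 1
    return counts
```

A journal whose ISSN is missing from the mapping would end up with no counts at all, even if Crossref has plenty of DOIs for it.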
Thanks @richard-orr
Hmmm. I've got a lot of these (see attached), so manual website checks don't particularly scale well.
"The way it is calculated is by checking the crossref API to see if something was published in the last six months"
This is possibly more of a matter of policy than technical correctness I guess. I'll talk to Jason/Heather and see if they have any thoughts on this.
"right now the status check does not change a status once it is set to 'publishing'."
@caseydm then how does a journal acquire the state of "ceased"?
We are setting those manually, as well as importing from the publishers. For example, this spreadsheet was imported from Elsevier, so that ISSNs labeled discontinued or renamed were marked appropriately.
Taylor and Francis have some similar lists but we did not get around to implementing those yet.
Okay, thanks
Just looking at this today to spot check some work for a consortium. Noticed that in the Unsub backend we use the method set_is_currently_publishing (https://github.com/ourresearch/jump-api/blob/master/journalsdb.py#L158-L164) to determine if a journal is currently publishing, based on the array of data in dois_by_issued_year from here. In some cases what we have and what you have doesn't match. Maybe it doesn't matter? Seems like it does though.
For example, the following Sage ISSNs are all ones that our database says are not currently publishing, and then I checked the journal's website and the journalsdb API:
issn: Currently publishing based on looking at journal website?, journalsdb api
This is not an exhaustive search, just some examples.
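For context, the kind of check involved looks roughly like the sketch below. This is not the code at the link above. It assumes dois_by_issued_year is an array of [year, count] pairs, and the function name and `lookback_years` parameter are hypothetical.

```python
from typing import Optional, Sequence


def is_currently_publishing(
    dois_by_issued_year: Optional[Sequence[Sequence[int]]],
    current_year: int,
    lookback_years: int = 1,
) -> bool:
    """Return True if any recent year has a non-zero DOI count.

    Assumption: the input is a list of [year, count] pairs. A missing or
    empty array counts as "not currently publishing", which is how
    incomplete DOI counts can disagree with the journal's own website.
    """
    for year, count in dois_by_issued_year or []:
        if year >= current_year - lookback_years and count > 0:
            return True
    return False
```

With a rule like this, a journal that is still publishing but has no matched DOIs for recent years would be flagged as not currently publishing.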
Overall questions:
How are the status fields in the journalsdb API determined?
dois_by_issued_year can be incomplete?