ourresearch / journalsdb

Open database of scholarly journals
https://journalsdb.org
MIT License

Status #27

Closed sckott closed 2 years ago

sckott commented 2 years ago

I was looking at this today to spot-check some work for a consortium. In the Unsub backend we use the method set_is_currently_publishing (https://github.com/ourresearch/jump-api/blob/master/journalsdb.py#L158-L164) to determine whether a journal is currently publishing, based on the array of data in dois_by_issued_year from here. In some cases what we have and what you have don't match. Maybe it doesn't matter? It seems like it does, though.
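For anyone without the Unsub code open, a check of this kind could look roughly like the sketch below. It is not the actual set_is_currently_publishing implementation: the `[year, count]` pair shape assumed for dois_by_issued_year, the function name `looks_currently_publishing`, and the one-year lookback are all assumptions made for illustration.

```python
from datetime import date


def looks_currently_publishing(dois_by_issued_year, today=None, lookback_years=1):
    """Return True if any recent year has a non-zero DOI count.

    Assumption: dois_by_issued_year is a list of [year, count] pairs,
    e.g. [[2020, 41], [2021, 12]]. The lookback window is illustrative.
    """
    today = today or date.today()
    earliest_year = today.year - lookback_years
    for year, count in dois_by_issued_year or []:
        # A single recent year with at least one DOI counts as "publishing".
        if int(year) >= earliest_year and int(count) > 0:
            return True
    return False
```

With a check shaped like this, a gap in dois_by_issued_year for recent years would be enough to flag an active journal as not publishing, which is one way a mismatch like the one described here could arise.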

For example, the following Sage ISSNs are all marked in our database as not currently publishing. For each one I checked the journal's website and the journalsdb API:

issn: Currently publishing based on looking at journal website?, journalsdb api

This is not an exhaustive search, just some examples.

Overall questions:

  1. How is the status field in the journalsdb API determined?
  2. Is it possible that dois_by_issued_year is incomplete?

sckott commented 2 years ago

I'm wondering if part of the answer is that there has been a focus on Elsevier, meaning some of the other publishers' metadata will be less complete?

sckott commented 2 years ago

Maybe the content of dois_by_issued_year is just what Crossref has?

caseydm commented 2 years ago

The status field is kind of an incomplete feature. The way it is calculated is by checking the crossref API to see if something was published in the last six months. It calls a URL like this:

https://api.crossref.org/journals/0971-9458/works?filter=from-created-date:2021-02-01&sort=published&rows=1

A script runs each day and checks all of the journals with a status of 'unknown' to see if something was published recently. The reason I say it is an incomplete feature is that the date range does not work for all journals. Some may publish more than six months apart and still be considered 'publishing'. Plus, right now the status check does not change a status once it is set to 'publishing'. However, there are a lot of journals that are marked as 'ceased' due to no longer publishing and those are accurate. If you have some ideas on a better date range to use, such as a year, then let me know because I could adjust this.
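As a rough sketch of the check described above (not the actual journalsdb script), the URL construction and the status rule might look like this. The function names `recent_works_url` and `next_status` are hypothetical, and the 182-day window is an assumed stand-in for "six months"; the response shape (`message.items`) follows the standard Crossref works list.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

CROSSREF_JOURNALS = "https://api.crossref.org/journals"


def recent_works_url(issn, today=None, window_days=182):
    """Build a Crossref query for works created within the window."""
    today = today or date.today()
    since = today - timedelta(days=window_days)
    query = urlencode(
        {
            "filter": f"from-created-date:{since.isoformat()}",
            "sort": "published",
            "rows": 1,
        },
        safe=":",  # keep the filter's colon readable, as in the URL above
    )
    return f"{CROSSREF_JOURNALS}/{issn}/works?{query}"


def next_status(current_status, crossref_response):
    """Apply the rule described above to a single journal.

    Only 'unknown' journals are re-checked; any other status
    ('publishing', 'ceased') is left untouched.
    """
    if current_status != "unknown":
        return current_status
    items = crossref_response.get("message", {}).get("items", [])
    return "publishing" if items else "unknown"
```

Widening the date range, for example to a year, would only mean changing `window_days` in this sketch. Letting a 'publishing' journal fall back to another status would need a change to the first branch of `next_status`.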

caseydm commented 2 years ago

As for dois_by_issued_year, I believe Richard pulls that from crossref. But I would have to ask him to be sure. An Xplenty job updates that data in journalsdb every day.

caseydm commented 2 years ago

For reference, I checked the database and out of 99k journals there are 46k marked as unknown, 52k publishing, 310 ceased.

sckott commented 2 years ago

Thanks @caseydm

> If you have some ideas on a better date range to use, such as a year, then let me know because I could adjust this.

I don't have any concrete ideas. Just questions at this point. I'll think about this more.

That's good to know about "ceased". I hadn't seen any of those yet.

> As for dois_by_issued_year, I believe Richard pulls that from crossref.

I looked in Xplenty but couldn't sort out what was going on in the job that seemed closest to what generates this. Maybe you could ask him?

caseydm commented 2 years ago

@richard-orr can you shed some light on how dois_by_issued_year is calculated? Is it counting DOIs in crossref?

richard-orr commented 2 years ago

I haven't followed this all the way to dois_by_issued_year, but the DOI counts by year that are imported here are from the Unpaywall DB: https://github.com/ourresearch/journalsdb/blob/e09e5ad4f45000c9cc7f1ec74bf092586246fc3c/ingest/open_access.py#L20

It's not based on an API that enumerates them by journal - I'm taking the full list of Crossref DOIs, trying to match their ISSNs to an ISSN-L in journalsdb, then counting the number of DOIs by year and ISSN-L. It could go wrong if a DOI lacks a published date or ISSN in Crossref, or if its ISSN has no ISSN-L mapping in journalsdb.
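To make the counting step and its failure modes concrete, here is a minimal sketch under stated assumptions. It is not the actual ingest code: the record keys `issns` and `issued_year`, the `issn_to_issn_l` lookup dict, and the function name `count_dois_by_year` are all hypothetical.

```python
from collections import Counter


def count_dois_by_year(crossref_records, issn_to_issn_l):
    """Count DOIs per (ISSN-L, issued year).

    crossref_records: iterable of dicts with hypothetical keys
        'issns' (list of str) and 'issued_year' (int or None).
    issn_to_issn_l: dict mapping any known ISSN to its ISSN-L.
    Returns (counts, skipped), where skipped is the number of
    records that could not be attributed to a journal and year.
    """
    counts = Counter()
    skipped = 0
    for record in crossref_records:
        year = record.get("issued_year")
        # Use the first ISSN on the record that has a known ISSN-L.
        issn_l = next(
            (issn_to_issn_l[i] for i in record.get("issns") or [] if i in issn_to_issn_l),
            None,
        )
        if year is None or issn_l is None:
            skipped += 1  # missing date, missing ISSN, or no ISSN-L mapping
            continue
        counts[(issn_l, year)] += 1
    return counts, skipped
```

Every record that lands in `skipped` is a DOI that exists in Crossref but never shows up in the per-year counts, so a journal with many such records could look inactive in dois_by_issued_year while still publishing.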

sckott commented 2 years ago

Thanks @richard-orr

Hmmm. I've got a lot of these (see attached), so manual website checks don't particularly scale well.

notcurrentlypublishing.csv

> The way it is calculated is by checking the crossref API to see if something was published in the last six months

I guess this is possibly more a matter of policy than technical correctness. I'll talk to Jason/Heather and see if they have any thoughts on this.

sckott commented 2 years ago

> right now the status check does not change a status once it is set to 'publishing'.

@caseydm then how does a journal acquire the state of "ceased"?

caseydm commented 2 years ago

We are setting those manually, as well as importing from the publishers. For example, this spreadsheet was imported from Elsevier, so that ISSNs labeled discontinued or renamed were marked appropriately.

Taylor and Francis have some similar lists, but we have not gotten around to implementing those yet.

sckott commented 2 years ago

Okay, thanks