ourresearch / journalsdb

Open database of scholarly journals
https://journalsdb.org
MIT License

Missing a lot of ISSNs, especially for Elsevier #5

Closed: hpiwowar closed this issue 3 years ago

hpiwowar commented 3 years ago

I've been comparing JournalsDB to the journals data we've been using in Unsub. We got that data from Unpaywall, which for the most part got it from Crossref by bubbling up ISSN data from the DOI records.

A simplified dump of this comparison is here: https://docs.google.com/spreadsheets/d/1-9Iyudcfwr50IsqIUxHJUoLWrslVoR-XYQt7dR6tEJE/edit#gid=559023817

It includes issn_l, issn, publisher, and title, flattened by issn. There is also a "version" field whose value is either "previous" or "journalsdb", as you can see on the raw data tab.

It is a lot of data, too much to process easily in this format, so on the second tab I cut out everything except Elsevier.

I think we are missing a lot of ISSNs, especially for Elsevier. Most issn_ls for Elsevier seem to have only one issn in journalsdb right now; see the pivot table. Most journals actually have both an online and a print ISSN. Not all do, but most. It seems we are missing those for Elsevier.
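The pivot-table check can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the spreadsheet. It assumes the flattened dump is available as tuples in the hypothetical column order (issn_l, issn, publisher, title, version), one row per ISSN, and the helper names `issn_counts` and `single_issn_share` are made up here. If most ISSN-Ls in the "journalsdb" version have exactly one ISSN while the "previous" version has two, the print/online pairs are probably being dropped.

```python
from collections import defaultdict

# Hypothetical row layout mirroring the flattened dump:
# (issn_l, issn, publisher, title, version), one row per ISSN.


def issn_counts(rows, publisher):
    """Count distinct ISSNs per ISSN-L for one publisher, split by version."""
    issns = defaultdict(set)  # (version, issn_l) -> set of ISSNs
    for issn_l, issn, pub, _title, version in rows:
        if pub == publisher:
            issns[(version, issn_l)].add(issn)
    return {key: len(vals) for key, vals in issns.items()}


def single_issn_share(counts, version):
    """Fraction of ISSN-Ls in `version` that have exactly one ISSN."""
    sizes = [n for (v, _issn_l), n in counts.items() if v == version]
    if not sizes:
        return 0.0
    return sum(1 for n in sizes if n == 1) / len(sizes)


# Made-up placeholder rows, not real journals.
rows = [
    ("1111-1111", "1111-1111", "Elsevier", "Journal A", "previous"),
    ("1111-1111", "2222-2222", "Elsevier", "Journal A", "previous"),
    ("1111-1111", "1111-1111", "Elsevier", "Journal A", "journalsdb"),
]
counts = issn_counts(rows, "Elsevier")
print(single_issn_share(counts, "previous"))    # 0.0
print(single_issn_share(counts, "journalsdb"))  # 1.0
```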

This is very important to fix fairly quickly: it is holding up our use of journalsdb in Unsub. Thanks!

After that, can you use this data to slice and dice the comparison between "previous" and "journalsdb" in other ways as a sanity check? For example, check whether the number of issn_ls differs by publisher and, if so, dig in to make sure it isn't a bug. Also sanity-check the number of issns per issn_l for other publishers, and so on. If you'd rather have the data in another format, let me know. Thanks!
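The per-publisher comparison could look something like this sketch. It uses the same hypothetical tuple layout as the dump (issn_l, issn, publisher, title, version). The function names are illustrative. It counts distinct ISSN-Ls per publisher in each version and lists only the publishers where the two counts differ, largest gap first, so the biggest discrepancies get reviewed first.

```python
from collections import defaultdict


def issnl_counts_by_publisher(rows):
    """Number of distinct ISSN-Ls per (publisher, version)."""
    seen = defaultdict(set)
    for issn_l, _issn, pub, _title, version in rows:
        seen[(pub, version)].add(issn_l)
    return {key: len(vals) for key, vals in seen.items()}


def publisher_diffs(rows, old="previous", new="journalsdb"):
    """Publishers whose ISSN-L count differs between versions,
    as (publisher, old_count, new_count), largest gap first."""
    counts = issnl_counts_by_publisher(rows)
    diffs = []
    # Sort publisher names first so ties keep a deterministic order.
    for pub in sorted({pub for pub, _version in counts}):
        a = counts.get((pub, old), 0)
        b = counts.get((pub, new), 0)
        if a != b:
            diffs.append((pub, a, b))
    return sorted(diffs, key=lambda d: abs(d[2] - d[1]), reverse=True)
```

The same grouping can be extended to ISSNs per ISSN-L by publisher, which covers the other sanity check mentioned above.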

hpiwowar commented 3 years ago

It could be worth pulling the issn -> issn_l mappings from Unpaywall into journalsdb as a temporary supplemental patch table that you join on. That way Unsub can start using journalsdb as soon as possible if getting the mappings from the long-term source (whatever you decide that is) looks like it will take more than a day or two.
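Here is one way the temporary patch table could behave, sketched with plain dicts. The precedence rule (journalsdb wins, Unpaywall only fills gaps) is an assumption, as is the function name `merge_issn_maps`. In Redshift the equivalent would usually be a LEFT JOIN from the ISSN list onto both tables with COALESCE on the issn_l column. The sketch also returns the ISSNs where the two sources disagree, since those would be worth reviewing before the patch is dropped.

```python
def merge_issn_maps(journalsdb_map, patch_map):
    """Combine issn -> issn_l mappings, with journalsdb taking precedence.

    Returns (merged_mapping, conflicts), where conflicts lists ISSNs that
    both sources map but to different ISSN-Ls.
    """
    merged = dict(patch_map)
    merged.update(journalsdb_map)  # journalsdb overrides the patch table
    conflicts = sorted(
        issn
        for issn, issn_l in patch_map.items()
        if issn in journalsdb_map and journalsdb_map[issn] != issn_l
    )
    return merged, conflicts
```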

caseydm commented 3 years ago

Oh, I have been working on this like crazy and believe the fix is running right now. I believe it is related to the way I implemented the filter that removes non-Crossref journals a couple of weeks ago, so it is the same issue as the issn_l not being in the issn list. There should be an additional ~17k ISSNs mapped to the current ISSN-Ls when this is done. Will update shortly!
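Two checks like these could catch this kind of regression after a filter change. This is a hedged sketch: the input shape (a dict from issn_l to a list of ISSNs) and both function names are hypothetical. The first check relies on the fact that an ISSN-L is normally one of the journal's own ISSNs, so an ISSN-L missing from its own list points to dropped data. The second check diffs the ISSN lists from before and after a change to see which ISSNs disappeared.

```python
def missing_self_issn(journals):
    """ISSN-Ls that do not appear in their own ISSN list.

    `journals` maps issn_l -> list of ISSNs (hypothetical shape).
    """
    return sorted(issn_l for issn_l, issns in journals.items() if issn_l not in issns)


def dropped_issns(before, after):
    """ISSNs present in `before` but missing from `after`, keyed by ISSN-L."""
    lost = {}
    for issn_l, issns in before.items():
        gone = set(issns) - set(after.get(issn_l, ()))
        if gone:
            lost[issn_l] = sorted(gone)
    return lost
```

Summing `len(v)` over the values of `dropped_issns(...)` gives the total number of ISSNs that went missing, which can be compared against the number the fix is expected to restore.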

caseydm commented 3 years ago

OK, the update is done. Can you take another look? For example, this ISSN-L from the previous issue now has three ISSNs:

http://journalsdb.org/journals/1879-3096

hpiwowar commented 3 years ago

Great! That was fast! Yup, I will get the new data ingested into Redshift and have a look tomorrow. Heather

On Sun, May 2, 2021 at 4:27 PM caseydm wrote:

> Ok the update is done. Can you take another look? For example this ISSN-L from the previous issue now has three ISSNs:
>
> http://journalsdb.org/journals/1879-3096


hpiwowar commented 3 years ago

Looks great! Thanks a lot for the quick fix. Closing.

caseydm commented 3 years ago

Excellent! Happy to hear that.