Open sckott opened 2 years ago
Hi Scott. Yes, we are doing some very basic standardization of publisher names, which you can see here: https://github.com/ourresearch/journalsdb/blob/main/ingest/journals/journals_new_journal.py#L154
I was told that Springer Publishing Company is separate from Springer Nature so they are split on purpose. The others you mentioned are outliers that should have been formatted - except for Elsevier. It looks like I need to add that one to the list. I just sent you an email discussion we had on this a while back. It includes Richard's method for normalizing publisher names along with some caveats.
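For illustration, an alias-table approach to this kind of standardization could look like the sketch below. This is not the code at the link above and not Richard's method. The `CANONICAL` table, the variant spellings in it, and the `normalize_publisher` helper are all hypothetical, and treating Elsevier-Churchill Livingstone as Elsevier is an assumption. Springer Publishing Company is deliberately kept separate from Springer Nature.

```python
import re

# Hypothetical alias table: keys are cleaned-up variants, values are the
# canonical names. The variants are illustrative, not journalsdb's list.
CANONICAL = {
    "elsevier": "Elsevier",
    "elsevier bv": "Elsevier",
    "elsevier-churchill livingstone": "Elsevier",  # assumed to belong to Elsevier
    "sage": "SAGE",
    "sage publications": "SAGE",
    "springer nature": "Springer Nature",
    # Kept distinct on purpose: a separate company from Springer Nature.
    "springer publishing company": "Springer Publishing Company",
}


def _key(name):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = re.sub(r"[.,]", "", name.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def normalize_publisher(name):
    """Return the canonical name if the variant is known, else the trimmed input."""
    return CANONICAL.get(_key(name), name.strip())


if __name__ == "__main__":
    for raw in ["Elsevier B.V.", " SAGE Publications ", "Springer Publishing Company"]:
        print(repr(raw), "->", normalize_publisher(raw))
```

Unknown names pass through unchanged, so outliers stay visible and can be added to the table later.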
Thanks and thanks for forwarding the email.
Okay, I'll assess our needs and see what further standardizing is needed and where to do it.
Hi Casey, I'm working on cleaning up the publisher field in an Unsub database table for journal prices, and we talked about maybe using the publisher names that journalsdb uses. However, looking at the data we ingest from journalsdb, I'm not sure whether the names are standardized in journalsdb. For example, searching for the big five publisher names in that data, I see Wiley and Taylor & Francis are all set, but there are a few variants for Elsevier, SAGE, and Springer.
I think Elsevier-Churchill Livingstone is part of Elsevier:
Some of the more interesting publisher names: "tanzilmultazam@umsida.ac.id", "10.15653 (Tierarztl Prax Ausg G Grosstiere Nutztiere)", "10.35977"
Currently, there's a total of 16,366 publisher names from journalsdb.
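As a rough way to surface entries like those, a check along these lines could flag publisher names that look like an email address or start with a DOI prefix. This is only a sketch. The patterns and the `looks_malformed` helper are assumptions for illustration and are not part of journalsdb or Unsub.

```python
import re

# Loose patterns, meant for flagging suspicious values, not strict validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DOI_PREFIX_RE = re.compile(r"^10\.\d{4,9}\b")


def looks_malformed(name):
    """True if the name looks like an email address or begins with a DOI prefix."""
    name = name.strip()
    return bool(EMAIL_RE.match(name) or DOI_PREFIX_RE.match(name))


if __name__ == "__main__":
    samples = [
        "tanzilmultazam@umsida.ac.id",
        "10.15653 (Tierarztl Prax Ausg G Grosstiere Nutztiere)",
        "10.35977",
        "Elsevier",
    ]
    for s in samples:
        print(repr(s), looks_malformed(s))
```

Flagged names still need a manual look, since some may map to a real publisher.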
I don't think the publisher names in journalsdb come straight from Crossref. I think Heather said that you've done some standardizing. To what extent are they cleaned up after you get them from Crossref?
If we wanted to use standardized publisher names, what do you think is the best source for them?