Open bnewbold opened 5 years ago
I scraped the journal names from the JURN directory website, loaded them into OpenRefine, and ran the reconciliation service against Wikidata. By automatically matching the best candidates and doing some manual matching, I got 990 matches out of 3311 journals. For these I tried to add the Wikidata ID, ISSN, and ISSN-L. Here is the comma-separated file exported from OpenRefine: jurn-directory-csv.txt Can you use this for some good?
Hi @Phu2, sorry for the slow reply on this. It is helpful!
I wonder what we could do to increase the match rate, or to confirm that the unmatched journals really have no ISSN. Could we have OpenRefine try to reconcile against the fatcat container list instead of Wikidata? There are JSON dumps here:
https://archive.org/details/fatcat_bulk_exports_2020-08-05
or I could supply a .csv file if you let me know which column fields to include.
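
For reference, here is a rough sketch of how a container dump could be flattened into a .csv and compared against the JURN titles by normalized name. It is only an illustration, not something that has been run against the real export: it assumes the dump has one JSON object per line, and the field names (`ident`, `name`, `issnl`, `wikidata_qid`) and the helper names are assumptions to check against the actual files.

```python
import csv
import io
import json
import unicodedata

# Assumed container field names; verify against the actual dump.
FIELDS = ["ident", "name", "issnl", "wikidata_qid"]


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def containers_to_csv(json_lines, out):
    """Write one CSV row per JSON line, keeping only the columns in FIELDS."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing or null fields become empty strings.
        writer.writerow({f: record.get(f) or "" for f in FIELDS})


def match_titles(jurn_titles, container_rows):
    """Map each JURN title to a container row when the normalized names
    match exactly and the match is unambiguous."""
    index = {}
    for row in container_rows:
        index.setdefault(normalize_title(row["name"]), []).append(row)
    matches = {}
    for title in jurn_titles:
        candidates = index.get(normalize_title(title), [])
        if len(candidates) == 1:  # leave ambiguous names for manual review
            matches[title] = candidates[0]
    return matches
```

Exact matching on normalized names is deliberately conservative: it will miss abbreviated or translated titles, so anything it leaves unmatched would still need fuzzy reconciliation (e.g. in OpenRefine) or manual review.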
JURN is "An organised links directory for the arts & humanities, listing selected open access or otherwise free ejournals." They list 3000-4000 such journals by name, URL, and category at http://www.jurn.org/directory/, and an additional 800 ecology titles at https://jurnsearch.wordpress.com/titles-indexed-ecology-related/.
It would be great to include these in fatcat (probably via chocula first, though they could also go in directly via the API), and mark them as open so they will be included in broad IA crawls for preservation. However, JURN doesn't link any persistent identifiers (e.g., Wikidata QID or ISSN/ISSN-L), which makes it hard to reference them anywhere without duplication.
Some brainstorms of how to go about this: