Treblesteph opened 7 years ago
With v1 it will be technically possible for any publisher, of any size, to create a sciencefair datasource, as long as their publications are available as JATS XML.
At first this will be a bespoke technical task, but once sciencefair v1 itself has launched, we can shift our focus to tools for creating and maintaining datasources.
Then at some later stage (v2 or later) we might start supporting formats other than JATS.
At some point soon it might become possible (using INK, for example) to convert various other formats to JATS reliably, which would remove the need to support more formats in sciencefair.
This is interesting, as any publisher that submits to PubMed is generating JATS XML, and that covers a lot of journals. However, that format is not generally available to the public. I wonder whether PubMed Commons even provides access to the JATS XML? If it did, it would be relatively easy to create a DAT based on PubMed Commons...
@rmflight PubMed Central provides JATS for every paper in the PMC collection. I've already made this a datasource and will add it to v1 before release, but it's a bit big and cumbersome to include in the beta.
OK, does the day store exist somewhere, so that the day URL could be added for interested individuals to try it out? 😉
That should be "dat store"; stupid autocorrect.
@rmflight Sorry I missed your question! The PMC source(s) will be landing this week.
Just to add to this: in the social sciences, law, etc., I have never heard of JATS XML. Other formats are used or provided, such as the one required by Google Scholar.
@step21 AFAIK, Google Scholar accepts metadata in any of a range of formats:
Google Scholar supports Highwire Press tags (e.g., citation_title), Eprints tags (e.g., eprints.title), BE Press tags (e.g., bepress_citation_title), and PRISM tags (e.g., prism.title). Use Dublin Core tags (e.g., DC.title) as a last resort - they work poorly for journal papers because Dublin Core doesn't have unambiguous fields for journal title, volume, issue, and page numbers. To check that these tags are present, visit several abstracts and view their HTML source.
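As a rough illustration of the "view their HTML source" check in the quote, here is a minimal Python sketch that uses only the standard library's `html.parser` to list Scholar-style `<meta>` tags on an abstract page. The sample HTML, the helper name `scholar_meta_tags`, and the idea of matching on name prefixes are assumptions for illustration; this is not how Google Scholar itself parses pages.

```python
from html.parser import HTMLParser

# Tag-name prefixes taken from the Google Scholar guidance quoted above
# (Highwire Press, Eprints, BE Press, PRISM, and Dublin Core as a last resort).
# Matching on prefixes is a simplifying assumption for this sketch.
SCHOLAR_PREFIXES = ("citation_", "eprints.", "bepress_", "prism.", "DC.")


class ScholarMetaParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs whose name uses a Scholar prefix."""

    def __init__(self):
        super().__init__()
        self.tags = []  # (name, content) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith(SCHOLAR_PREFIXES):
            self.tags.append((name, attributes.get("content") or ""))


def scholar_meta_tags(html_text):
    """Hypothetical helper: return the Scholar-style metadata tags in a page."""
    parser = ScholarMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.tags


sample = """<html><head>
<meta name="citation_title" content="An Example Article">
<meta name="citation_journal_title" content="Example Journal">
<meta name="DC.title" content="An Example Article">
<meta name="viewport" content="width=device-width">
</head><body></body></html>"""

for name, content in scholar_meta_tags(sample):
    print(f"{name}: {content}")
```

The `viewport` tag is skipped because its name matches none of the prefixes, so only the three bibliographic tags are printed.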
However, this applies only to the metadata. The full text is provided separately, often as PDF but also as HTML or JATS XML, and Google Scholar is much more flexible about that.
To my knowledge, the vast majority of major publishers and most major publishing platforms support JATS to some extent, although they don't always make it available for download. However, my expertise is not in social science literature, so I might have a blind spot.
If you can suggest some social science publishers or journals, I will look into which full-text formats they provide, so we can make sure they are included in our plans.
Well, yes, I thought we were talking about metadata, which can then point to PDF, HTML or whatever. Like I said, I had never heard of JATS XML (though that doesn't mean much). Maybe it is more prevalent than I thought, as you say. I have mostly worked with PKP OJS, and I don't think it uses JATS so far.
@step21 Ah, well, JATS is fairly flexible, but a JATS document usually contains both the full metadata and the full text.
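To make "metadata plus full text in one file" concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The sample is a heavily trimmed, hand-written JATS-style document (real JATS files carry a DOCTYPE, namespaces such as xlink, and many more elements), and `summarize_jats` is a hypothetical helper, not part of sciencefair.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed JATS-style sample for illustration only.
SAMPLE_JATS = """<article article-type="research-article">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Example Journal</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>First paragraph of the full text.</p>
      <p>Second paragraph.</p>
    </sec>
  </body>
</article>"""


def summarize_jats(xml_text):
    """Hypothetical helper: pull a few metadata fields and the body paragraphs."""
    root = ET.fromstring(xml_text)
    # Metadata sits under <front>; the full text sits under <body>.
    journal = root.findtext("front/journal-meta/journal-title-group/journal-title")
    title = root.findtext("front/article-meta/title-group/article-title")
    # itertext() keeps text from inline markup (italics, links) inside each <p>.
    paragraphs = ["".join(p.itertext()).strip() for p in root.iterfind("body//p")]
    return {"journal": journal, "title": title, "paragraphs": paragraphs}


summary = summarize_jats(SAMPLE_JATS)
print(summary["journal"], "|", summary["title"])
print(len(summary["paragraphs"]), "paragraphs of full text")
```

Because both halves live in one document, a datasource built on JATS can index the metadata and render the article body from the same file.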
PKP can produce JATS, as well as various other structured document formats that we can easily convert to JATS. See for example https://github.com/pkp/ots and https://github.com/MartinPaulEve/meTypeset
JATS is present in the social sciences; AFAIK there are just fewer publishers that make it available. Ubiquity tends to, Hindawi too I think, and a few others.
Maybe it would be good to start a list of datasources to add? I can open a new issue and start adding a few (e.g., Ubiquity, RIO Journal, Hindawi, PeerJ). I also noticed that MDPI doesn't use JATS but DTD Journal Publishing XML; I don't know how that will work?
If you want, we can go through adding datasources together at some point, and I can compile everything into a nice doc/wiki page in due time. 😄
Just checking in after the 1.0 release. Any updates on the PMC datasource?
Just a minor correction: PubMed Central does not provide JATS for every paper, only for the so-called "open access subset": https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/
In a different format, or eventually to use a converter?