Ability to add papers by smaller publishers.

Treblesteph commented 7 years ago

In a different format, or eventually to use a converter?

blahah commented 7 years ago

With v1 it will be technically possible for any publisher of any size to create a sciencefair datasource as long as their publications produce JATS XML.

At first it will be a bespoke technical task, but once sciencefair v1 itself is launched, we can then move to focus on tools for creating and maintaining datasources.

Then at some later stage (v2 or later) we might start supporting formats other than JATS.

At some point soon it might become possible (using INK for example) to convert various other formats to JATS reliably, which would avoid the need to support more formats in sciencefair.

rmflight commented 7 years ago

This is interesting, as any publisher that submits to PubMed is generating JATS XML, which covers a lot of journals. However, that format is not generally available to the public. I wonder whether even PubMed Commons provides access to the JATS XML? If it did, it would be relatively easy to create a DAT based on PubMed Commons.

blahah commented 7 years ago

@rmflight PubMed Central provides JATS for every paper in the PMC collection. I've already made this a datasource and will add it to v1 prior to release, but it's a bit big and cumbersome to include in the beta.

rmflight commented 7 years ago

OK, does the day store exist somewhere, so that the day URL could be added for interested individuals to try it out? 😉

rmflight commented 7 years ago

That should be "dat store" and "dat URL"; stupid autocorrect.

blahah commented 7 years ago

@rmflight sorry I missed your question! The PMC source(s) will be landing this week

step21 commented 7 years ago

Just to add to this: in the social sciences, law, etc., I have never heard of JATS XML. Other formats are used or provided, such as the one required by Google Scholar.

blahah commented 7 years ago

@step21 AFAIK Google Scholar requires metadata in one of a range of formats:

Google Scholar supports Highwire Press tags (e.g., citation_title), Eprints tags (e.g., eprints.title), BE Press tags (e.g., bepress_citation_title), and PRISM tags (e.g., prism.title). Use Dublin Core tags (e.g., DC.title) as a last resort - they work poorly for journal papers because Dublin Core doesn't have unambiguous fields for journal title, volume, issue, and page numbers. To check that these tags are present, visit several abstracts and view their HTML source.

https://scholar.google.com/intl/en/scholar/inclusion.html#indexing

However, this is only for the metadata. The fulltext is provided separately - often as PDF, but also HTML or JATS XML, and they are much more flexible about this.
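As a rough illustration of the tag families quoted above, here is a minimal Python sketch that collects those `<meta>` tags from an abstract page. The function name `extract_scholar_tags`, the class name, the prefix list, and the sample HTML are all assumptions made up for this example; they are not part of sciencefair or of any Google tooling.

```python
from html.parser import HTMLParser

# Prefixes of the tag families named in the quoted guidelines.
# Matching is done case-insensitively, so "DC.title" matches "dc.".
SCHOLAR_PREFIXES = ("citation_", "eprints.", "bepress_citation_", "prism.", "dc.")


class ScholarMetaParser(HTMLParser):
    """Collects <meta name=... content=...> tags whose name uses a known prefix."""

    def __init__(self):
        super().__init__()
        self.tags = {}  # tag name -> list of content values (authors can repeat)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip()
        content = attr.get("content")
        if content is not None and name.lower().startswith(SCHOLAR_PREFIXES):
            self.tags.setdefault(name, []).append(content)


def extract_scholar_tags(html):
    """Hypothetical helper: return the bibliographic meta tags found in `html`."""
    parser = ScholarMetaParser()
    parser.feed(html)
    return parser.tags


# Invented sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
  <meta name="viewport" content="width=device-width">
  <meta name="citation_title" content="An Example Article">
  <meta name="citation_author" content="Doe, Jane">
  <meta name="citation_author" content="Roe, Richard">
  <meta name="citation_journal_title" content="Example Journal">
  <meta name="DC.title" content="An Example Article">
</head><body></body></html>
"""

if __name__ == "__main__":
    for name, values in extract_scholar_tags(SAMPLE_HTML).items():
        print(name, values)
```

This only reads the metadata; it says nothing about which fulltext format a publisher offers.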

To my knowledge the vast majority of major publishers, and most major publishing platform software, support JATS to some extent, although they don't always make it available for download. However, my expertise is not in social science literature so I might have a blind spot.

If you can suggest some social science publishers or journals, I will look into which fulltext formats they provide so we can make sure they are included in our plans.

step21 commented 7 years ago

Well, yeah, I thought we were talking about metadata, which can then refer to PDF, HTML or whatever. Like I said, I have never heard of JATS XML (though that doesn't mean anything). Maybe it is more prevalent, as you say. I have mostly worked with PKP OJS, and I think that does not use it so far.

blahah commented 7 years ago

@step21 ah, well JATS is fairly flexible but usually contains full metadata and fulltext.
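To make "metadata and fulltext in one file" concrete, here is a minimal Python sketch that reads both out of a small JATS-like document. The sample XML is invented and heavily simplified, and `summarize_jats` is a hypothetical helper, not sciencefair code. Real JATS files usually carry a DOCTYPE, more optional elements, and namespaced attributes, so treat this as a sketch under those assumptions.

```python
import xml.etree.ElementTree as ET

# Invented, simplified JATS-like sample for illustration only.
SAMPLE_JATS = """<article>
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Example Journal</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example.1</article-id>
      <title-group>
        <article-title>An <italic>Example</italic> Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </sec>
  </body>
</article>"""


def text_of(element):
    """Flatten an element's text (including inline markup) to one clean string."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def summarize_jats(xml_text):
    """Hypothetical helper: pull basic metadata and body paragraphs from JATS."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    if meta is None:
        raise ValueError("no <article-meta> element found")

    # Keep only contributors explicitly marked as authors.
    authors = []
    for contrib in meta.iter("contrib"):
        if contrib.get("contrib-type") == "author":
            name = contrib.find("name")
            if name is not None:
                given = text_of(name.find("given-names"))
                surname = text_of(name.find("surname"))
                authors.append(f"{given} {surname}".strip())

    body = root.find("body")
    paragraphs = [text_of(p) for p in body.iter("p")] if body is not None else []

    return {
        "journal": text_of(root.find("front/journal-meta//journal-title")),
        "doi": text_of(meta.find("article-id[@pub-id-type='doi']")),
        "title": text_of(meta.find("title-group/article-title")),
        "authors": authors,
        "paragraphs": paragraphs,
    }


if __name__ == "__main__":
    print(summarize_jats(SAMPLE_JATS))
```

The `front` element holds the metadata and `body` holds the fulltext, which is why a single JATS file can be enough for a datasource.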

PKP can produce JATS and also various forms of other structured documents that we can convert easily to JATS. See for example https://github.com/pkp/ots and https://github.com/MartinPaulEve/meTypeset

chartgerink commented 7 years ago

JATS is present in the social sciences; there are just fewer publishers that make it available, AFAIK. Ubiquity tends to do so, Hindawi too I think, and a few others.

Example with direct link to XML

chartgerink commented 7 years ago

Maybe it would be good to start a list of datasources to add? I can open a new issue and start adding a few (e.g., Ubiquity, RIO journal, Hindawi, PeerJ). I also noticed MDPI doesn't use JATS but DTD Journal Publishing XML; I don't know how that will work.

If you want, we can run through adding datasources at some point, and I can write everything up into a nice doc/wiki page in due time? 😄

smdabdoub commented 6 years ago

Just checking in post-1.0 release. Any updates on the PMC datasource?

CAYdenberg commented 6 years ago

Just a minor correction: PubMed Central does not provide JATS for every paper, only for the so-called "open access subset". https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/

sciencefair-land / sciencefair

Ability to add papers by smaller publishers. #38