Open mrchristian opened 5 years ago
Yes,
I am currently creating an XML GitHub resource at: https://github.com/petermr/xmlopensci/blob/master/README.md Here I am going to add the XML technology and resources we need for doing open science. Hopefully this will grow.
The JATS examples are the fulltext.xml files under https://github.com/petermr/climate/clim107/ e.g. https://github.com/petermr/climate/blob/master/clim107/PMC3828158/fulltext.xml
The 107 just means a subset of 107 to play with.
They come from NIH/PMC and are whatever the publishers or NIH staff created. They are somewhat variable. There are about 250 tags, I think, and I have tried to trap unknown tags.
P.
On Sun, Sep 15, 2019 at 10:10 AM Simon Worthington notifications@github.com wrote:
> You want to have some JATS/XML rendered as HTML5 for the Oxford XML Summer School. Can you point me to the type of source, or an example of the content that needs rendering, so that I can try some things out? Preferably the GitHub Pages Jekyll framework could just use the JATS as is, but we will have to see.
> I take it we would want either to concatenate a series of papers from directories into one big HTML output, or to create a mini website linking to the papers?
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
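A minimal sketch, in Python, of the kind of tag check described in the message above (roughly 250 JATS tags, with unknown ones trapped). It is not the code used in the repository: the `KNOWN_TAGS` set, the `tag_census` function, and the sample document are all hypothetical, and a real allow-list would hold the full tag vocabulary.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, deliberately tiny allow-list; a real one would list every expected JATS tag.
KNOWN_TAGS = {
    "article", "front", "article-meta", "title-group",
    "article-title", "body", "sec", "title", "p",
}

def tag_census(xml_text, known=KNOWN_TAGS):
    """Return (Counter of every element tag, sorted list of tags not in `known`)."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.tag for el in root.iter())
    unknown = sorted(tag for tag in counts if tag not in known)
    return counts, unknown

# Hypothetical stand-in for a fulltext.xml file.
SAMPLE = """<article><front><article-meta><title-group>
<article-title>Example</article-title></title-group></article-meta></front>
<body><sec><title>Intro</title><p>Text</p><weird-tag/></sec></body></article>"""

counts, unknown = tag_census(SAMPLE)
print(unknown)
```

Namespaced elements appear as `{uri}tag` in ElementTree, so a real check would probably need to normalise those before comparing against the allow-list.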
What I need to understand is what we want to do with the directory https://github.com/petermr/climate/clim107/
Here you can see an example where I'm simply allowing GitHub Pages to render the HTML pages that already exist in /clim107/ as a mini website:
Use this link: https://mrchristian.github.io/climate-publishing/commonest.dataTables.html
From https://github.com/mrchristian/climate-publishing/tree/master/docs
This is obviously not what we want as an end result; I just wanted to show GitHub Pages in action.
So... what would be a stage one version of publishing the results of a 'getpapers' process as GitHub pages?
It could simply be a homepage with a list of articles (title, date) and links to the HTML version and the original. All well styled.
Then after that we could move on to having a mini website generated that represents the papers and dataset in a way that exposes the different aspects of the collection: word frequencies, dictionaries used, artifacts. Essentially, making a website representation of what has already been created in XML.
And we can add more features as we move along.
Thanks S
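The stage-one homepage proposed above (a list of articles with title, date, and links) could be sketched in Python roughly as follows. This is an illustration under assumptions, not an agreed design: the function names are hypothetical, the links assume a `scholarly.html` sits next to each `fulltext.xml` in the per-paper directory, and only the year is read from the JATS `pub-date` element.

```python
import html
import xml.etree.ElementTree as ET

def summarise(xml_text):
    """Pull (title, year) out of one JATS document; falls back to placeholders."""
    root = ET.fromstring(xml_text)
    title_el = root.find(".//title-group/article-title")
    if title_el is not None:
        # itertext() also collects text inside inline markup such as <italic>.
        title = " ".join("".join(title_el.itertext()).split())
    else:
        title = "(untitled)"
    year = root.findtext(".//pub-date/year", default="n.d.")
    return title, year

def render_index(entries):
    """entries: iterable of (pmcid, title, year). Returns an HTML list fragment."""
    items = []
    for pmcid, title, year in entries:
        items.append(
            f"<li>{html.escape(title)} ({html.escape(year)}): "
            f'<a href="{pmcid}/scholarly.html">HTML</a>, '      # assumed file location
            f'<a href="{pmcid}/fulltext.xml">original XML</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

# Hypothetical stand-in for one paper's fulltext.xml.
SAMPLE = """<article><front><article-meta>
<title-group><article-title>Climate &amp; <italic>health</italic></article-title></title-group>
<pub-date><year>2013</year></pub-date>
</article-meta></front></article>"""

title, year = summarise(SAMPLE)
page = render_index([("PMC3828158", title, year)])
print(page)
```

The fragment could be dropped into a Jekyll layout or a plain HTML page; styling is left out of the sketch.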
On Sun, Sep 15, 2019 at 7:59 PM Simon Worthington notifications@github.com wrote:
> What I need to understand is what we want to do with the directory https://github.com/petermr/climate/clim107/
> Here you can see an example where I'm simply allowing GitHub Pages to render the HTML pages that already exist in /clim107/ as a mini website:
> Use this link: https://mrchristian.github.io/climate-publishing/commonest.dataTables.html
> From https://github.com/mrchristian/climate-publishing/tree/master/docs
> This is obviously not what we want as an end result; I just wanted to show GitHub Pages in action.
Data tables may not be the best place to start - there is a stylesheet - have you tried with scholarly.html? GitHub doesn't show HTML natively.
> So... what would be a stage one version of publishing the results of a 'getpapers' process as GitHub pages?
We could aggregate 5 scholarly.html files as a test.
> It could simply be a homepage with a list of articles (title, date) and links to the HTML version and the original. All well styled.
> Then after that we could move on to having a mini website generated that represents the papers and dataset in a way that exposes the different aspects of the collection: word frequencies, dictionaries used, artifacts. Essentially, making a website representation of what has already been created in XML.
I am working on the sections.
> And we can add more features as we move along.
> Thanks S
So if I get it right, first off you're interested in getting the articles rendered, say, as Markdown so they can be shown in the GitHub repository and not as a separate GitHub Pages website.
I'll have a go at aggregating 5 Scholarly HTML files into one markdown and providing a ToC, then it can be displayed in the GitHub repo. Does this sound right?
PS when is your Oxford XML day?
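The aggregation step proposed above (several papers in one Markdown file with a ToC) could look roughly like this in Python. It is a sketch under assumptions: each paper is assumed to have already been converted to a (title, Markdown body) pair, the HTML-to-Markdown conversion itself is not shown, and `slug` only approximates how GitHub derives heading anchors.

```python
import re

def slug(heading):
    """Approximate a GitHub heading anchor: lowercase, drop punctuation, spaces to hyphens."""
    s = heading.strip().lower()
    s = re.sub(r"[^\w\- ]", "", s)   # keep word characters, hyphens, and spaces
    return s.replace(" ", "-")

def aggregate(papers):
    """papers: list of (title, body_markdown). Returns one Markdown document with a ToC."""
    toc = [f"- [{title}](#{slug(title)})" for title, _ in papers]
    sections = [f"## {title}\n\n{body}" for title, body in papers]
    return "# Contents\n\n" + "\n".join(toc) + "\n\n" + "\n\n".join(sections) + "\n"

# Hypothetical input: two already-converted papers.
doc = aggregate([
    ("Sea-level rise: a review", "First paper text."),
    ("Heat and health", "Second paper text."),
])
print(doc)
```

Papers with identical titles would produce colliding anchors, so a real version would need to disambiguate them, for example by appending the PMC identifier.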
On Mon, Sep 16, 2019 at 9:51 AM Simon Worthington notifications@github.com wrote:
> So if I get it right, first off you're interested in getting the articles rendered, say, as Markdown so they can be shown in the GitHub repository and not as a separate GitHub Pages website.
> I'll have a go at aggregating 5 Scholarly HTML files into one markdown and providing a ToC, then it can be displayed in the GitHub repo. Does this sound right?
Sounds right. I worked out yesterday how to create HTML sections, so we can do this for PARTS of papers, which is more useful and fun. E.g. for all papers: title, abstract, funders would give quite a useful product.
> PS when is your Oxford XML day?
Wed. at 1130 BST
I understand you and Jon Tennant know each other. We do quite a lot and I nominated him for a Shuttleworth Flash Grant.
P.
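As an illustration of the "parts of papers" idea above (title, abstract, funders for every paper), here is a small Python sketch that reads those parts straight from JATS. It is not how ami3 does its sectioning; `extract_parts` and `text_of` are hypothetical helpers, and funders are assumed to be recorded in `funding-source` elements, which not every publisher supplies.

```python
import xml.etree.ElementTree as ET

def text_of(el):
    """Flatten an element to whitespace-normalised text; empty string if missing."""
    if el is None:
        return ""
    return " ".join("".join(el.itertext()).split())

def extract_parts(xml_text):
    """Return the title, abstract, and funder names found in one JATS document."""
    root = ET.fromstring(xml_text)
    return {
        "title": text_of(root.find(".//title-group/article-title")),
        "abstract": text_of(root.find(".//abstract")),
        "funders": [text_of(f) for f in root.iter("funding-source")],
    }

# Hypothetical stand-in for one paper's fulltext.xml.
SAMPLE = """<article><front><article-meta>
<title-group><article-title>Example paper</article-title></title-group>
<abstract><p>Warming  is
real.</p></abstract>
<funding-group><award-group><funding-source>Example Trust</funding-source></award-group></funding-group>
</article-meta></front></article>"""

parts = extract_parts(SAMPLE)
print(parts)
```

Running this over every paper directory and collecting the dictionaries would give the per-collection table of titles, abstracts, and funders.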
OK, I'll carve out some time to get something in place. I'll see if I can pull in some help. Sections, yes, very good. I can see there is something very exciting here, but I'm still making my way up the learning curve of the processes and outputs. More soon...
I have now upgraded ami3 to extract sections. Using
--sections ALL
we get
sectionList [ABBREVIATION, ABSTRACT, ACK_FUND, APPENDIX, ARTICLE_META, ARTICLE_TITLE, CONTRIB, AUTH_CONT, BACK, BODY, CASE, CONCL, COMP_INT, DISCUSS, FINANCIAL, FIG, FRONT, INTRO, JOURNAL_META, JOURNAL_TITLE, PUBLISHER_NAME, KEYWORD, METHODS, OTHER, PMCID, REF, RESULTS, SUPPL, TABLE, SUBTITLE, TITLE]
Extracting ALL sections takes quite a while, but individual sections are quickish.