petermr / climate

OpenAccess papers mined for Climate Change
Apache License 2.0

Rendering JATS/XML as HTML5 #15

Open mrchristian opened 4 years ago

mrchristian commented 4 years ago

You want to have some JATS/XML rendered as HTML5 for the Oxford XML Summer School. Can you point me to the type of source, or an example of the content that needs rendering, so that I can try some things out? Preferably the GitHub Pages Jekyll framework could just use the JATS as is, but we will have to see.

I take it we would want either to concatenate a series of papers from directories into one big HTML output, or to create a mini website linking to the papers?

petermr commented 4 years ago

Yes,

I am currently creating an XML GitHub resource at https://github.com/petermr/xmlopensci/blob/master/README.md, where I am going to add the XML technology and resources we need for doing open science. Hopefully this will grow.

The JATS examples are the fulltext.xml files at: https://github.com/petermr/climate/clim107/ e.g. https://github.com/petermr/climate/blob/master/clim107/PMC3828158/fulltext.xml

The 107 just means a subset of 107 papers to play with.

They come from NIH/PMC and are whatever the publishers or NIH staff created. They are somewhat variable. There are about 250 tags, I think, and I have tried to trap unknown tags.
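As a rough illustration of what trapping unknown tags could look like, here is a minimal Python sketch. It is not how ami3 does it: the sample XML, the `KNOWN_TAGS` whitelist and the function names are all made up for the example, and real PMC files also carry namespaced elements that would need extra handling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for a PMC fulltext.xml; real files are far larger.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Sample paper</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>Some <italic>text</italic>.</p></sec>
    <sec><title>Methods</title><p>More text.</p><mystery-tag>?</mystery-tag></sec>
  </body>
</article>"""

# Hypothetical whitelist: a handful of JATS element names, not the full tag set.
KNOWN_TAGS = {"article", "front", "article-meta", "title-group", "article-title",
              "body", "sec", "title", "p", "italic"}


def tag_census(xml_text):
    """Count how often each element name occurs in the document."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())


def unknown_tags(census, known=KNOWN_TAGS):
    """Return the element names that are not in the whitelist, sorted."""
    return sorted(tag for tag in census if tag not in known)


census = tag_census(SAMPLE_JATS)
print(census.most_common(3))
print("unknown:", unknown_tags(census))
```

Running the census over all the fulltext.xml files first would show which tags a renderer really has to handle before any HTML templates are written.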

P.

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

mrchristian commented 4 years ago

What I need to understand is what we want to do with the directory https://github.com/petermr/climate/clim107/

Here you can see an example where I'm simply allowing GitHub's GitHub Pages to render the HTML pages that already exist in /clim107/ as a mini website:

Use this link: https://mrchristian.github.io/climate-publishing/commonest.dataTables.html

From https://github.com/mrchristian/climate-publishing/tree/master/docs

This is obviously not what we want as an end result, but instead I just wanted to show GitHub pages in action.

So... what would be a stage one version of publishing the results of a 'getpapers' process as GitHub pages?

It could simply be a homepage with list of articles (title, date) and links to HTML version, and original. All well styled.
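To make the stage-one idea concrete, here is a minimal Python sketch that reads the title and publication date from JATS front matter and writes an HTML list. The PMC ids and sample XML are invented, and it assumes each paper directory holds a `scholarly.html` next to its `fulltext.xml`; styling is left out.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for fulltext.xml files, keyed by PMC directory name.
PAPERS = {
    "PMC0000001": """<article><front><article-meta>
        <title-group><article-title>First sample paper</article-title></title-group>
        <pub-date><day>5</day><month>3</month><year>2013</year></pub-date>
      </article-meta></front></article>""",
    "PMC0000002": """<article><front><article-meta>
        <title-group><article-title>Second <italic>sample</italic> paper</article-title></title-group>
        <pub-date><year>2014</year></pub-date>
      </article-meta></front></article>""",
}


def title_and_date(xml_text):
    """Pull the article title and a YYYY[-MM[-DD]] date out of JATS front matter."""
    root = ET.fromstring(xml_text)
    title_elem = root.find(".//article-title")
    # itertext() keeps text that sits inside inline markup such as <italic>.
    title = "".join(title_elem.itertext()).strip() if title_elem is not None else "(untitled)"
    pub = root.find(".//pub-date")
    parts = []
    if pub is not None:
        for name in ("year", "month", "day"):
            value = pub.findtext(name)
            if value is None:
                break  # stop at the first missing component
            parts.append(value.strip().zfill(2))
    return title, "-".join(parts)


def build_index(papers):
    """Render a minimal HTML list linking each paper's HTML and XML versions."""
    items = []
    for pmcid, xml_text in sorted(papers.items()):
        title, date = title_and_date(xml_text)
        items.append(
            f"<li>{html.escape(title)} ({html.escape(date)}): "
            f'<a href="{pmcid}/scholarly.html">HTML</a>, '
            f'<a href="{pmcid}/fulltext.xml">XML</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(build_index(PAPERS))
```

The resulting list could be dropped into a Jekyll layout or a plain index.html in the docs folder.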

Then after that we could move on to having a mini website generated that represents the papers and dataset in a way that exposes the different aspects of the collection: word frequencies, dictionaries used, artifacts. Essentially, making a website representation of what has already been created in XML.

And we can add more features as we move along.

Thanks S

petermr commented 4 years ago

On Sun, Sep 15, 2019 at 7:59 PM Simon Worthington notifications@github.com wrote:

> What I need to understand is what we want to do with the directory https://github.com/petermr/climate/clim107/

> Here you can see an example where I'm simply allowing GitHub's GitHub Pages to render the HTML pages that already exist in /clim107/ as a mini website:

> Use this link: https://mrchristian.github.io/climate-publishing/commonest.dataTables.html

> From https://github.com/mrchristian/climate-publishing/tree/master/docs

> This is obviously not what we want as an end result, but instead I just wanted to show GitHub pages in action.

Data tables may not be the best place to start - there is a stylesheet - have you tried with scholarly.html? GitHub doesn't show HTML natively.

> So... what would be a stage one version of publishing the results of a 'getpapers' process as GitHub pages?

We could aggregate 5 scholarly.html files as a test.

> It could simply be a homepage with list of articles (title, date) and links to HTML version, and original. All well styled.

> Then after that we could move on to having a mini website generated that represents the papers and dataset in a way that exposes the different aspects of the collection: word frequencies, dictionaries used, artifacts. Essentially, making a website representation of what has already been created in XML.

I am working on the sections.

> And we can add more features as we move along.

> Thanks S

mrchristian commented 4 years ago

So if I get it right, first off you're interested in getting the articles rendered, say as Markdown, so they can be shown in the GitHub repository and not as a separate GitHub Pages website.

I'll have a go at aggregating 5 Scholarly HTML files into one Markdown file and providing a ToC, so that it can be displayed in the GitHub repo. Does this sound right?

PS when is your Oxford XML day?
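A minimal sketch of the aggregation step, in Python, might look like the following. It assumes each Scholarly HTML file has a `<title>` and plain `<p>` paragraphs, uses made-up sample documents, and only approximates how GitHub builds heading anchors; a real conversion would more likely go through a proper HTML-to-Markdown tool.

```python
import re
from html.parser import HTMLParser


class TitleAndParagraphs(HTMLParser):
    """Collect the <title> text and the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def slug(heading):
    """Approximate a GitHub heading anchor: lower-case, drop punctuation, hyphenate spaces."""
    cleaned = re.sub(r"[^\w\- ]", "", heading.strip().lower())
    return cleaned.replace(" ", "-")


def aggregate(html_docs):
    """Merge several HTML documents into one Markdown string with a ToC."""
    parsed = []
    for doc in html_docs:
        parser = TitleAndParagraphs()
        parser.feed(doc)
        parsed.append((parser.title.strip(), [p.strip() for p in parser.paragraphs]))
    lines = ["# Papers", ""]
    lines += [f"- [{title}](#{slug(title)})" for title, _ in parsed]
    for title, paragraphs in parsed:
        lines += ["", f"## {title}", "", "\n\n".join(paragraphs)]
    return "\n".join(lines) + "\n"


# Invented sample documents standing in for scholarly.html files.
DOCS = [
    "<html><head><title>Sea ice trends</title></head>"
    "<body><p>Ice is <i>melting</i>.</p></body></html>",
    "<html><head><title>Heat waves, 2003</title></head>"
    "<body><p>First.</p><p>Second.</p></body></html>",
]
print(aggregate(DOCS))
```

The ToC entries link to the `##` headings, so the merged file should be navigable when viewed in the repository.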

petermr commented 4 years ago

On Mon, Sep 16, 2019 at 9:51 AM Simon Worthington notifications@github.com wrote:

> So if I get it right, first off you're interested in getting the articles rendered, say as Markdown, so they can be shown in the GitHub repository and not as a separate GitHub Pages website.

> I'll have a go at aggregating 5 Scholarly HTML files into one Markdown file and providing a ToC, so that it can be displayed in the GitHub repo. Does this sound right?

Sounds right. I worked out yesterday how to create HTML sections, so we can do this for PARTS of papers, which is more useful and fun. For example, extracting the title, abstract and funders for all papers would give quite a useful product.

> PS when is your Oxford XML day?

Wed. at 1130 BST

I understand you and Jon Tennant know each other. We do quite a lot and I nominated him for a Shuttleworth Flash Grant.

P.
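For the title / abstract / funders idea, a minimal Python sketch could look like this. It is only an illustration, not the ami3 implementation: the sample XML is invented, and it assumes funders are recorded in `funding-source` elements, whereas many real papers mention them only in the acknowledgements.

```python
import xml.etree.ElementTree as ET

# Invented sample standing in for one paper's fulltext.xml.
SAMPLE = """<article>
  <front><article-meta>
    <title-group><article-title>Sample paper on drought</article-title></title-group>
    <abstract><p>We study <italic>drought</italic> frequency.</p></abstract>
    <funding-group>
      <award-group><funding-source>Example Research Council</funding-source></award-group>
      <award-group><funding-source>Example Foundation</funding-source></award-group>
    </funding-group>
  </article-meta></front>
</article>"""


def text_of(elem):
    """Flatten an element's text, collapsing runs of whitespace."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def summarise(xml_text):
    """Return the title, abstract and funder names found in one JATS document."""
    root = ET.fromstring(xml_text)
    return {
        "title": text_of(root.find(".//article-title")),
        "abstract": text_of(root.find(".//abstract")),
        "funders": [text_of(f) for f in root.iter("funding-source")],
    }


summary = summarise(SAMPLE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Applied to every paper in the subset, the resulting dictionaries could feed a single table or page per collection.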

mrchristian commented 4 years ago

OK, I'll carve out some time to get something in place. I'll see if I can pull in some help. Sections, yes, very good. I can see there is something very exciting here, but I am still making my way up the learning curve of the processes and outputs. More soon...

petermr commented 4 years ago

I have now upgraded ami3 to extract sections. Using

--sections ALL

we get

sectionList             [ABBREVIATION, ABSTRACT, ACK_FUND, APPENDIX, ARTICLE_META, ARTICLE_TITLE, CONTRIB, AUTH_CONT, BACK, BODY, CASE, CONCL, COMP_INT, DISCUSS, FINANCIAL, FIG, FRONT, INTRO, JOURNAL_META, JOURNAL_TITLE, PUBLISHER_NAME, KEYWORD, METHODS, OTHER, PMCID, REF, RESULTS, SUPPL, TABLE, SUBTITLE, TITLE]

Extracting ALL takes quite a while, but individual sections are quickish.
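If the section names are needed in a script, for instance to check which ones are worth extracting individually, the logged line above can be parsed directly. This is a small Python sketch that works only on that log text; the function name and the `wanted` selection are illustrative, and it does not call ami3 itself.

```python
import re

# The sectionList line as logged above.
LOG_LINE = (
    "sectionList             [ABBREVIATION, ABSTRACT, ACK_FUND, APPENDIX, "
    "ARTICLE_META, ARTICLE_TITLE, CONTRIB, AUTH_CONT, BACK, BODY, CASE, CONCL, "
    "COMP_INT, DISCUSS, FINANCIAL, FIG, FRONT, INTRO, JOURNAL_META, JOURNAL_TITLE, "
    "PUBLISHER_NAME, KEYWORD, METHODS, OTHER, PMCID, REF, RESULTS, SUPPL, TABLE, "
    "SUBTITLE, TITLE]"
)


def parse_section_list(line):
    """Extract the comma-separated names between the square brackets."""
    match = re.search(r"sectionList\s+\[(.*?)\]", line)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


sections = parse_section_list(LOG_LINE)

# Illustrative selection matching the title / abstract / funders idea.
wanted = ["ARTICLE_TITLE", "ABSTRACT", "ACK_FUND"]
missing = [name for name in wanted if name not in sections]
print(len(sections), "sections; missing:", missing)
```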