Open mrchristian opened 1 year ago
Apologies, I'm not giving enough context about the project, and secondly, I need to break down my ToC questions - a simple pointer to your support docs will give me all the answers, I'm sure.
The publishing project is run by a volunteer group whose goal is to make a semantic index of all IPCC Reports. A first-level project is to semantify the IPCC Glossary. We have met with IPCC and other UN agencies, and they are receptive to this being done. Part of the project would be to create outputs from the semantic source - one of these being a Hyperbook with user enhancements, Wikipedia/data entries, etc.
See IPCC Source https://apps.ipcc.ch/glossary/ and an example Vivliostyle output - https://vivliostyle.vercel.app/#src=https://raw.githubusercontent.com/semanticClimate/glossary-demo/main/html/index.html
My questions are about generating ToCs and using multiple HTML files.
We think we want to use Vivliostyle.js and Vivliostyle CLI. We want to use the CLI for PDF bookmarks, PoD preparation, and other CLI features.
Re: Question 1. It seems from your documentation that a 'Web publication manifest' is the best route. Do you recommend the W3C or the Readium version? Readium seems more convenient due to its documentation and examples - but I'm happy to use either - https://docs.vivliostyle.org/#/vivliostyle-viewer#web-publications-multi-html-documents
I had a very basic go at using a Manifest example, just to get things going:
W3C Publication Manifest
https://semanticclimate.github.io/glossary-sandbox/ipccglossary.jsonld
Render
Tomorrow I'll work on building up a W3C Manifest properly.
It would be nice if you had a pointer to a good W3C Manifest example that's suitable for copying and building on.
Re: Question 1. It seems from your documentation that a 'Web publication manifest' is the best route. Do you recommend the W3C or the Readium version? Readium seems more convenient due to its documentation and examples - but I'm happy to use either - https://docs.vivliostyle.org/#/vivliostyle-viewer#web-publications-multi-html-documents
Yes, you can use a Publication Manifest to organize multiple HTML documents into one publication. (We use W3C standards unless there is a particular reason not to.)
Vivliostyle.js recognizes a ToC that is specified in the publication manifest. See the following sections in Publication Manifest:
A simple example of a publication manifest that includes a ToC resource is below:
{
  "@context": [
    "https://schema.org",
    "https://www.w3.org/ns/pub-context"
  ],
  "conformsTo": "https://www.w3.org/TR/pub-manifest/",
  "type": "Book",
  "name": "IPCC Glossary",
  "author": "IPCC",
  "inLanguage": "en",
  "readingOrder": [
    {
      "url": "index.html",
      "rel": "contents"
    },
    "glossary.html",
    "acronyms.html"
  ]
}
In this example, "index.html" is the ToC file.
The table of contents in the ToC file is displayed in the ToC panel of Vivliostyle Viewer.
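For anyone generating such a manifest from a script rather than by hand, here is a minimal Python sketch. The function name `build_manifest` and its parameters are hypothetical, chosen only for illustration; the field values are the ones from the example above. It assumes the ToC file should come first in "readingOrder" and carry "rel": "contents".

```python
import json


def build_manifest(title, author, language, toc_file, documents):
    """Return a W3C Publication Manifest as a plain dict.

    The ToC file is placed first in "readingOrder" and marked with
    "rel": "contents"; the other documents follow as plain URL strings.
    """
    reading_order = [{"url": toc_file, "rel": "contents"}]
    # Skip the ToC file if it also appears in the document list.
    reading_order.extend(doc for doc in documents if doc != toc_file)
    return {
        "@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
        "conformsTo": "https://www.w3.org/TR/pub-manifest/",
        "type": "Book",
        "name": title,
        "author": author,
        "inLanguage": language,
        "readingOrder": reading_order,
    }


if __name__ == "__main__":
    manifest = build_manifest(
        "IPCC Glossary", "IPCC", "en", "index.html",
        ["glossary.html", "acronyms.html"],
    )
    # Write or print the JSON; the file name is up to the publication.
    print(json.dumps(manifest, indent=2))
```

With a larger corpus, only the `documents` list changes, so the same function could be fed from a directory listing.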
Note that when the ToC resource (the item with "rel": "contents") is not found, Vivliostyle.js uses the first item of "readingOrder" as the ToC resource if ToC-like elements (e.g., <nav>) are found in that document. So if the "glossary.html" file contains a table of contents with a <nav> element,
  "readingOrder": [
    "glossary.html",
    "acronyms.html"
  ]
is treated as if "rel": "contents" is specified in the "glossary.html" item, and the table of contents of glossary is displayed in the Vivliostyle Viewer's ToC panel. However, it would be better to specify "rel": "contents" explicitly when you use Publication Manifest.
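The lookup order described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as explained in this reply, not Vivliostyle.js's actual code. The names `find_toc_resource` and `has_toc_elements` are hypothetical, and the sketch assumes "readingOrder" and "resources" are lists.

```python
def _has_contents_rel(entry):
    """True if a manifest entry is an object whose "rel" includes "contents"."""
    if not isinstance(entry, dict):
        return False
    rel = entry.get("rel", [])
    rels = [rel] if isinstance(rel, str) else list(rel)
    return "contents" in rels


def find_toc_resource(manifest, has_toc_elements):
    """Return the URL of the document used as the ToC resource, or None.

    `has_toc_elements` is a caller-supplied function (hypothetical) that
    takes a URL and reports whether that document has ToC-like elements
    such as <nav>.
    """
    entries = list(manifest.get("readingOrder", [])) + list(manifest.get("resources", []))
    # 1. An explicit "rel": "contents" entry wins.
    for entry in entries:
        if _has_contents_rel(entry):
            return entry["url"]
    # 2. Otherwise fall back to the first reading-order item, but only
    #    if it actually contains ToC-like elements.
    reading_order = manifest.get("readingOrder", [])
    if reading_order:
        first = reading_order[0]
        first_url = first["url"] if isinstance(first, dict) else first
        if has_toc_elements(first_url):
            return first_url
    return None
```

Because the fallback depends on the content of the first document, marking the ToC explicitly keeps the result predictable when the reading order changes.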
You can also just use the ToC file without publication manifest (this idea is from http://glazman.org/e0/webbook.html). See the Vivliostyle Viewer document: https://docs.vivliostyle.org/#/vivliostyle-viewer#table-of-contents-in-html
When Web publication manifest does not exist, and there are links to other HTML documents in the table of contents in the specified HTML document, those documents are loaded automatically. Vivliostyle treats HTML elements that match the following CSS selector as a table of contents element:
[role=doc-toc], [role=directory], nav li, .toc, #toc
There are a few advantages of using publication manifest:
There is a simple ToC auto-generation option in Vivliostyle CLI. See the Vivliostyle CLI document: https://docs.vivliostyle.org/#/vivliostyle-cli#creating-a-table-of-contents
However, this feature is very limited: it generates only one ToC link item per HTML document. There has been a feature request to extend it to include every (or selected) heading in the HTML documents. https://github.com/vivliostyle/vivliostyle-cli/issues/254
Thank you so much for your assistance here - wonderful. Apologies for my slow reply, but I got ill last week, and am only now back to 'full power' as well as catching up on my 'day job' work :-)
Your answers about the ToC functions and using Vivlio CLI here are exactly what I needed right now - semanticClimate volunteer colleagues want to prepare a working publication for delegates to use at next week's COP meeting https://unfccc.int/. UNFCCC produce the legal agreements behind COP - they have 200 such docs only as PDF. We convert to Scholarly HTML, then semantically structure. While colleagues continue to structure the HTML, I can create a publication containing all the content using a manifest and Vivlio CLI, by the looks of it. I'll keep you posted.
And again, thank you - we'll give Vivlio a big credit :-)
BTW I got the Manifest working on the IPCC Glossary in a very basic way, will improve https://vivliostyle.vercel.app/#src=https://raw.githubusercontent.com/semanticClimate/glossary-demo/main/ipccglossary.jsonld
And now I'll start on the COP docs https://github.com/semanticClimate/unfccc
I wanted to ask about using CSS styles when I have lots of HTML files to bring together in a publication. At present it's 26, but it may rise to 200.
Currently I've used the CSS override in Vivlio, which works (excuse the style; the HTML and CSS are all mixed up at the moment).
Thanks
- Is it possible to get the manifest to apply the style, or do I need to have the style in the first reading order document?
No, the CSS stylesheets need to be specified in each HTML document.
- How are the CSS resources used in the Manifest?
Vivliostyle.js uses CSS stylesheets specified in HTML documents, and does not use the CSS resources in the publication manifest. The CSS resources in the publication manifest are meaningless for Vivliostyle.js.
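Since each HTML document has to reference its own stylesheet, a generated corpus of many files could be patched in one pass. The following Python sketch is one possible way to do that; it is not part of Vivliostyle. The function names and the `href` value are hypothetical, and it assumes every document has a closing </head> tag.

```python
import re
from pathlib import Path

HEAD_END = re.compile(r"</head\s*>", re.IGNORECASE)


def add_stylesheet_link(html, href):
    """Insert a stylesheet <link> just before </head>.

    Returns the document unchanged if the same link is already present.
    """
    link = f'<link rel="stylesheet" href="{href}">'
    if link in html:
        return html
    match = HEAD_END.search(html)
    if match is None:
        raise ValueError("document has no </head> tag")
    return html[:match.start()] + link + "\n" + html[match.start():]


def link_stylesheet_in_directory(directory, href):
    """Patch every .html file under `directory` in place.

    Returns the number of files that were modified.
    """
    changed = 0
    for path in sorted(Path(directory).rglob("*.html")):
        original = path.read_text(encoding="utf-8")
        updated = add_stylesheet_link(original, href)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

The `href` is written verbatim into every file, so documents in subdirectories would need an absolute URL or a per-file relative path instead.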
Thank you @MurakamiShinyu, really appreciated. Things are now moving along well with the manifest use. I've been wanting to move on to the manifest approach for a really long time, so I'm happy to be able to use it at last - there's no going back now :-)
For the moment I'll apply the CSS through the Vivlio viewer, as we are automatically generating the HTML files from a PDF extraction pipeline - I could have the CSS automatically linked there, but I'll do that later once we're out of this development round.
Eventually there will be about 200 HTML files linked into the publication, the higher level ones in the ToC via the manifest, and the others rendered on the page in a main ToC and then in section sub-ToCs - we'll of course generate these ToC and nav files automatically from here:
https://github.com/petermr/pyamihtml/tree/main/test/resources/unfccc/unfcccdocuments1
Hi @MurakamiShinyu - we've been progressing well with the project.
I had a question about ToCs generated from the Publication Manifest and using Vivliostyle. My main ToC is being rendered at the end of the publication, where I don't want it to be.
I wondered if you could help solve the problem?
Here is the sample publication.json rendered in Vivliostyle Canary.
This is the directory in the repository where the publication is created
https://github.com/semanticClimate/cma3-test/tree/main/CMA_3
I have looked at Vivlio's multi-file examples, the W3C docs, and the Vivlio docs - but I can't see a solution.
Thanks
Simon
Your publication.json has "toc_ses_dec_res.html" in the "readingOrder" and "toc_toplevel_sum_ses_dec_res.html" in the "resources":
  "readingOrder": [
    "front_cover.html",
    "imprint.html",
    "toc_ses_dec_res.html",
    "LEAD/split.html",
    "Decision_1_CMA_3/split.html",
    "Decision_2_CMA_3/split.html",
    "Decision_3_CMA_3/split.html",
    "Decision_4_CMA_3/split.html",
    "back_cover.html"
  ],
  "resources": [
    {
      "type": "LinkedResource",
      "url": "toc_toplevel_sum_ses_dec_res.html",
      "rel": "contents"
    },
Unfortunately, Vivliostyle has a limitation that it cannot hide HTML documents listed in the "resources" in the output.
If you use "toc_ses_dec_res.html" in the "readingOrder" for "contents", you can avoid this problem:
  "readingOrder": [
    "front_cover.html",
    "imprint.html",
    {
      "url": "toc_ses_dec_res.html",
      "rel": "contents"
    },
    "LEAD/split.html",
    "Decision_1_CMA_3/split.html",
    "Decision_2_CMA_3/split.html",
    "Decision_3_CMA_3/split.html",
    "Decision_4_CMA_3/split.html",
    "back_cover.html"
  ],
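If manifests are generated automatically, the same change can be applied in code. The Python sketch below is an illustration under stated assumptions: the function name `mark_reading_order_toc` is hypothetical, and dropping other "contents" entries from "resources" is an assumption of this sketch (so that only one ToC resource remains), not something the reply above prescribes.

```python
import copy


def _has_contents_rel(entry):
    """True if a manifest entry is an object whose "rel" includes "contents"."""
    if not isinstance(entry, dict):
        return False
    rel = entry.get("rel", [])
    rels = [rel] if isinstance(rel, str) else list(rel)
    return "contents" in rels


def mark_reading_order_toc(manifest, toc_url):
    """Return a copy of `manifest` whose reading-order item `toc_url`
    is an object carrying "rel": "contents"."""
    result = copy.deepcopy(manifest)
    found = False
    new_order = []
    for entry in result.get("readingOrder", []):
        url = entry["url"] if isinstance(entry, dict) else entry
        if url == toc_url:
            item = dict(entry) if isinstance(entry, dict) else {"url": url}
            item["rel"] = "contents"
            new_order.append(item)
            found = True
        else:
            new_order.append(entry)
    if not found:
        raise ValueError(f"{toc_url!r} is not in readingOrder")
    result["readingOrder"] = new_order
    # Assumption of this sketch: remove competing "contents" entries
    # from "resources" so the ToC resource is unambiguous.
    if "resources" in result:
        result["resources"] = [
            r for r in result["resources"] if not _has_contents_rel(r)
        ]
    return result
```

The input manifest is left untouched, which makes it easy to compare the two versions before writing the result back to publication.json.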
Ah great thank you. Much appreciated - I'll have a go at this now :-)
I'm just writing instructions for my colleague @petermr to auto-generate manifests and ToCs from the Text and Data Mining software Py4ami, so I'm trying to get things done properly on what will be a first trial.
We've progressed well and will soon, like next week, be cleaning things up and adding the CSS and modifications to the HTML we generate, to at least make a proof of concept presentation to the UN Climate people.
I wanted to ask a quick question about an issue we have with the ToC reading in Vivliostyle. Apologies in advance, as I think this is us messing up our HTML, but before continuing to troubleshoot I wondered if you could take a quick look, as your more knowledgeable eyes will do better than ours and it might be very obvious what we're getting wrong.
Essentially we're getting the whole ToC doc showing up in the Vivlio menu.
Thank you
The current TOC handling in Vivliostyle.js does not work well with your HTML structure, unfortunately. Your HTML structure is like this:
<body>
  <div id="sessionpre">
    <img src="../images/UNlogo.jpg" alt="UN logo" id="unlogo">
    <div class="sessionCode">/PA/CMA/2021/10/Add.1</div>
    …
    <div class="contents">
      <div><span>Contents</span></div>
      <div><span>Decisions adopted by the Conference of …</span></div>
      <!-- TOC -->
      <div class="toc">
        <div>
          <span>Decision</span><span>Page</span>
        </div>
        <nav role="doc-toc">
          <ul>
            <li>
              <a href="../Decision_1_CMA_3/split.html"><span
                class="descres-code">1/CMA.3</span><span
                class="descres-title">Glasgow Climate Pact</span></a>
            </li>
            …
          </ul>
        </nav>
      </div>
    </div>
  </div>
</body>
Vivliostyle.js generates the TOC box (displayed in the TOC panel in the Viewer) from the HTML document, skipping child elements of BODY that do not contain a TOC element. See the code:
In your HTML, the BODY has only one child element, <div id="sessionpre">, and that has a TOC element, so no elements are skipped. As a result, the whole BODY content is copied to the TOC box.
Also note that stylesheets are ignored in the TOC box.
If you change the HTML structure like below, the TOC box will be generated better (though still not very well, because of the lack of styling):
<body>
  <div id="sessionpre">
    <img src="../images/UNlogo.jpg" alt="UN logo" id="unlogo">
    <div class="sessionCode">/PA/CMA/2021/10/Add.1</div>
    …
  </div>
  <div class="contents">
    <div><span>Contents</span></div>
    <div><span>Decisions adopted by the Conference of …</span></div>
    <!-- TOC -->
    <div class="toc">
      <div>
        <span>Decision</span><span>Page</span>
      </div>
      <nav role="doc-toc">
        <ul>
          <li>
            <a href="../Decision_1_CMA_3/split.html"><span
              class="descres-code">1/CMA.3</span><span
              class="descres-title">Glasgow Climate Pact</span></a>
          </li>
          …
        </ul>
      </nav>
    </div>
  </div>
</body>
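To check generated HTML against the skipping rule before loading it in the Viewer, the rule can be approximated in Python. This is a simplified model of the behaviour described in this reply, not Vivliostyle.js's implementation. The function names are hypothetical, the "nav li" part of the TOC selector is left out, and the input is assumed to be well-formed XML (for example, <img/> self-closed).

```python
import xml.etree.ElementTree as ET

TOC_ROLES = {"doc-toc", "directory"}


def is_toc_element(element):
    """Approximate match for [role=doc-toc], [role=directory], .toc, #toc."""
    classes = element.get("class", "").split()
    return (
        element.get("role") in TOC_ROLES
        or "toc" in classes
        or element.get("id") == "toc"
    )


def contains_toc(element):
    """True if the element itself or any descendant is a TOC element."""
    return any(is_toc_element(node) for node in element.iter())


def kept_body_children(body):
    """Children of BODY that would be copied to the TOC box:
    those that contain a TOC element. All others are skipped."""
    return [child for child in body if contains_toc(child)]


# Simplified versions of the two structures discussed in this thread.
NESTED = (
    '<body><div id="sessionpre"><img src="logo.jpg" alt="logo"/>'
    '<div class="contents"><div class="toc"><nav role="doc-toc">'
    '<ul><li><a href="a.html">A</a></li></ul></nav></div></div></div></body>'
)
SIBLING = (
    '<body><div id="sessionpre"><img src="logo.jpg" alt="logo"/></div>'
    '<div class="contents"><div class="toc"><nav role="doc-toc">'
    '<ul><li><a href="a.html">A</a></li></ul></nav></div></div></body>'
)

if __name__ == "__main__":
    for label, markup in (("nested", NESTED), ("sibling", SIBLING)):
        kept = kept_body_children(ET.fromstring(markup))
        print(label, [child.attrib for child in kept])
```

In the nested structure the single BODY child is kept, logo and all; in the sibling structure only the "contents" block is kept.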
I am going to fix Vivliostyle.js on these problems:
I made these into separate issues:
Amazing @MurakamiShinyu - appreciate you looking at this :-) Our HTML is an output of a Text and Data Mining process which converts PDF to HTML, running a series of regex normalisation processes when dealing with a specific corpus - in this case it is the UN FCCC treaty agreements - Kyoto Protocol, Paris Agreement, then all the subsequent COP meetings which are based on these treaties. So our exercise here is to come up with a recommendation for fixes to the PDF to HTML conversion that will allow the HTML to work in Vivlio and create Publication Manifests - automagically. We are nearly complete on this prototype, and then we want to present to UN FCCC and get them to organise their documents using the process going forwards. So big thank you. For demo purposes I'll clean up the HTML in the way you suggest for now.
Amazing. Thank you so much :-) We were working on a workaround on Friday to create further DIV children, but your fix makes it all work. I'll read up on the details etc. We can now proceed to demo the doc to the UN people, and then when we get the time integrate it into the TDM pipeline. We have a couple of weeks' hackathon coming up in India, so this will come in really useful with IPCC content too.
Is your feature request related to a problem? Please describe. Creating ToCs when using multiple HTML files - looking for support pages.
Describe the solution you'd like See a pointer to the project we're working on which is to typeset a Linked Open Data copy of the IPCC Glossary - see https://github.com/semanticClimate/glossary-sandbox/issues/1
Additional context There are a few related ToC issues: how to make the main ToC file; how to relate CSS styles to the different HTML files; how to get ToC items to appear in the Vivlio navigator; how to get the ToCs from the different HTML files into the front ToC on the page. Sorry, a lot here. I will clearly list them over on our site: https://github.com/semanticClimate/glossary-sandbox/issues/1