manubot / rootstock

Clone me to create your Manubot manuscript
https://manubot.github.io/rootstock/

Add librsvg as dependency #489

Closed miltondp closed 1 year ago

miltondp commented 1 year ago

This PR is related to issue #488. It adds librsvg as a dependency so pandoc converts SVG images to a format supported by Word (.docx).
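
A minimal sketch for checking that the converter is actually visible to the build, assuming librsvg exposes an `rsvg-convert` executable on `PATH` (the executable named later in this thread). The helper name `find_svg_converter` is hypothetical and not part of rootstock or Manubot.

```python
import shutil
from typing import Optional


def find_svg_converter(name: str = "rsvg-convert") -> Optional[str]:
    """Return the full path of the converter executable, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_svg_converter()
    if path is None:
        # Assumption: without the converter, SVG images may not be converted for .docx output.
        print("rsvg-convert not found on PATH")
    else:
        print(f"rsvg-convert found at {path}")
```

Running this inside the activated conda environment should show whether the new dependency was installed where the build can find it.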

I think this was not introduced here, but when I build the rootstock manuscript with BUILD_DOCX=true, opening the docx file with OpenOffice Writer shows this error:

[image: screenshot of the error dialog]

Clicking "Yes" opens the document. However, the SVG file (vector.svg) is not shown in the Word document. This is fixed if the file is downloaded and referenced locally. I thought it would be worth documenting how to do this, so I also added a note about it.
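
The workaround could look roughly like the sketch below: download the remote SVG into the manuscript's image directory and reference the local copy instead of the URL. The `content/images` default is taken from the paths in the build log further down; the function names `local_image_path` and `download_image` are hypothetical helpers, not part of Manubot.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def local_image_path(url: str, image_dir: str = "content/images") -> Path:
    """Map an image URL to a path inside image_dir, dropping any query string."""
    filename = Path(urlsplit(url).path).name
    if not filename:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return Path(image_dir) / filename


def download_image(url: str, image_dir: str = "content/images") -> Path:
    """Download url into image_dir and return the path of the saved file."""
    target = local_image_path(url, image_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

For example, `download_image("https://raw.githubusercontent.com/manubot/resources/main/test/vector.svg?sanitize=true")` would save `content/images/vector.svg`, which the manuscript can then reference as `images/vector.svg`.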

AppVeyorBot commented 1 year ago

AppVeyor build 1.0.320 for commit ee76e0f244b35160644d8ad2fac4906ece459935 is now complete.

Found 51 potential spelling error(s). Preview: content/02.delete-me.md:44:adipiscing content/02.delete-me.md:44:aliqua content/02.delete-me.md:44:amet content/02.delete-me.md:44:consectetur content/02.delete-me.md:44:dolore content/02.delete-me.md:44:eiusmod content/02.delete-me.md:44:elit content/02.delete-me.md:44:incididunt content/02.delete-me.md:44:ipsum content/02.delete-me.md:44:labore content/02.delete-me.md:44:Lorem content/02.delete-me.md:44:magna content/02...
The rendered manuscript from this build is temporarily available for download at:

AppVeyorBot commented 1 year ago

AppVeyor build 1.0.321 for commit 60717829fba95d0a70ea8e5f71761e8719dc88cd is now complete.

Found 51 potential spelling error(s). Preview: content/02.delete-me.md:44:adipiscing content/02.delete-me.md:44:aliqua content/02.delete-me.md:44:amet content/02.delete-me.md:44:consectetur content/02.delete-me.md:44:dolore content/02.delete-me.md:44:eiusmod content/02.delete-me.md:44:elit content/02.delete-me.md:44:incididunt content/02.delete-me.md:44:ipsum content/02.delete-me.md:44:labore content/02.delete-me.md:44:Lorem content/02.delete-me.md:44:magna content/02...
The rendered manuscript from this build is temporarily available for download at:

dhimmel commented 1 year ago

Here's the text of the error message for searchability:

An error occurred during opening the file. This may be caused by incorrect file contents.
The error details are:
SAXException:
[word/document.xml line 1]: Opening and ending tag mismatch: t line 1 and p
./sax/source/fastparser/fastparser.cxx:619
Proceeding with import may cause data loss or corruption, and application may become unstable or crash.
Do you want to ignore the error and attempt to continue loading the file?
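
Since the message points at `word/document.xml`, one way to reproduce the complaint outside the office suite is to check whether that member of the .docx archive parses as XML. This is only a sketch: the function name `docx_xml_error` is hypothetical, and Python's parser will not necessarily flag exactly the same problem as the importer above.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional


def docx_xml_error(docx_path, member: str = "word/document.xml") -> Optional[str]:
    """Return the parser's error message if member is not well-formed XML, else None."""
    # A .docx file is a zip archive; docx_path may be a file path or a file-like object.
    with zipfile.ZipFile(docx_path) as archive:
        data = archive.read(member)
    try:
        ET.fromstring(data)
    except ET.ParseError as error:
        return str(error)
    return None
```

Calling `docx_xml_error("output/manuscript.docx")` (the path is an assumption about where the build writes its output) returns `None` for a well-formed document and the parser's message otherwise.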

> the SVG file (vector.svg) is not shown in the Word document. This is fixed if the file is downloaded and referenced locally.

Is the hypothesis here that Pandoc's SVG to raster conversion fails via rsvg-convert when the image is a URL as opposed to local file path? Is there anything in the Pandoc stderr log output that notes that the SVG conversion failed?
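
To make that check less tedious on a long build log, a small filter along these lines might help. It assumes Pandoc prefixes non-informational messages with `[WARNING]` or `[ERROR]`, by analogy with the `[INFO]` prefix in its output; the function name `svg_problems` is hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: problem lines start with a [WARNING] or [ERROR] level prefix.
LEVEL = re.compile(r"^\[(WARNING|ERROR)\]")


def svg_problems(lines: Iterable[str]) -> List[str]:
    """Return warning or error lines that mention an SVG file."""
    return [
        line.rstrip("\n")
        for line in lines
        if LEVEL.match(line) and ".svg" in line.lower()
    ]
```

Capturing the build's stderr to a file and passing the open file to `svg_problems` would list any such lines. An empty result means only that nothing matching these assumptions was logged, not that the conversion succeeded.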

agitter commented 1 year ago

I can test the conda environment and local build on Windows but won't get to it immediately.

miltondp commented 1 year ago

> Is the hypothesis here that Pandoc's SVG to raster conversion fails via rsvg-convert when the image is a URL as opposed to local file path? Is there anything in the Pandoc stderr log output that notes that the SVG conversion failed?

Yes, that's the hypothesis. I couldn't see anything in the output (see below) that helps support the hypothesis.

Also, if I remove the ?sanitize=true parameter from the image URL it does not work either.

$ BUILD_DOCX=true BUILD_PDF=true bash build/build.sh && manubot webpage 
Retrieving and processing reference metadata
## INFO
Manuscript content parts:
00.front-matter
01.abstract
02.delete-me
90.back-matter
## INFO
No explicit manuscript date provided. Dating manuscript based on the current datetime: 2023-03-10T19:58:00.571845+00:00 (in the UTC timezone)
## INFO
Generated manscript stats:
{
  "word_count": 1579
}
## WARNING
Template variable warning: 'dict object' has no attribute 'corresponding'
## WARNING
Template variable warning: 'dict object' has no attribute 'corresponding'
Exporting HTML manuscript
[INFO] Loaded build/themes/default.html from build/themes/default.html
[INFO] Loaded build/plugins/core.html from build/plugins/core.html
[INFO] Loaded build/plugins/accordion.html from build/plugins/accordion.html
[INFO] Loaded build/plugins/anchors.html from build/plugins/anchors.html
[INFO] Loaded build/plugins/attributes.html from build/plugins/attributes.html
[INFO] Loaded build/plugins/jump-to-first.html from build/plugins/jump-to-first.html
[INFO] Loaded build/plugins/lightbox.html from build/plugins/lightbox.html
[INFO] Loaded build/plugins/link-highlight.html from build/plugins/link-highlight.html
[INFO] Loaded build/plugins/table-of-contents.html from build/plugins/table-of-contents.html
[INFO] Loaded build/plugins/tooltips.html from build/plugins/tooltips.html
[INFO] Loaded build/plugins/analytics.html from build/plugins/analytics.html
[INFO] Loaded build/plugins/hypothesis.html from build/plugins/hypothesis.html
[INFO] Loaded build/plugins/mathjax.html from build/plugins/mathjax.html
[INFO] Running filter pandoc-fignos
[INFO] Completed filter pandoc-fignos in 20 ms
[INFO] Running filter pandoc-eqnos

pandoc-eqnos: Wrote the following blocks to header-includes.  If you
use pandoc's --include-in-header option then you will need to manually
include these yourself.

    <!-- pandoc-eqnos: equation style -->
    <style>
      .eqnos { display: inline-block; position: relative; width: 100%; }
      .eqnos br { display: none; }
      .eqnos-number { position: absolute; right: 0em; top: 50%; line-height: 0; }
    </style>
[INFO] Completed filter pandoc-eqnos in 6 ms
[INFO] Running filter pandoc-tablenos
[INFO] Completed filter pandoc-tablenos in 6 ms
[INFO] Running filter pandoc-manubot-cite
[INFO] Completed filter pandoc-manubot-cite in 19 ms
[INFO] Loaded build/assets/style.csl from build/assets/style.csl
Exporting PDF manuscript using Docker + Athena
ATTENTION: default value of option force_s3tc_enable overridden by environment.
Converted 'file:///converted/manuscript.html' to PDF: 'manuscript.pdf'
PDF Conversion: 2253.848ms
Exporting Word Docx manuscript
[INFO] Running filter pandoc-fignos
[INFO] Completed filter pandoc-fignos in 9 ms
[INFO] Running filter pandoc-eqnos
[INFO] Completed filter pandoc-eqnos in 6 ms
[INFO] Running filter pandoc-tablenos
[INFO] Completed filter pandoc-tablenos in 7 ms
[INFO] Running filter pandoc-manubot-cite
[INFO] Completed filter pandoc-manubot-cite in 17 ms
[INFO] Loaded build/assets/style.csl from build/assets/style.csl
[INFO] Loaded images/orcid.svg from content/images/orcid.svg
[INFO] Loaded images/github.svg from content/images/github.svg
[INFO] Loaded images/twitter.svg from content/images/twitter.svg
[INFO] Loaded images/mastodon.svg from content/images/mastodon.svg
[INFO] Fetching https://github.com/manubot/resources/raw/15493970f8882fce22bef829619d3fb37a613ba5/test/square.png...
[INFO] Fetching https://github.com/manubot/resources/raw/15493970f8882fce22bef829619d3fb37a613ba5/test/wide.png...
[INFO] Fetching https://github.com/manubot/resources/raw/15493970f8882fce22bef829619d3fb37a613ba5/test/tall.png...
[INFO] Fetching https://raw.githubusercontent.com/manubot/resources/main/test/vector.svg?sanitize=true...
[INFO] Not rendering RawInline (Format "html") "<small>"
[INFO] Not rendering RawInline (Format "html") "<em>"
[INFO] Not rendering RawInline (Format "html") "</em>"
[INFO] Not rendering RawInline (Format "html") "</small>"
[INFO] Not rendering RawInline (Format "html") "<br>"
[INFO] Not rendering RawInline (Format "html") "<br>"
[INFO] Not rendering RawInline (Format "html") "<small>"
[INFO] Not rendering RawInline (Format "html") "</small>"
[INFO] Not rendering RawInline (Format "html") "<br>"
[INFO] Not rendering RawInline (Format "html") "<br>"
[INFO] Not rendering RawInline (Format "html") "<small>"
[INFO] Not rendering RawInline (Format "html") "</small>"
[INFO] Not rendering RawInline (Format "html") "<!-- $colspan=\"2\" -->"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-exclamation-triangle\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<!-- $id=\"element_id\" class=\"some_class\" $style=\"color: #ad1457; margin-left: 40px;\" $disabled=\"true\" $title=\"a paragraph of text\" $data-color=\"red\" -->"
[INFO] Not rendering RawInline (Format "html") "<link rel=\"stylesheet\" href=\"https://use.fontawesome.com/releases/v5.7.2/css/all.css\">"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-check\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-question\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-star\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-bell\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-times-circle\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-ellipsis-h\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-scroll fa-lg\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<br>"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-info-circle fa-lg\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<br>"
[INFO] Not rendering RawInline (Format "html") "<i class=\"fas fa-ban fa-lg\">"
[INFO] Not rendering RawInline (Format "html") "</i>"
[INFO] Not rendering RawInline (Format "html") "<br>"
Build complete
## WARNING
webpage/v/local exists: replacing it with an empty directory

agitter commented 1 year ago

I'm testing the environment on Windows and couldn't install the current environment from f99263d683ec5ccc2c34c5b23a3fa3893e48d7c4. I got package conflicts until I unpinned the Python version and let conda choose Python 3.10.6. I also removed WeasyPrint in case it was causing problems, as it has in the past (#448).

I also got the warning from Word when opening a docx built in the current environment. That is due to the known problems with equations (#435). When I delete equations from the content, the warning disappears.

My document in the current environment looks like this: [image] (attachment: manuscript.docx)

In the new environment with librsvg it looks like this: [image] (attachment: manuscript.docx)

So librsvg works on Windows. I didn't attempt to investigate the problems with svg images referenced by URL. Even if we don't fix those in this pull request, this is an improvement. Do we want to explore that problem further or merge this?
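
For anyone repeating this comparison without opening the files in Word, a rough sketch is to count the embedded media by file type in each .docx. It assumes images are stored under `word/media/` inside the archive, which is the usual layout for .docx files; the function name `media_extensions` is hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def media_extensions(docx_path) -> Counter:
    """Count files under word/media/ in a .docx archive by lower-cased extension."""
    with zipfile.ZipFile(docx_path) as archive:
        names = [
            name
            for name in archive.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        ]
    return Counter(PurePosixPath(name).suffix.lower() for name in names)
```

Comparing the counts for the two builds should show whether the new environment embeds additional image files for the SVG figures.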

dhimmel commented 1 year ago

> I didn't attempt to investigate the problems with svg images referenced by URL. Even if we don't fix those in this pull request, this is an improvement. Do we want to explore that problem further or merge this?

I'm in favor of merging now. We should probably update all the conda package versions, including WeasyPrint, if possible. This issue might resolve itself since it's an upstream problem.

miltondp commented 1 year ago

I also agree with merging now. The URL issue with images should probably be addressed in a separate issue, and could probably be low priority since there is an easy workaround.