michal-h21 / make4ht

Build system for tex4ht

Testing report for ODT output #15

Open gusbrs opened 6 years ago

gusbrs commented 6 years ago

As you asked (or as was my misunderstanding of your request :), I did some testing for ODT output with make4ht. My approach was to start from an actual working document of mine, with all the elements I usually employ, and reduce it to a smaller test document that retains its complexity and elements. I did, however, remove the nested tabular/makecell elements, because I wanted to test things with "vanilla" make4ht.

Indeed, all testing was done with:

make4ht -f odt filename.tex

without any additional config or make files. And biber filename as appropriate, of course.
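For anyone wanting to repeat the runs, here is a minimal sketch of the build as a Python helper. The `make4ht -f odt filename.tex` and `biber filename` commands are the ones quoted above; the order (make4ht, then biber, then make4ht again) and the helper names `build_commands` and `build` are assumptions for illustration, not something stated in this report.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file):
    """Return the commands for one ODT build, without running them."""
    stem = Path(tex_file).stem
    make4ht = ["make4ht", "-f", "odt", tex_file]
    # Assumed order: a first make4ht run, then biber on the base name,
    # then a second make4ht run to pick up the bibliography.
    return [make4ht, ["biber", stem], make4ht]


def build(tex_file, dry_run=True):
    """Print each command, and run it unless dry_run is set."""
    for cmd in build_commands(tex_file):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    build("filename.tex")
```

With `dry_run=True` (the default) nothing is executed, so the sequence can be inspected before running it against a real document.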

As for the environment, tests were done with a full and up-to-date TeX Live 2018 and the current dev version of make4ht, on Linux Mint 18.3, also up to date.

The test files are available at: https://gist.github.com/gusbrs/36ea400945e7031096464a8f98e001b4 (Please download them and let me know when you’ve done so. As they were derived from a working document of mine, I don’t want to leave this publicly available.)

There are three files. The first one was built with the above intention in mind, and compiled and tested with pdflatex. Now, this file, as it is, is not really amenable to being built with make4ht. So I had to strip some things out to arrive at the second file, which, like the first, is based on the scrartcl class. The third test file, in turn, is a version of the second one with the standard article class.

What had to be removed from the full document to get results with make4ht

With these changes, we have the second test file, which is compilable and produces reasonable (though improvable) output.

Log files (full piped terminal output) for both the second and third test files are available at: https://gist.github.com/gusbrs/f822630ffd09029871401fe54c3746a2

Comments on the second (scrartcl) ODT output

Comments on the third (article) ODT output

Here some things seem to work better:

But pretty much everything else stands on the same ground.

Comments on the third (article) resulting content.xml

Well, I hope this testing is useful. Thank you for the great work! And, as usual, I remain at your disposal for discussion and further testing.

michal-h21 commented 6 years ago

Thanks, that is quite a massive report :o

I will need some time to process it, some issues may be quite hard to fix.

michal-h21 commented 6 years ago

My first findings:

as a workaround

seems to work and fixes next issue:

I will try to fix these and other issues later.

michal-h21 commented 6 years ago

I've also found two entries in biblatex-example.bib which cause invalid XML - knuth:ct:related and knuth:ct:a. The ODT file can be opened after I removed them. This is definitely a bug in tex4ht.
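A quick way to check whether a generated ODT file contains XML that cannot be parsed is to look inside the archive directly. The following is only a sketch: the function name `check_odt_xml` and the default list of members are my assumptions, and it tests well-formedness only, not validity against the ODF schema.

```python
import zipfile
import xml.etree.ElementTree as ET


def check_odt_xml(odt_path, members=("content.xml", "styles.xml", "meta.xml")):
    """Map each XML member found in the ODT to None (parses) or an error string."""
    results = {}
    with zipfile.ZipFile(odt_path) as odt:
        names = set(odt.namelist())
        for member in members:
            if member not in names:
                continue  # skip members this particular file does not ship
            try:
                ET.fromstring(odt.read(member))
                results[member] = None
            except ET.ParseError as err:
                # The message includes the line and column of the first error.
                results[member] = str(err)
    return results


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        for member, error in check_odt_xml(path).items():
            print(path, member, error or "well-formed")
```

The reported line and column point into the XML member, which should help to trace a failure back to a specific bibliography entry.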

gusbrs commented 6 years ago

@michal-h21 Nice to see things going that fast. Thank you very much! I'll be following your comments here attentively and, if need be, will comment back (so far, I have nothing to add to your observations). And, if you reach a point where you want me to test things again, just let me know.

michal-h21 commented 6 years ago

Today I fixed some issues in the tex4ht sources, in a quest to make the resulting ODT file pass the ODF validator. I removed some DTD definitions that didn't really work. There are still some validation issues with math, but I think I am on a good path.

One huge success is that Word can now open the ODT file and display math, which did not work until now. The only issue was a wrong MIME type in the file directory. It is really good that it is no longer necessary to fix the ODT file in LibreOffice.
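To see which media types a given ODT file declares, one can read them out of the package. This is a sketch under assumptions: an ODF package normally carries a top-level `mimetype` file and a `META-INF/manifest.xml` listing a media type per entry, but the comment above does not say which of these was wrong, and the function name `odt_media_types` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Media type of an OpenDocument text document.
ODT_MIME = "application/vnd.oasis.opendocument.text"
# XML namespace used by META-INF/manifest.xml.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def odt_media_types(odt_path):
    """Return (content of the mimetype file, {full-path: media-type} from the manifest)."""
    with zipfile.ZipFile(odt_path) as odt:
        names = odt.namelist()
        mimetype = None
        if "mimetype" in names:
            mimetype = odt.read("mimetype").decode("ascii").strip()
        entries = {}
        if "META-INF/manifest.xml" in names:
            root = ET.fromstring(odt.read("META-INF/manifest.xml"))
            for entry in root.iter("{%s}file-entry" % MANIFEST_NS):
                path = entry.get("{%s}full-path" % MANIFEST_NS)
                entries[path] = entry.get("{%s}media-type" % MANIFEST_NS)
    return mimetype, entries


if __name__ == "__main__":
    import sys

    for odt_file in sys.argv[1:]:
        declared, manifest = odt_media_types(odt_file)
        print("mimetype file:", declared)
        for path, media_type in manifest.items():
            print(path, "->", media_type)
```

Comparing the output for a file that Word opens with one that it rejects should show which entry differs.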

On the negative side, pandoc cannot convert the ODT file even though it is perfectly valid. It reports only:

Couldn't parse odt file.

This needs further investigation.

The bad thing is that with every fix I find more bugs, so there is still a lot to do.