metanorma / coradoc

Coradoc is the Core AsciiDoc Parser used by Metanorma
MIT License

ODT proposal #136

Open hmdne opened 1 month ago

hmdne commented 1 month ago

While working on #135, I realized the idea is solid. This issue briefly describes what I plan to do; the milestones will need to change a little, though.

The idea in short: to support DOCX files, I plan to implement an ODT parser and a converter to Coradoc. This will not remove the LibreOffice dependency (unless users generate the ODT file themselves). In my experience, ODT is very close to HTML, yet it preserves much more semantic information than LibreOffice's HTML output, so this should be fairly easy to do, at least compared to DOCX. I would describe the difference as follows: the ODT format was designed for document interchange, whereas the DOCX format was designed to represent internal MS Word structures serialized to XML, and as @opoudjis noted, it isn't even well documented.
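To illustrate how close ODT markup is to a simple block structure, here is a minimal sketch, not the planned implementation. It assumes `content.xml` has already been extracted from the ODT archive, uses only REXML rather than lutaml-model or rubyzip, and handles just headings and paragraphs. The helper names `plain_text` and `odt_to_asciidoc` are hypothetical, and mapping outline level 1 to `==` is an assumption made for the example.

```ruby
require "rexml/document"

# Namespace URIs defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS   = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical helper: concatenates all text below a node, descending into
# inline elements such as text:span. Special elements such as text:s or
# text:tab are ignored in this sketch.
def plain_text(node)
  case node
  when REXML::Text then node.value
  when REXML::Element then node.children.map { |child| plain_text(child) }.join
  else ""
  end
end

# Hypothetical helper (not part of Coradoc): converts the body of an ODT
# content.xml into AsciiDoc-like text. Only text:h and text:p are handled.
def odt_to_asciidoc(content_xml)
  doc = REXML::Document.new(content_xml)
  ns = { "office" => OFFICE_NS, "text" => TEXT_NS }
  body = REXML::XPath.first(doc, "//office:body/office:text", ns)
  return "" unless body

  blocks = []
  body.elements.each do |el|
    next unless el.namespace == TEXT_NS

    text = plain_text(el).strip
    case el.name
    when "h"
      # Assumes the conventional "text" prefix is used for the attribute.
      level = (el.attributes["text:outline-level"] || "1").to_i
      # Assumed mapping: outline level 1 becomes "==", level 2 "===", etc.
      blocks << "#{"=" * (level + 1)} #{text}"
    when "p"
      blocks << text unless text.empty?
    end
  end
  blocks.join("\n\n") + "\n"
end

# A small hand-written sample, not taken from a real document.
SAMPLE = <<~XML
  <office:document-content
      xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
      xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
    <office:body>
      <office:text>
        <text:h text:outline-level="1">Scope</text:h>
        <text:p>This document <text:span text:style-name="T1">specifies</text:span> things.</text:p>
      </office:text>
    </office:body>
  </office:document-content>
XML

puts odt_to_asciidoc(SAMPLE)
```

A real converter would also need lists, tables, and style resolution from `styles.xml`, none of which this sketch attempts.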

The plan is as follows:

Any opinions on that plan?

@ronaldtse @ReesePlews @opoudjis @webdev778 @xyz65535

ronaldtse commented 1 month ago

@hmdne I think this is doable, but I don't want to spend too many resources on this, given we have other priorities.

create a gem, that will map ODT format

Technically this means we create an ODT gem that can read (and possibly, write) ODT, using lutaml-model and rubyzip. This is reasonable and contained as a task (and allows contained testing).

the DOCX format was designed to represent internal MS Word structures serialized to XML

Nonetheless, the ultimate goal remains that we need to support DOCX format input. At this moment I would consider ODT the "easier of the two evils", an intermediary step between Coradoc and DOCX. I really think DOCX is within reach.

The current html2doc mechanism (MHT) already prevents people using Word on Windows from directly loading files generated by Metanorma. Microsoft has removed MHT functionality from Windows Word, and therefore we must switch to generating DOCX in the future.

Resources

How long do you think this will take?