kjambunathan / org-mode-ox-odt

The Authoritative fork of Org mode's ODT exporter
GNU General Public License v3.0

Corrupted ODT output #278

Open jefftrull opened 6 days ago

jefftrull commented 6 days ago

When attempting to load ODT export into Microsoft Word for Mac (launched via C-x C-e o O) I get a dialog that says:

Word found unreadable content in testcase.odt. Do you want to recover the contents of this document?...

The output is testcase.odt. I'm running Emacs 29.3 with the latest version of this package (and I verified it was in use in the manner described in the README). The input is as follows:

#+TITLE: ODT export test case

#+OPTIONS: TOC:nil

#+REVEAL_EXTRA_CSS: ./modifications.css

* Heading 1
Considerations for the purchase of a rotary multitool:
- max speed
- torque
- variety and availability of accessories

There may be others, depending on your application.

* Heading 2
- Oscillating tools are also useful for various tasks
- Grout removal
- Sanding

* Brands
** Festool
- /Most importantly/ they sponsor Laura Kampf, as described in
[[https://www.festool.com/blog/work/interview-laura-kampf][a post on their site]] but perhaps you require more facts.
- Also they acquired a company I briefly worked for and retain
a positive impression of [[https://www.youtube.com/watch?v=ugU9iBCVxqQ][as you can see here]].
- High price conveys exclusivity, artisan usage?
** DeWalt
- I think this is a brand made up by Black and Decker when they were
regarded poorly by professionals. IDK.
- =Yellow= is that Fluke trademark infringement?
- I have one of their cordless drills and it's very nice.
** Milwaukee
- Red
- Blue-collar Rust Belt city connotes realness
* Heading 4
** Subheading 4.1
*** Rail Transit in the Bay Area
We have some strange regional rivalries here in Northern California.
The Bay Area has three major long-distance rail systems run by completely
separate organizations with loose coordination. They are Bart (tunnel and
viaduct, mostly East Bay), Caltrain (Peninsula and South Bay), and Amtrak
(inter-regional but largely serving the East Bay). Despite the pretensions
of San Francisco to be the leading city, Oakland is closer to a geographical
center and in fact San Jose may be, in a few years, the only place served
by all three plus the upstart Ace. Lowly Diridon Station may well snatch
the transportation crown from the brand new but ghost-like Transbay Transit
Center.

But let's forget about all that, and look at some code instead.
*** Example
#+begin_src c++
  int gcd(int a, int b) {
      while (b != 0) {
          int t = b;
          b = a % t;
          a = t;
      }
      return a;
  }
#+end_src

** Further Reading
Do yourself a favor and get a copy of [[https://elementsofprogramming.com/][Elements of Programming]] or
[[https://www.fm2gp.com/][From Mathematics to Generic Programming.]]

but it might blow your mind so /proceed cautiously./

* Let's make a table
** Sales
| Category     | Product A | Product B | Product C |
|--------------+-----------+-----------+-----------|
| /            |         < |         > |         < |
| Employees    |      2712 |       871 |        99 |
| Direct Mail  |       111 |      1897 |       422 |
| Black Friday |       388 |      2012 |       666 |
|--------------+-----------+-----------+-----------|
| totals       |      3211 |      4780 |      1187 |
#+TBLFM: @6$2..@6$4=vsum(@I+2..@II)
kjambunathan commented 5 days ago

> When attempting to load ODT export into Microsoft Word for Mac (launched via C-x C-e o O) I get a dialog that says:
>
> Word found unreadable content in testcase.odt. Do you want to recover the contents of this document?...

I have looked at this report ...

Can the ODT file be opened by LibreOffice? (I don't have access to Microsoft Word, so I won't be able to independently confirm the behaviour on my end.)

> I'm running Emacs 29.3 with the latest version of this package (and I verified it was in use in the manner described in the README). The input is as follows:

The org snippet you copy-pasted into the issue was not surrounded by triple backticks, so I have added them. (If you edit the issue you filed, you can see what I mean.) Please confirm whether the snippet I see is a faithful reproduction of your org file.

Let me see if I can reproduce the issue on my end.

kjambunathan commented 5 days ago

> The output is testcase.odt.

testcase.odt opens fine with LibreOffice

Version: 24.2.6.2 (X86_64) / LibreOffice Community
Build ID: 420(Build:2)
CPU threads: 4; OS: Linux 6.10; UI render: default; VCL: gtk3
Locale: en-IN (en_IN); UI: en-US
Debian package version: 4:24.2.6-2
Calc: threaded

[screenshot: testcase odt-no-issues-with-LO-24 2 6 2 (Debian)-2024-10-19_18-09]

kjambunathan commented 5 days ago

> When attempting to load ODT export into Microsoft Word for Mac (launched via C-x C-e o O)

You mean C-c C-e o O, right?

I did the same thing on your .org snippet. It exports fine and opens fine on LibreOffice.

Btw, the way you have shared your org snippet tells me that you are new to org.

[screenshot: 2024-10-19_18-20]

If you want to export to docx, you can set org-odt-preferred-output-format to docx.

The org snippet you shared is fairly straightforward, so I believe the docx you create will open fine in MS Word (Mac).

org-odt-preferred-output-format is a variable defined in `ox-odt.el'.

Its value is nil

Automatically post-process to this format after exporting to "odt".
Command `org-odt-export-to-odt' exports first to "odt" format
and then uses `org-odt-convert-process' to convert the
resulting document to this format.  During customization of this
variable, the list of valid values are populated based on
`org-odt-convert-capabilities'.

You can set this option on per-file basis using file local
values.  See Info node `(emacs) File Variables'.

  This variable is safe as a file local variable if its value
  satisfies the predicate `stringp'.
  This variable was introduced, or its default value was changed, in
  version 24.1 of Emacs.
  You can customize this variable.

If you share more details, I will be able to find the root cause of your issue and address it.

When I pass testcase.odt through https://odfvalidator.org/, I see the following:

[screenshot: testcase odt+odfvalidator-2024-10-19_18-32]

jefftrull commented 5 days ago

Thanks for the quick response!

I agree, the validator likes the ODT output. I'm not sure why MS Word does not. At the moment my plan is to unpack the output when Word "repaired" it, and compare vs. the original. Perhaps it is being needlessly strict, or making some assumption that is always true in its own ODT output.
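A minimal Python sketch of that comparison, assuming both files are ordinary ODF zip packages whose XML parses cleanly (the function names, and the file names in the usage line below, are hypothetical). It extracts the same member from each archive, re-indents it so that a line-based diff is readable, and returns a unified diff:

```python
import difflib
import zipfile
from xml.dom import minidom


def member_lines(odt_path, member="content.xml"):
    """Return one archive member of an ODT package as pretty-printed lines."""
    with zipfile.ZipFile(odt_path) as package:
        raw = package.read(member)
    # content.xml is often stored as a few very long lines; re-indent it so
    # that a line-based diff points at individual elements.
    return minidom.parseString(raw).toprettyxml(indent="  ").splitlines()


def diff_odt_member(original, repaired, member="content.xml"):
    """Unified diff of the same member in two ODT packages (empty if equal)."""
    return list(difflib.unified_diff(
        member_lines(original, member),
        member_lines(repaired, member),
        fromfile=f"original/{member}",
        tofile=f"repaired/{member}",
        lineterm="",
    ))
```

For example, `print("\n".join(diff_odt_member("testcase.odt", "testcase-repaired.odt")))`; passing `member="styles.xml"` compares the styles instead.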

It was not my intention to generate a .docx file. Is it mandatory to use org-odt-preferred-output-format?

jefftrull commented 5 days ago

[screenshot: testcase]

It does look wrong when rendered by MS Word. That doesn't mean it's the exporter's fault of course :)

kjambunathan commented 5 days ago

> I'm not sure why MS Word does not.

I don't use MS Word. But @QiangF uses MS Word, and he has never faced issues with odt files generated by my exporter. Your org file is fairly straightforward (by straightforward I mean it has no images, sequence numbers for tables, etc.), and I would expect MS Word on your Mac (as long as it is NOT VERY VERY OLD) to be able to open it.

Maybe @QiangF can provide some input on the MS Word front.

> It was not my intention to generate a .docx file. Is it mandatory to use org-odt-preferred-output-format?

If MS Word dislikes the ODT file, it should at least like a doc or docx file. So my suggestion was that you can use LO to generate a file that MS Word has no problems with.

LO is available on Mac, and it is LO that does the odt-to-docx conversion.

One thing you can try out: keep commenting out paragraphs, tables, source code, etc. one at a time, and see which content in your org file triggers the problem.

kjambunathan commented 5 days ago

> It does look wrong when rendered by MS Word. That doesn't mean it's the exporter's fault of course :)

Try exporting with num:nil (that is, with unnumbered headlines).

At the top of the org buffer, do C-c C-e # d, and in the inserted exporter options, flip num:t in the line below to num:nil and export.

#+options: inline:t num:t p:nil pri:nil prop:nil stat:t tags:t
kjambunathan commented 5 days ago

How do these docx files look on your side?

docx file created by ODT exporter + LO with numbered headings.

issue-278.docx

docx file created by ODT exporter + LO with NO numbered headings.

issue-278-no-headline-numbers.docx

jefftrull commented 5 days ago

Going to try turning off numbers in a second but I wanted to pass on what I learned by looking at the content.xml where the "Example" heading is defined. When Word "repairs" the file it writes it back out as ODT 1.4 like this:

<text:h text:style-name="Heading3" text:outline-level="3">
    <text:bookmark-start text:name="org6ae8676" />Example
    <text:bookmark-end text:name="org6ae8676" />
</text:h>

and the style is:

<style:style style:name="Heading3" style:display-name="Heading 3" style:family="paragraph" style:parent-style-name="Heading" style:next-style-name="Textbody" style:default-outline-level="3">
    <style:text-properties fo:font-weight="bold" style:font-weight-asian="bold" style:font-weight-complex="bold" fo:hyphenate="false" />
</style:style>

By comparison in the exporter's output we have:

<text:h text:style-name="Heading_20_3" text:outline-level="3">
    <text:bookmark-start text:name="orga0c5199" />Example
    <text:bookmark-end text:name="orga0c5199" />
</text:h>

with style:

<style:style style:name="Heading_20_3" style:display-name="Heading 3" style:family="paragraph" style:parent-style-name="Heading" style:next-style-name="Text_20_body" style:default-outline-level="3" style:class="text">
    <style:text-properties fo:font-size="14pt" fo:font-weight="bold" style:font-size-asian="14pt" style:font-weight-asian="bold" style:font-size-complex="14pt" style:font-weight-complex="bold"/>
</style:style>

I'm no expert on this format but I don't see why either would produce a heading shifted so only the first letter appears on the line. It's strange.

jefftrull commented 5 days ago

Using the options you suggested (via C-c C-e # d) numbered headings are removed for level 3 (and not 1 or 2) and the text looks normal (not severely right justified). However Word still warns that part of the document is "unreadable" and wants to repair it.

jefftrull commented 5 days ago

Word is happy with both of your example docx files.

jefftrull commented 5 days ago

I followed your suggestion to remove different parts of the input, and found that the final table seems to be what makes Word unhappy. Even the simplest possible table (one column, no hlines, no formulas) causes Word to complain.

jefftrull commented 5 days ago

I think I've found the problem. If you look at the embedded content.xml you can find this line:

<table:table table:style-name="nil">

If you change nil to OrgTable Word loads the file without complaints.
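The same edit can be applied to an already-exported file while waiting for a fixed exporter. A minimal Python sketch, assuming a plain byte-level replace is safe because the string `table:style-name="nil"` occurs only where intended (the function name is hypothetical). It keeps `mimetype` as the first, uncompressed entry, which the ODF packaging rules expect:

```python
import zipfile


def patch_odt_member(src, dst, member, old, new):
    """Copy the ODT package `src` to `dst`, replacing the bytes `old` with
    `new` inside the archive member `member` (e.g. "content.xml")."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        names = zin.namelist()
        # ODF readers expect "mimetype" to be the first entry in the package.
        if "mimetype" in names:
            names = ["mimetype"] + [n for n in names if n != "mimetype"]
        for name in names:
            data = zin.read(name)
            if name == member:
                # Assumption: a blind byte replace only touches what we intend.
                data = data.replace(old, new)
            # "mimetype" must be stored uncompressed; everything else is deflated.
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zout.writestr(name, data, compress_type=method)
```

Usage: `patch_odt_member("testcase.odt", "testcase-fixed.odt", "content.xml", b'table:style-name="nil"', b'table:style-name="OrgTable"')`.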

kjambunathan commented 3 days ago

> I think I've found the problem. If you look at the embedded content.xml you can find this line:
>
> <table:table table:style-name="nil">
>
> If you change nil to OrgTable Word loads the file without complaints.

Thanks for isolating the root cause and suggesting a fix.

Your suggestion is now part of https://github.com/kjambunathan/org-mode-ox-odt/commit/45c6b4fa0a569baf09984550d2a1447cb27d5a49

kjambunathan commented 3 days ago

> Word is happy with both of your example docx files.

They were created with the org-odt-convert command.

The conversion can be automated with org-odt-preferred-output-format. Since the odt-to-docx conversion is done by LO, LO has to be installed. On Debian, LO's command-line program is called soffice. See org-odt-soffice-executable.
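For running the conversion by hand, here is a hedged Python sketch. It assumes LibreOffice's standard `--headless`, `--convert-to` and `--outdir` command-line options; the command the exporter actually runs is controlled by `org-odt-convert-process` and `org-odt-soffice-executable` and may differ. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_convert_command(odt_path, fmt="docx", outdir=None, soffice="soffice"):
    """Build a LibreOffice command line that converts odt_path to fmt.

    The output goes next to the input file unless outdir is given.
    """
    odt = Path(odt_path)
    target_dir = Path(outdir) if outdir is not None else odt.parent
    return [soffice, "--headless", "--convert-to", fmt,
            "--outdir", str(target_dir), str(odt)]


def convert_with_soffice(odt_path, fmt="docx", outdir=None, soffice="soffice"):
    """Run the conversion; raises if soffice is missing or exits non-zero."""
    exe = shutil.which(soffice)
    if exe is None:
        raise FileNotFoundError(f"LibreOffice executable not found: {soffice}")
    subprocess.run(soffice_convert_command(odt_path, fmt, outdir, exe), check=True)
```

On macOS `soffice` is usually not on PATH; in a default install it lives at `/Applications/LibreOffice.app/Contents/MacOS/soffice`, which can be passed as the `soffice` argument.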

jefftrull commented 3 days ago

Thanks so much for the fix!

Any thoughts on the strange formatting with num:t? Should I file a separate ticket?

kjambunathan commented 3 days ago

> Thanks so much for the fix!
>
> Any thoughts on the strange formatting with num:t? Should I file a separate ticket?

I believe MS Word made only a best-effort attempt at "repairing", and I suspect it did a "half-baked" repair.

Please pull again, and see if any new corruption is reported. If you find any problems, including the headline problem that you noted, I will be happy to fix them.

Please keep the org file as small as possible. That is, file a bug report with an MWE (= Minimal Working Example), and remember to triple-backtick your org snippet.

jefftrull commented 3 days ago

Regrettably the formatting issue persists though Word no longer attempts to "repair" the file. I will file a separate ticket with an M(n)WE. Thank you.

kjambunathan commented 3 days ago

> Regrettably the formatting issue persists though Word no longer attempts to "repair" the file. I will file a separate ticket with an M(n)WE. Thank you.

I think I can anticipate what the problem could be: the _20_ in the style names, which encodes the SPC character 0x20 (= 32 decimal).

It is just a matter of changing style:name="Heading_20_1" etc. of the style:style elements in https://github.com/kjambunathan/org-mode-ox-odt/blob/45c6b4fa0a569baf09984550d2a1447cb27d5a49/etc/styles/OrgOdtStyles.xml#L111

and fixing the _20_ occurrences in ox-odt.el:

lines from buffer: ox-odt.el
    469:         ((text:style-name . ,(format "%s_20_Index_20_Heading" style-prefix)))
    486:                  ((text:style-name . ,(format "%s_20_Index_20_Heading" style-prefix)))
   3669:     ((text:style-name . "Contents_20_Heading"))
   3692:               ((text:style-name . ,(format "Heading_20_%d" depth))))))))
   3698:              ((text:style-name . "Contents_20_Heading"))
   5884:                    (concat "Heading_20_" (number-to-string level) suffix))
   5888:                    (concat "Heading_20_" (number-to-string level) suffix-i))
   5889:                   (t (concat "Heading_20_" (number-to-string level)))))))
   6203:       ((text:style-name . "Index_20_Heading"))
   6224:            ((text:style-name . "Index_20_Heading"))
   6244:     ((text:style-name . "User_20_Index_20_Heading"))
   6271:              ((text:style-name . "User_20_Index_20_Heading"))

I believe you are a programmer, so just unzip the odt file and FIX content.xml and styles.xml to whatever form is desired.

Another way would be a 2-way conversion: convert odt->docx on the MS Word side, import the resulting docx file into LO as odt, and see what LO reports, i.e., whether the 2-way conversion is faithful or lossy.

This is a bit laborious, but since you seem comfortable doing some grunt work, you can experiment on your side and suggest a possible line of attack.

kjambunathan commented 3 days ago

Maybe it is better to let LO do the conversion from 'odt' to 'docx' and give the resulting docx to MS Word.

That is what I did to create the docx files I uploaded. Why do you hesitate to install LO on Mac?

kjambunathan commented 3 days ago

> How do these docx files look on your side?
>
> docx file created by ODT exporter + LO with numbered headings.
>
> issue-278.docx
>
> docx file created by ODT exporter + LO with NO numbered headings.
>
> issue-278-no-headline-numbers.docx

> Word is happy with both of your example docx files.

> I can anticipate what the problem could be. It is the SPC character in style name _20_, which is 0x20 (=32 decimal).
>
> It is just a question of changing the style:style style:name="Heading_20_1" etc in

Apparently, how the style is named matters here.

LO is mindful of what MS Word expects, and does the right thing by mangling (un-SPC-ing) the style name.

Maybe you should NOT hesitate to install LO on Mac and set org-odt-preferred-output-format to doc, docx, or even pdf.

I am ambivalent about whether un-SPCing style names in ox-odt.el and the styles.xml file (for MS Word's sake) is worth the effort, considering that LO does the right thing.

document.xml of LO generated docx file

    <w:p>
      <w:pPr>
        <w:pStyle w:val="Heading3" />
        <w:bidi w:val="0" />
        <w:ind w:hanging="0"
               w:left="0" />
        <w:jc w:val="left" />
        <w:rPr></w:rPr>
      </w:pPr>
      <w:bookmarkStart w:id="9"
                       w:name="orgd399794" />
      <w:r>
        <w:rPr></w:rPr>
        <w:t>Example</w:t>
      </w:r>
      <w:bookmarkEnd w:id="9" />
    </w:p>

styles.xml of LO generated docx file

  <w:style w:styleId="Heading3"
           w:type="paragraph">
    <w:name w:val="Heading 3" />
    <w:basedOn w:val="Heading" />
    <w:next w:val="BodyText" />
    <w:qFormat />
    <w:pPr>
      <w:numPr>
        <w:ilvl w:val="2" />
        <w:numId w:val="1" />
      </w:numPr>
      <w:outlineLvl w:val="2" />
    </w:pPr>
    <w:rPr>
      <w:b />
      <w:bCs />
      <w:sz w:val="28" />
      <w:szCs w:val="28" />
    </w:rPr>
  </w:style>

[screenshot: document xml+styles xml]

jefftrull commented 3 days ago

I didn't really understand your point about the space character. Is there something that is interpreting _20_ as a space? I don't see any reference to this in the ODT spec. Anyway, I tried replacing every instance of _20_ with an alphanumeric string I picked, and it made no difference in the results.

It looks to me like this is something particular to the 3rd headline level. The 2nd and 4th levels do not have this issue. Also, the text is positioned at the right tab stop (17cm). I can't tell why yet.

kjambunathan commented 3 days ago

I made a suggestion to install LO and customize org-odt-preferred-output-format. You seem to be ignoring this suggestion. Either you have missed it, or you have a goal that is not about producing documents but something else. I need a clear understanding of this "something else" so that I can help you better.

> I didn't really understand your point about the space character. Is there something that is interpreting _20_ as a space? I don't see any reference to this in the ODT spec.

Since you are talking in terms of the spec, this explanation should suffice:

NCName (the "non-colonized name" type from the XML Namespaces specification)
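To make the _20_ convention concrete: style:name has to be an NCName, which cannot contain a space, so a space is written as `_` + two hex digits + `_`, while style:display-name keeps the readable form (compare `style:name="Heading_20_3"` with `style:display-name="Heading 3"` in the XML jefftrull posted). A small Python sketch that undoes the encoding, assuming only two-digit hex escapes as in the names seen in this thread; it does not try to cover every case a real encoder might produce, and the helper name is hypothetical:

```python
import re

# "_" + exactly two hex digits + "_", e.g. "_20_" for a space (0x20).
_HEX_ESCAPE = re.compile(r"_([0-9A-Fa-f]{2})_")


def decode_style_name(name: str) -> str:
    """Turn an encoded style:name (e.g. "Heading_20_3") into display form."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)
```

Applied to the names grepped from ox-odt.el above, `"User_20_Index_20_Heading"` becomes `"User Index Heading"` and `"Text_20_body"` becomes `"Text body"`; names without escapes, such as `"OrgTable"`, are returned unchanged.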

> Also, the text is positioned at the right tab stop (17cm). I can't tell why yet.

The only image you shared is the one below. A4 is 21cm x 29.7cm. Obviously, the heading text is NOT starting at 17cm. If it were, it would be pushed further to the right than the image suggests.

I want to ensure that we are talking about the same thing. As I said, I have no access to MS Word, and docx is a collection of XML files; it is just a "zip" (or some such) archive file. Emacs opens docx in doc-view-mode (that is, it converts the docx to pdf and then displays that pdf, provided you have the required docx-to-pdf converters), or you can press C-c C-c while in doc-view-mode so that Emacs displays the text version of the docx file.

[screenshot: testcase]

kjambunathan commented 3 days ago

> The 2nd and 4th levels do not have this issue. Also, the text is positioned at the right tab stop (17cm).

I can't speak for docx, but on the LO and odt side I would start with LO Menu -> Headline Numbering -> Position.

[screenshot: outline-numbering]

kjambunathan commented 3 days ago

> Also, the text is positioned at the right tab stop (17cm).

This is one explanation of the 17cm ....

[screenshot: explanation-of-tab-stop]

jefftrull commented 3 days ago

That's right, it's positioned at 17cm, which is the right tab stop. Both the "R" of 4.1.1 and the "E" of 4.1.2 are there. If I change the tab-stop inside styles.xml they move accordingly. The headings at levels 1 and 2, and again at 4 (in my experiments) do not do this. It seems significant to me that it is only heading level 3, and so I have been focusing my attention there first.

I appreciate your encouragement to find a workaround for the issue by using LO, and I will certainly take it up next, but I was hoping to understand first whether:

  1. The ODT output is incorrect, or
  2. Word is improperly rendering its input

and I suppose we can break item 1 down into:

I hope to use your exporter a lot with my new client, who expects reports in Word/Excel/Powerpoint formats. For me it's worth taking the time to understand the process at a deeper level. So I'm trying to take a little extra time on the first step of generating ODT and ensuring it's correct, with this first document. I hope that's OK.

Thank you for making the exporter and for all the effort you are putting into analyzing this test case.

kjambunathan commented 2 days ago

> That's right, it's positioned at 17cm, which is the right tab stop. Both the "R" of 4.1.1 and the "E" of 4.1.2 are there. If I change the tab-stop inside styles.xml they move accordingly.

Which tab-stop in styles.xml? styles.xml is a big file.

styles.xml is essentially a copy of ./org-mode-ox-odt/etc/styles/OrgOdtStyles.xml, and the styles OrgOutline and Heading_20_3 have NO tab stop setting.

Please upload the org snippet and testcase.odt AFTER the corruption fix.

I hope you don't have stray TAB characters in your org file.

> For me it's worth taking the time to understand the process at a deeper level.

Install LO and inspect what the UI says about tab stops.

> For me it's worth taking the time to understand the process at a deeper level. So I'm trying to take a little extra time on the first step of generating ODT and ensuring it's correct, with this first document. I hope that's OK.

Now I understand where you are coming from.

The ODT exporter is 14 years old and has been part of Emacs releases for that long. It has been eyeballed for so long that I doubt it is trivially broken.

kjambunathan commented 2 days ago

One more thought ...

Are you using a custom styles file (one that is different from what the ODT exporter ships with)?

You are new to org (I can infer that from our conversation), so I wouldn't be surprised if you have borrowed one of the custom "starter kits" and it has made some weird customizations to styles.xml.

kjambunathan commented 2 days ago

If you find issues on the MS Word side, you need to interpret them in terms of what the ODT exporter does wrong.

As I said, I don't have access to MS Word. (I am a Debian user and have no access to Windows machines.) Leaving this constraint aside, I will be happy to address your questions and concerns and continue the conversation. Lead me and I will follow along.

Here are some good places to start ....

jefftrull commented 2 days ago

Thanks for all the suggestions and the helpful links. I will take your questions in order:

> Which tab-stop in styles.xml? styles.xml is a big file.

I am referring to the one associated with the Heading style. It's through this that each heading style (such as Heading_20_3) acquires this right tab stop, I believe.
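To check which tab stops each heading style actually ends up with, here is a Python sketch that reads a styles.xml, collects each paragraph style's own style:tab-stops, and follows style:parent-style-name so that a style without its own tab stops (such as Heading_20_3) reports the ones it inherits (such as the 17cm right stop on Heading). Assumptions: a style that has a style:tab-stops element, even an empty one, replaces the inherited list; a missing style:type means "left"; default styles and non-paragraph families are ignored. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
NS = {"style": STYLE_NS}
S = "{%s}" % STYLE_NS


def effective_tab_stops(styles_xml):
    """Map each paragraph style name to its (position, type) tab stops,
    following style:parent-style-name when a style declares none itself."""
    root = ET.fromstring(styles_xml)
    own, parent = {}, {}
    for style in root.iter(S + "style"):
        if style.get(S + "family") != "paragraph":
            continue
        name = style.get(S + "name")
        parent[name] = style.get(S + "parent-style-name")
        stops = style.find("style:paragraph-properties/style:tab-stops", NS)
        # None means "no tab-stops element here, inherit"; [] means "cleared".
        own[name] = None if stops is None else [
            (t.get(S + "position"), t.get(S + "type", "left"))
            for t in stops.findall("style:tab-stop", NS)
        ]

    def resolve(name, seen=()):
        # Unknown parents and inheritance cycles contribute no tab stops.
        if name not in own or name in seen:
            return []
        if own[name] is not None:
            return own[name]
        return resolve(parent[name], seen + (name,))

    return {name: resolve(name) for name in own}
```

The input can come straight from an exported file, e.g. the bytes of `zipfile.ZipFile("testcase2.odt").read("styles.xml")`; with the stock styles one would expect the Heading_20_3 entry to show the inherited `("17cm", "right")` stop.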

> I hope you don't have stray TAB characters in your org file.

Good question - I don't seem to.

> Install LO and inspect what the UI says about tab stops.

The LO UI agrees with Word that each heading style has a right tab stop at 17cm. Unlike Word, LO does not place the heading 3 text there. I'm not sure which is correct.

> Are you using custom styles file (one that is different from what the ODT exporter ships with)?

I am using vanilla Emacs with a fairly lightweight config of my own. I am new to the ODT exporter and have not customized it in any way yet.

jefftrull commented 2 days ago

Here is my new reduced testcase generated after the corruption fix.

#+TITLE: ODT export test case (formatting)

#+options: ':nil *:t -:t ::t <:t H:3 \n:nil ^:t arch:headline
#+options: author:nil broken-links:nil c:nil creator:nil
#+options: d:(not "LOGBOOK") date:nil e:t email:nil f:t inline:t num:t
#+options: p:nil pri:nil prop:nil stat:t tags:t tasks:t tex:t
#+options: timestamp:t title:t toc:t todo:t |:t
#+date: <2024-10-20 Sun>
#+author: Jeff Trull
#+email: jefft@Mayukh-Mac.local
#+language: en
#+select_tags: export
#+exclude_tags: noexport
#+creator: Emacs 29.4 (Org mode 9.6.15)
#+cite_export:
#+OPTIONS: TOC:nil 

* Heading 1
** Subheading 1.1
*** Rail Transit in the Bay Area
We have some strange regional rivalries here in Northern California.

testcase2.odt

This is how it looks in MS Word. The R of "Rail" is exactly at 17cm: [screenshot: image001]

kjambunathan commented 2 days ago

Just for reference, can you update https://github.com/kjambunathan/org-mode-ox-odt/issues/278#issuecomment-2428300662 with 2 more screenshots? (Caption the screenshots so that we can identify which is which.)

Essentially the above comment needs two more screenshots.

kjambunathan commented 2 days ago

> I am referring to the one associated with the Heading style. It's through this that each heading style (such as Heading_20_3) acquires this right tab stop, I believe.

The tab setting takes effect only if there is a LITERAL tab character in the heading text. You have no tab character in your heading text.

The outline numbers are NOT part of the headline text; they are generated by the software (ODT exporter, LO, MS Word), and the relevant config is as mentioned in https://github.com/kjambunathan/org-mode-ox-odt/issues/278#issuecomment-2426218992. You can confirm that there is a "Tab stop at" setting in there.

So, technically I am inclined to focus on the Outline Numbering / Headline Numbering setting of LO (and MS Word).

But your experiment suggests that getting rid of the tab-stop setting in Heading gives you the behaviour you desire.

Export the org snippet below, load the resulting odt file in MS Word, and report what you see. Note that the <style:tab-stops /> element in the config below is empty.

#+TITLE: ODT export test case (formatting)

#+options: ':nil *:t -:t ::t <:t H:3 \n:nil ^:t arch:headline
#+options: author:nil broken-links:nil c:nil creator:nil
#+options: d:(not "LOGBOOK") date:nil e:t email:nil f:t inline:t num:t
#+options: p:nil pri:nil prop:nil stat:t tags:t tasks:t tex:t
#+options: timestamp:t title:t toc:t todo:t |:t
#+date: <2024-10-20 Sun>
#+author: Jeff Trull
#+email: jefft@Mayukh-Mac.local
#+language: en
#+select_tags: export
#+exclude_tags: noexport
#+creator: Emacs 29.4 (Org mode 9.6.15)
#+cite_export:
#+OPTIONS: TOC:nil 

* Heading 1
** Subheading 1.1
*** Rail Transit in the Bay Area

We have some strange regional rivalries here in Northern California.

#+ATTR_ODT: :target "extra_styles" :backends (odt)
#+begin_src nxml
<style:style style:name="Heading"
             style:parent-style-name="Standard"
             style:next-style-name="Text_20_body"
             style:family="paragraph"
             style:class="text">
  <style:paragraph-properties fo:keep-with-next="always"
                              fo:margin-bottom="0.212cm"
                              fo:margin-top="0.423cm"
                              style:contextual-spacing="false">
    <style:tab-stops />
  </style:paragraph-properties>
  <style:text-properties style:font-name="Arial"
                         style:font-name-asian="SimSun"
                         style:font-name-complex="Tahoma1"
                         fo:font-family="Arial"
                         fo:font-size="14pt"
                         style:font-family-asian="SimSun"
                         style:font-family-complex="Tahoma"
                         style:font-family-generic="swiss"
                         style:font-family-generic-asian="system"
                         style:font-family-generic-complex="system"
                         style:font-pitch="variable"
                         style:font-pitch-asian="variable"
                         style:font-pitch-complex="variable"
                         style:font-size-asian="14pt"
                         style:font-size-complex="14pt" />
</style:style>
#+end_src
kjambunathan commented 2 days ago

> #+creator: Emacs 29.4 (Org mode 9.6.15)

I run org-mode-ox-odt from this repo, and in my case I get

#+creator: Emacs 31.0.50 (Org mode 9.7.7)

So, Org mode 9.6.15 in the #+creator line suggests that you are definitely not using the HEAD of this repo, but are using ox-odt.el from somewhere else.

So, ox-odt.el could be from stock Emacs or stock org-mode or from an old version of this git repo.

You are saying that you have applied the corruption fix; in that case I would expect a different version of org in the #+creator line.

IOW, I am noticing inconsistencies in your claims, and before we dig deeper, please confirm that ox-odt.el is from the HEAD of this repo.

A good way to find out is to do M-x find-library ox-odt and M-x find-library ox-ods and read the copyright notice in those files. You will see the declaration

This file is NOT part of GNU Emacs

IOW, I, Jambunathan K, will be the copyright holder and NOT FSF.

You can also check the *Messages* buffer: if it reports the styles.xml file it is using, then ox-odt.el is definitely from this repo. (cf. README of this repo)

jefftrull commented 1 day ago

Yes, I'll be happy to tell you what I have here, to the best of my knowledge:

I installed the package according to the directions in the README and checked to see if the Messages buffer had something consistent. What I found looked something like this:

ox-odt: Styles file is /Users/jefft/.emacs.d/elpa/ox-odt-9.6.620/etc/styles/OrgOdtStyles.xml

which seemed to match what you were looking for, so I concluded I had installed it correctly.

After you fixed the table-style "nil" issue I waited for a while to see an updated version of ox-odt but my package manager did not show one, so I assumed there was some delay for CI or something like that, so I went ahead and made the change in my local Emacs by editing the function and re-evaluating the defun with C-M-x. So I was able to verify that fix.

When I do find-library on ox-odt the el file has this at the top:

;; Copyright (C) 2010-2022 Jambunathan K <kjambunathan at gmail dot com>

;; Author: Jambunathan K <kjambunathan at gmail dot com>
;; Maintainer: Jambunathan K <kjambunathan at gmail dot com>
;; Keywords: outlines, hypermedia, calendar, wp
;; Homepage: https://github.com/kjambunathan/org-mode-ox-odt

;; Package-Requires: ((org "9.2.1"))

;; This file is NOT part of GNU Emacs.
jefftrull commented 1 day ago

> The tab setting takes effect only if there is a LITERAL tab character in the heading text. You have no tab character in heading text.

I agree, I have no tab characters, and it certainly should not be putting text at that tab stop. But it is, and I know it's the tab stop and not some other 17cm point, because when I change the tab-stop associated with the Heading in styles.xml the text moves around. I have attached examples at 14cm and 11cm: [screenshot: tab_at_14cm] [screenshot: tab_at_11cm]

The only difference between these examples is I changed 17cm to 14 and then to 11 in this part of styles.xml:

<style:style style:name="Heading" style:family="paragraph" style:parent-style-name="Standard" style:next-style-name="Text_20_body" style:class="text">
  <style:paragraph-properties fo:margin-top="0.423cm" fo:margin-bottom="0.212cm" fo:keep-with-next="always">
    <style:tab-stops>
      <style:tab-stop style:position="17cm" style:type="right"/>
    </style:tab-stops>
  </style:paragraph-properties>
  <style:text-properties style:font-name="Arial" fo:font-size="14pt" style:font-name-asian="SimSun" style:font-size-asian="14pt" style:font-name-complex="Tahoma" style:font-size-complex="14pt"/>
</style:style>

This is why I say the heading text at level 3 is "at the tab stop".

jefftrull commented 1 day ago

> Export the below org snippet and load the resulting odt file in MS Word and report what you see. Note that <style:tab-stops /> config in below config is empty

It looks a bit strange still, with an unexpected gap to the left of the text:

[screenshot: kjn_request]

kjambunathan commented 1 day ago

You are using the right ODT. (Patching the change locally is what I would have done, so that is OK too)

We are making progress ...

MS Word is NOT honoring OrgOutline style seen at

https://github.com/kjambunathan/org-mode-ox-odt/blob/40d7b39a99a84c67840f9cd3ac1ad5aa477c9cda/etc/styles/OrgOdtStyles.xml#L35

There is no Tab stop following the outline number in LO config.

I can't talk for docx, but on the LO and odt side I would start with LO Menu -> Headline Numbering -> Position

outline-numbering

MS Word is putting a tab stop after the outline number numeral. If you reduce the size of the numeral field 1.1.1 (=three levels starting from outline level 1) to 1.1 (= just the second level and third level) or just 1 (=just the third level), you will see the spacing doesn't happen. My theory is that the extra numeral pushes the text past the tab-stop (I hope you get the idea)

tab_at_14cm tab_at_11cm kjn_request

You aren't stating whether the screenshots are from MS Word, or what format the rendered files are in. (I believe these outputs are LO-created ODT files loaded in MS Word.)

Why are you hesitant to provide feedback on LO?

I need feedback on LO-created DOCX loaded in MS Word. You can use C-c C-e # o to insert ODT-specific export options, and set docx as the preferred output format:

#+odt_preferred_output_format: docx
#+odt_styles_file:
#+odt_extra_images:
#+odt_extra_styles:
#+odt_extra_automatic_styles:
#+odt_master_styles:
#+odt_content_template_file:
#+odt_automatic_styles:
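To cross-check an LO-created DOCX outside Emacs, the already-exported ODT can also be converted directly with LibreOffice's command-line converter (`soffice --headless --convert-to ... --outdir ...`). The Python sketch below assumes `soffice` is on `PATH`. On macOS it usually lives inside the LibreOffice.app bundle instead. The helper names are hypothetical, and the sketch does not claim to mirror the exact command ox-odt runs for `#+odt_preferred_output_format`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def soffice_convert_cmd(odt_path: str, out_format: str = "docx",
                        outdir: Optional[str] = None, soffice: str = "soffice") -> List[str]:
    """Build the argv for a headless LibreOffice conversion (output next to the input by default)."""
    src = Path(odt_path)
    out = Path(outdir) if outdir is not None else src.parent
    return [soffice, "--headless", "--convert-to", out_format, "--outdir", str(out), str(src)]


def convert(odt_path: str, out_format: str = "docx", outdir: Optional[str] = None) -> None:
    """Run the conversion; raises if soffice is missing or exits with a non-zero status."""
    exe = shutil.which("soffice")
    if exe is None:
        raise FileNotFoundError("soffice not found on PATH")
    subprocess.run(soffice_convert_cmd(odt_path, out_format, outdir, soffice=exe), check=True)
```

For example, `convert("testcase.odt")` would write testcase.docx next to the input, ready to open in MS Word.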
jefftrull commented 1 day ago

You aren't stating whether the screenshots are from MS Word, or what format the rendered files are in. (I believe these outputs are LO-created ODT files loaded in MS Word.)

Why are you hesitant to provide feedback on LO?

These are all from MS Word. This is a work laptop and installing software is a nuisance, so I answered your other questions first. On my own Linux system, LO displays the ODT output as I would expect, without mysterious spaces of any kind. When I get the opportunity, I will install LO, answer your other question, and upload screenshots.

jefftrull commented 1 day ago

OK, here are the four images you requested. The LO versions always look good. The DOCX produced by LO conversion also looks good when viewed in MS Word. Only the original ODT looks bad in MS Word.

Original ODT viewed in MS Word: odt_msword

Original ODT viewed in LO: odt_libreoffice

Converted DOCX viewed in MS Word: docx_msword

Converted DOCX viewed in LO: docx_libreoffice

kjambunathan commented 17 hours ago

testcase2.zip

Tell me which of the ODT files in the zip file above works well for you in MS Word.

FWIW,


Here is what testcase2.org looks like. The elisp snippet in it modifies the stock OrgOdtStyles.xml as listed above. You can diff OrgOdtStyles-<i>.xml against OrgOdtStyles-<i+1>.xml to see what has been removed or added.

#+TITLE: ODT export test case (formatting)

#+options: ':nil *:t -:t ::t <:t H:3 \n:nil ^:t arch:headline
#+options: author:nil broken-links:nil c:nil creator:nil
#+options: d:(not "LOGBOOK") date:nil e:t email:nil f:t inline:t num:t
#+options: p:nil pri:nil prop:nil stat:t tags:t tasks:t tex:t
#+options: timestamp:t title:t toc:t todo:t |:t
#+date: <2024-10-20 Sun>
#+author: Jeff Trull
#+email: jefft@Mayukh-Mac.local
#+language: en
#+select_tags: export
#+exclude_tags: noexport
#+creator: Emacs 29.4 (Org mode 9.6.15)
#+cite_export:
#+OPTIONS: TOC:nil

#+odt_preferred_output_format: docx

# #+odt_preferred_output_format: pdf

# #+export_file_name: testcase2-with-OrgOdtStyles.odt
# #+odt_styles_file: OrgOdtStyles.xml

# #+export_file_name: testcase2-with-OrgOdtStyles-1.odt
# #+odt_styles_file: OrgOdtStyles-1.xml

# #+export_file_name: testcase2-with-OrgOdtStyles-2.odt
# #+odt_styles_file: OrgOdtStyles-2.xml

# #+export_file_name: testcase2-with-OrgOdtStyles-3.odt
# #+odt_styles_file: OrgOdtStyles-3.xml

#+export_file_name: testcase2-with-OrgOdtStyles-4.odt
#+odt_styles_file: OrgOdtStyles-4.xml

#+begin_src emacs-lisp :exports none
(let* ((in-file ;; "/home/kjambunathan/src/emacs-extras/org-mode-ox-odt/etc/styles/OrgOdtStyles.xml"
        (expand-file-name "OrgOdtStyles.xml"))
       (dom (odt-dom:file->dom
             in-file
             'strip-comment-nodes-p))
       (fs (list
            (lambda (dom)
              "Strip comments"
              dom)
            (lambda (dom)
              "Rename `OrgOutline' style to `Outline'."
              (prog1 dom
                (thread-last dom
                  (odt-dom-map (lambda (it)
                                 (when (and (eq (odt-dom-type it)
                                                'text:outline-style)
                                            (string= (odt-dom-property it 'style:name) "OrgOutline"))
                                   (map-put (odt-dom-properties it) 'style:name "Outline")))))))
            (lambda (dom)
              "Explicitly associate Heading styles with `Outline' numbering style"
              (prog1 dom
                (thread-last dom
                  (odt-dom-map (lambda (it)
                                 (when (and (eq (odt-dom-type it)
                                                'style:style)
                                            (string-match-p (rx (and bos
                                                                     "Heading" "_20_"
                                                                     (one-or-more digit))
                                                                eos)
                                                            (odt-dom-property it 'style:name)))
                                   (setcar (cdr it)
                                           (map-merge 'alist (odt-dom-properties it)
                                                      '((style:list-style-name . "Outline"))))
                                   it))))))
            (lambda (dom)
              "Re-define `Outline' style to use explicit tab stops"
              (prog1 dom
                (thread-last dom
                  (let* ((node (car (odt-dom-map (lambda (it)
                                                   (when (and (eq (odt-dom-type it)
                                                                  'text:outline-style)
                                                              (string= (odt-dom-property it 'style:name) "Outline"))
                                                     it))
                                                 dom)))
                         (parent (dom-parent dom node))
                         (replacement-node `(text:outline-style ((style:name . "Outline"))
                                                                ,(thread-last 10
                                                                   (number-sequence 1)
                                                                   (seq-map
                                                                    (lambda (it)
                                                                      `(text:outline-level-style
                                                                        ((text:level . ,(format "%s" it))
                                                                         (style:num-format . "1")
                                                                         (text:display-levels . ,(format "%s" it)))
                                                                        (style:list-level-properties
                                                                         ((text:list-level-position-and-space-mode
                                                                           . "label-alignment"))
                                                                         (style:list-level-label-alignment
                                                                          ((text:label-followed-by . "listtab")
                                                                           (text:list-tab-stop-position . ,(format "%.2fcm" (* it 0.5)))))))))))))
                    (dom-remove-node dom node)
                    (dom-append-child parent replacement-node))))))))
  (seq-mapn
   (lambda (f i)
     (message "Applying %s" i)
     (funcall f dom)
     (let* ((out-file (format "%s-%d.xml"
                              (file-name-sans-extension in-file)
                              i)))
       (thread-last dom
         (odt-stylesdom:dom->file out-file
                                  ;; (not 'prettify)
                                  'prettify))
       (with-temp-buffer
         (insert-file-contents out-file)
         (setq-local backup-inhibited t)
         ;; Write styles.xml
         (let ((coding-system-for-write 'utf-8))
           (write-region nil nil out-file)))))
   fs (number-sequence 1 (length fs))))
#+end_src
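For readers who don't have the package's `odt-dom` helpers loaded, the Python sketch below shows roughly the shape the fourth transform ("Re-define `Outline' style to use explicit tab stops") builds. It produces ten `text:outline-level-style` entries. Each one uses label-alignment mode, a tab after the label, and a list tab stop at 0.5cm times the level, formatted with `%.2f` as in the elisp. The sketch emits a string fragment and assumes the `text:` and `style:` prefixes are declared by the enclosing styles.xml. The real serializer's attribute order and whitespace may differ. `outline_style_xml` is a hypothetical name.

```python
def outline_style_xml(levels: int = 10, step_cm: float = 0.5) -> str:
    """Build a text:outline-style fragment mirroring the elisp transform (hypothetical helper)."""
    parts = ['<text:outline-style style:name="Outline">']
    for level in range(1, levels + 1):
        parts.append(
            # Level number, arabic numbering, and all parent levels displayed.
            f'<text:outline-level-style text:level="{level}" style:num-format="1" '
            f'text:display-levels="{level}">'
            '<style:list-level-properties '
            'text:list-level-position-and-space-mode="label-alignment">'
            # Label followed by a tab that stops at level * step_cm.
            '<style:list-level-label-alignment text:label-followed-by="listtab" '
            f'text:list-tab-stop-position="{level * step_cm:.2f}cm"/>'
            '</style:list-level-properties>'
            '</text:outline-level-style>'
        )
    parts.append('</text:outline-style>')
    return "\n".join(parts)
```

Level 3, the level of "Rail Transit in the Bay Area", would get a list tab stop at 1.50cm. Comparing that against how wide Word renders "1.1.1" is one way to read the results in the zip.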

* Heading 1

** Subheading 1.1

*** Rail Transit in the Bay Area

We have some strange regional rivalries here in Northern California.