jgm / pandoc

Universal markup converter
https://pandoc.org

RTF writer doesn't use RTF lists #2631

Open derelk opened 8 years ago

derelk commented 8 years ago

When converting a known list, e.g. from Markdown, I would have expected the RTF output to use RTF's native list formatting (\list, \listtext, etc.), but instead the writer formats it manually with \bullet and \tab. This prevents it from being recognized as a list in RTF editors such as LibreOffice, Gmail, and OS X's TextEdit.

jgm commented 8 years ago

I wrote that writer a decade ago. I don't know if I even knew about \list, \listtext and so on. There may have been a reason why I didn't use them, but using them sounds like a very good idea.

Could you give some complete examples of RTF lists using these commands, including nested lists, lists with multiple paragraphs under a list item, and ordered lists with different numbering styles? (We'd have to support all of that.)

derelk commented 8 years ago

To be honest, what I wrote above is essentially the extent of what I know; I don't know anything about RTF. I just came across this problem as I was trying to convert from Markdown into RTF that could be pasted into Gmail.

Jmuccigr commented 8 years ago

This seems to be the relevant section in the spec: http://www.biblioscape.com/rtf15_spec.htm#Heading33

Testing, it looks like TextEdit and Bean both create list items that look like this:

{\listtext 1. }One\

where the 1. is the marker. Other list types also have the marker in that location. The app just changes the literal marker to match the selected list type. No idea whether this is good practice. (BTW, the marker for a bullet is \'95, i.e. hex 95 = decimal 149, which is the bullet character in Windows-1252; it lies outside the ASCII range.)
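For reference, a minimal Haskell sketch of the two escape forms that show up in the samples in this thread: the hex escape `\'95` (byte 0x95 = 149, the bullet in Windows-1252) and the Unicode escape `\uc0\u8226 ` seen in TextEdit's output. The names `hexEscape` and `unicodeEscape` are made up for illustration and are not part of pandoc.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- | Render a single byte value as an RTF hex escape, e.g. 149 -> \'95.
-- (Hypothetical helper; the byte is interpreted in the document's code page.)
hexEscape :: Int -> String
hexEscape n
  | n < 0 || n > 255 = error "hexEscape: expected a single byte"
  | otherwise        = "\\'" ++ pad (showHex n "")
  where
    -- Left-pad to two hex digits.
    pad s = replicate (2 - length s) '0' ++ s

-- | Render a BMP character as an RTF Unicode escape with no fallback
-- bytes (\uc0).  \uN takes a signed 16-bit decimal, so code points above
-- 32767 wrap to negative values.  (Hypothetical helper.)
unicodeEscape :: Char -> String
unicodeEscape c
  | n > 0xFFFF = error "unicodeEscape: non-BMP characters are not handled"
  | n > 32767  = "\\uc0\\u" ++ show (n - 65536) ++ " "
  | otherwise  = "\\uc0\\u" ++ show n ++ " "
  where
    n = ord c
```

The trailing space after the number is the delimiter that ends the control word, which is why the samples show `\u8226 ` followed by a space.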

A nested list looks like this, where the second list item Two has two sub items:

{\listtext  \'95    }One\
{\listtext  \'95    }Two\
\pard\tx940\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\li1440\fi-1440\pardirnatural
\ls2\ilvl1\cf0 {\listtext   
\f1 \uc0\u8259 
\f0     }Two a\
{\listtext  
\f1 \uc0\u8259 
\f0     }Two b\
\pard\tx220\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\li720\fi-720\pardirnatural
\ls2\ilvl0\cf0 {\listtext   \'95    }Three\

Couldn't figure out how to get either app to produce a multi-paragraph list item.
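The nesting in the sample above comes down to two control words per paragraph: `\ls` picks the list and `\ilvl` the nesting level. A rough Haskell sketch of flattening a nested list into such paragraphs follows. The `Item` type and the function names are assumptions for illustration; the indent of 720 twips per level is read off the sample, and using the same `\'95` marker at every level is a simplification (TextEdit switches markers at level 1 and also sets tab stops).

```haskell
-- | A list item: its text plus any nested sub-items.  (Hypothetical type.)
data Item = Item String [Item]

-- | Flatten nested items into (level, text) pairs in document order.
flatten :: Int -> [Item] -> [(Int, String)]
flatten lvl = concatMap go
  where
    go (Item t subs) = (lvl, t) : flatten (lvl + 1) subs

-- | Render one flattened item as a paragraph that refers to list
-- override `ls` at nesting level `lvl`.  The item text is assumed to be
-- RTF-escaped already.
renderItem :: Int -> (Int, String) -> String
renderItem ls (lvl, t) =
  "\\pard\\li" ++ show indent ++ "\\fi-" ++ show indent
    ++ "\\ls" ++ show ls ++ "\\ilvl" ++ show lvl
    ++ " {\\listtext\t\\'95\t}" ++ t ++ "\\par\n"
  where
    -- 720 twips per level, matching \li720 / \li1440 in the sample.
    indent = 720 * (lvl + 1)

-- | Render a whole nested list against list override `ls`.
renderItems :: Int -> [Item] -> String
renderItems ls = concatMap (renderItem ls) . flatten 0
```

For the One / Two / Two a / Two b / Three example, `flatten 0` yields levels 0, 0, 1, 1, 0, which is the same `\ilvl` sequence as in the TextEdit output.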

geekscrapy commented 3 years ago

FWIW This issue is still valid

mb21 commented 3 years ago

I took a quick look at the current listItemToRTF function and there are some "interesting" things going on...

  let listMarker = "\\fi" <> tshow (negate listIncrement) <> " " <> marker <>
                   "\\tx" <> tshow listIncrement <> "\\tab"
  -- Find the first occurrence of \\fi or \\fi-, then replace it and the following
  -- digits with the list marker.
  let insertListMarker t = ...

with insertListMarker being applied only to the first Block (usually a paragraph) in the list item.
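Since the body of `insertListMarker` is elided above, the following is only a guess at the idea its comment describes, not pandoc's actual code: scan for the first `\fi`, drop an optional minus sign and the digits after it, and splice the marker in. It uses `String` rather than `Text` to stay within `base`, and `replaceFirstFi` is a hypothetical name.

```haskell
import Data.Char (isDigit)
import Data.List (isPrefixOf)

-- | Replace the first "\fi", an optional '-', and the digits that follow
-- with the given marker.  Later occurrences are left untouched.
-- Limitation of this sketch: any control word that merely starts with
-- "\fi" (e.g. \field) would also match.
replaceFirstFi :: String -> String -> String
replaceFirstFi marker = go
  where
    go s
      | "\\fi" `isPrefixOf` s = marker ++ dropDigits (dropMinus (drop 3 s))
      | otherwise = case s of
          []         -> []
          (c : rest) -> c : go rest

    -- Drop a single leading minus sign, if present.
    dropMinus ('-' : rest) = rest
    dropMinus rest         = rest

    -- Drop the numeric parameter of the control word.
    dropDigits = dropWhile isDigit
```

Rewriting already-rendered RTF like this is fragile, which is presumably part of what makes the current approach "interesting".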


Anyway, taking a closer look at what we should actually generate, I copy-pasted the following from this StackOverflow answer into test.rtf:

{\rtf1

{\f2 {\pntext \'B7\tab}{*\pn\pnlvlblt\pnstart1{\pntxtb\'B7}}{\ltrch This is a test.}\li720\ri0\sa0\sb0\jclisttab\tx720\fi-360\ql\par}
{\f2 {\pntext \'B7\tab}{*\pn\pnlvlblt\pnstart1{\pntxtb\'B7}}{\ltrch So is this.}\li720\ri0\sa0\sb0\jclisttab\tx720\fi-360\ql\par}

}

But when opened in macOS's TextEdit, this isn't recognized as a proper bullet list, but falls back on plain asterisks:

*This is a test.
*So is this.

Also from here:

Word 97 stores bullets and numbering information very differently from earlier versions of Word. In Word 6.0, for example, number formatting data is stored individually with each paragraph. In Word 97, however, all of the formatting information is stored in a pair of document-wide list tables which act as a style sheet, and each individual paragraph stores only an index to one of the tables, like a style index.

Which is what macOS's TextEdit seems to do as well, e.g. it generated the following:

{\rtf1
{\*\listtable{\list\listtemplateid1\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace360\levelindent0{\*\levelmarker \{disc\}}{\leveltext\leveltemplateid1\'01\uc0\u8226 ;}{\levelnumbers;}\fi-360\li720\lin720 }{\listname ;}\listid1}}
{\*\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}}
\ls1\ilvl0
{\listtext  \uc0\u8226  }one\
{\listtext  \uc0\u8226  }two\
}

Note that \listtext contains only the fallback that "Should be ignored by any reader that understands Word 97 numbering".

But yeah, not sure whether all major RTF apps understand this well enough?
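To make the Word 97 scheme concrete, here is a hedged Haskell sketch that emits one bullet list in the same shape as the TextEdit output above: a `\listtable` entry, a matching `\listoverridetable` entry, and items that point at the override via `\ls`. The control words are copied from that sample; whether other RTF readers accept exactly this form has not been tested here. All function names are made up, the whitespace inside `\listtext` is assumed to be tabs, and item text is assumed to be RTF-escaped already.

```haskell
-- | Level definition for a bullet list, copied from the TextEdit sample.
bulletLevel :: String
bulletLevel = concat
  [ "{\\listlevel\\levelnfc23\\levelnfcn23\\leveljc0\\leveljcn0"
  , "\\levelfollow0\\levelstartat1\\levelspace360\\levelindent0"
  , "{\\*\\levelmarker \\{disc\\}}"
  , "{\\leveltext\\leveltemplateid1\\'01\\uc0\\u8226 ;}{\\levelnumbers;}"
  , "\\fi-360\\li720\\lin720 }"
  ]

-- | One entry of the document-wide list table, identified by \listidN.
listEntry :: Int -> String
listEntry n =
  "{\\list\\listtemplateid" ++ show n ++ "\\listhybrid" ++ bulletLevel
    ++ "{\\listname ;}\\listid" ++ show n ++ "}"

-- | One entry of the override table, mapping \lsN to \listidN.
overrideEntry :: Int -> String
overrideEntry n =
  "{\\listoverride\\listid" ++ show n ++ "\\listoverridecount0\\ls" ++ show n ++ "}"

-- | One list item; \listtext carries only the fallback marker.
bulletItem :: String -> String
bulletItem t = "{\\listtext\t\\uc0\\u8226 \t}" ++ t ++ "\\\n"

-- | A complete document holding a single bullet list with id 1.
bulletDocument :: [String] -> String
bulletDocument items = concat
  [ "{\\rtf1\n"
  , "{\\*\\listtable", listEntry 1, "}\n"
  , "{\\*\\listoverridetable", overrideEntry 1, "}\n"
  , "\\ls1\\ilvl0\n"
  , concatMap bulletItem items
  , "}\n"
  ]
```

Writing `bulletDocument ["one", "two"]` to a file gives something that can be opened in the RTF apps mentioned in this thread to compare how each one treats it.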

jgm commented 3 years ago

I imagine this style of list is pretty widely supported, but it's certainly more complicated to generate, since you have to sync up the identifiers in the list items with a stylesheet at the beginning of the document.
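The bookkeeping described here can be sketched as a two-step render: first walk the document and hand each list the next free id, then emit the header tables from the ids that were allocated and the body from the annotated blocks. This Haskell sketch uses a toy `Block` type rather than pandoc's AST, and takes the renderers for the table entries as arguments because their exact content is a separate question; every name in it is hypothetical.

```haskell
import Data.List (mapAccumL)

-- | Toy document model (an assumption; pandoc's real AST is richer).
data Block = Para String | BulletList [String]
  deriving (Eq, Show)

-- | Pass 1: give every list the next free id, leaving other blocks
-- untagged.  Returns the number of ids handed out and the annotated blocks.
numberLists :: [Block] -> (Int, [(Maybe Int, Block)])
numberLists = mapAccumL step 0
  where
    step n b@(BulletList _) = (n + 1, (Just (n + 1), b))
    step n b                = (n, (Nothing, b))

-- | Pass 2 (body): list paragraphs cite their id via \ls.
renderBlock :: (Maybe Int, Block) -> String
renderBlock (Just n, BulletList items) =
  concatMap (\t -> "\\pard\\ls" ++ show n ++ "\\ilvl0 " ++ t ++ "\\par\n") items
renderBlock (_, Para t) = "\\pard " ++ t ++ "\\par\n"
-- Not produced by numberLists; rendered as plain paragraphs for totality.
renderBlock (Nothing, BulletList items) =
  concatMap (\t -> "\\pard " ++ t ++ "\\par\n") items

-- | Pass 2 (whole document): header tables are built from the same ids
-- that the body refers to, so the two cannot drift apart.
renderDocument :: (Int -> String) -> (Int -> String) -> [Block] -> String
renderDocument listEntry overrideEntry blocks = concat
  [ "{\\rtf1\n"
  , "{\\*\\listtable", concatMap listEntry ids, "}\n"
  , "{\\*\\listoverridetable", concatMap overrideEntry ids, "}\n"
  , concatMap renderBlock numbered
  , "}\n"
  ]
  where
    (count, numbered) = numberLists blocks
    ids = [1 .. count]
```

Because the header is derived from the result of the numbering pass, the extra cost is one traversal before rendering rather than any manual synchronisation.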