jgm / pandoc

Universal markup converter
https://pandoc.org

Bold styles not flowing properly #6023

Closed scott-joe closed 4 years ago

scott-joe commented 4 years ago

Pandoc Version: 2.9.1 InCopy Version: 15.0.1

Description: I'm seeing an issue where the Bold "character style range" continues beyond its intended end depending on whether it's followed by a Paragraph or Header. I've also noticed some differences between the markup InCopy produces and the markup Pandoc produces (with and without a template).

Command: pandoc -s ./copy/markdown/00--styles-sample.md -f markdown -t icml -o ./copy/in-copy/00--styles-sample.icml

Markdown Input that flows Bold through the rest of the document when opened in InCopy

__This is bold text__

<!-- This is some random text that shouldn't be bold, but just a regular old paragraph -->

#### Blockquotes

> This is a blockquote

This is some random text that shouldn't be bold, but just a regular old paragraph

Pandoc Output

<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
    <Content>This is bold text</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
<Br />
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Header4">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>Blockquotes</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
<Br />
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Blockquote &gt; Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>This is a blockquote</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
<Br />
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>This is some random text that shouldn’t be bold, but just a regular old paragraph</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>

(ignore the missing font highlights, that's an unrelated issue)

[Screenshot: Screen Shot 2020-01-01 at 11 53 15 AM]

Markdown Input that stops Bold at the intended paragraph.

__This is bold text__

This is some random text that shouldn't be bold, but just a regular old paragraph

#### Blockquotes

> This is a blockquote

This is some random text that shouldn't be bold, but just a regular old paragraph

Pandoc Output

<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
    <Content>This is bold text</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
<Br />
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>This is some random text that shouldn’t be bold, but just a regular old paragraph</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
<Br />
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Header4">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>Blockquotes</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
<Br />
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Blockquote &gt; Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>This is a blockquote</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
<Br />
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>This is some random text that shouldn’t be bold, but just a regular old paragraph</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>

(ignore the missing font highlights, that's an unrelated issue)

[Screenshots: Screen Shot 2020-01-01 at 11 51 58 AM, Screen Shot 2020-01-01 at 11 52 03 AM]

I also created a similar document in InCopy itself, and this is what its output was:

<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/$ID/NormalParagraphStyle">
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
    <Content>This is a header</Content>
    <Br />
  </CharacterStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
    <Content>This is some bold text</Content>
    <Br />
  </CharacterStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
    <Content>This is some normal text</Content>
  </CharacterStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]" FontStyle="Bold">
    <Br />
  </CharacterStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
    <Content>This is something else</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>

I didn't bring along all the character and paragraph styles, as I wanted to use just what it gave me, but the biggest difference I could see was in how InCopy applies a "none" character style compared with how Pandoc does. Paragraph styles typically have a "Normal" style applied, as there's always a base style, but character styles have a "No Character Style" concept, as they don't have to have one.

InCopy: <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
Pandoc: <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
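To make this comparison less error-prone than reading the markup by eye, here is a minimal sketch that collects the distinct `AppliedCharacterStyle` values in an ICML fragment. The function name `applied_character_styles` and the `<Root>` wrapper element are illustrative assumptions, not part of Pandoc or ICML; the wrapper only exists so that a fragment with several top-level elements parses.

```python
import xml.etree.ElementTree as ET


def applied_character_styles(icml_fragment):
    """Return the set of AppliedCharacterStyle values used in the fragment."""
    # Hypothetical wrapper element so multi-element fragments are well-formed.
    root = ET.fromstring(f"<Root>{icml_fragment}</Root>")
    styles = set()
    for el in root.iter("CharacterStyleRange"):
        value = el.get("AppliedCharacterStyle")
        if value is not None:
            styles.add(value)
    return styles


pandoc_fragment = """
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
    <Content>This is bold text</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
<Br />
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Header4">
  <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
    <Content>Blockquotes</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
"""

incopy_fragment = """
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/$ID/NormalParagraphStyle">
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
    <Content>This is some bold text</Content>
    <Br />
  </CharacterStyleRange>
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
    <Content>This is some normal text</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
"""

# Styles that appear in only one of the two outputs.
only_pandoc = applied_character_styles(pandoc_fragment) - applied_character_styles(incopy_fragment)
only_incopy = applied_character_styles(incopy_fragment) - applied_character_styles(pandoc_fragment)
print(sorted(only_pandoc))
print(sorted(only_incopy))
```

Run against the two fragments from this report, the set difference isolates the two "none" spellings while the shared `CharacterStyle/Bold` reference drops out.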

What Pandoc spits out is very similar to how ICML handles "none" in its paragraph style definitions:

<ParagraphStyle Self="ParagraphStyle/Header3" Name="Header3" Imported="false" NextStyle="ParagraphStyle/Header3" SplitDocument="false" EmitCss="true" ExtendedKeyboardShortcut="0 0 0" EmptyNestedStyles="true" EmptyLineStyles="true" EmptyGrepStyles="true" KeyboardShortcut="0 0" PointSize="24" SpaceBefore="5.9976" SpaceAfter="5.9976">
  <Properties>
    <BasedOn type="object">ParagraphStyle/$ID/NormalParagraphStyle</BasedOn>
    <PreviewColor type="enumeration">Nothing</PreviewColor>
    <Leading type="unit">30</Leading>
  </Properties>
</ParagraphStyle>

The important line being: <BasedOn type="object">ParagraphStyle/$ID/NormalParagraphStyle</BasedOn>

But when defining Character styles, ICML uses the expected pattern to define "none":

<CharacterStyle Self="CharacterStyle/UL2" Imported="false" SplitDocument="false" EmitCss="true" ExtendedKeyboardShortcut="0 0 0" KeyboardShortcut="0 0" Name="UL2" Ligatures="false" BaselineShift="-4" OTFContextualAlternate="false">
  <Properties>
    <BasedOn type="string">$ID/[No character style]</BasedOn>
    <PreviewColor type="enumeration">Nothing</PreviewColor>
  </Properties>
</CharacterStyle>

So I tried changing Pandoc's ICML output from <CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle"> to <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">, and the Bold styles stopped where expected.
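For anyone who wants to repeat that experiment without hand-editing the file, the substitution can be sketched as a small post-processing step over Pandoc's ICML output. This is only an illustration of the manual edit described above: the function name `use_incopy_no_style` is hypothetical, and since the thread later traces the bleeding bold to a custom template, this should be read as a workaround for that broken setup rather than a fix for Pandoc.

```python
# The two attribute values are copied from the snippets in this report.
PANDOC_DEFAULT = "$ID/NormalCharacterStyle"
INCOPY_NONE = "CharacterStyle/$ID/[No character style]"


def use_incopy_no_style(icml):
    """Swap Pandoc's default character style reference for InCopy's 'none'."""
    old = f'AppliedCharacterStyle="{PANDOC_DEFAULT}"'
    new = f'AppliedCharacterStyle="{INCOPY_NONE}"'
    # Plain string replacement: only the exact attribute is touched, so
    # ranges such as CharacterStyle/Bold are left unchanged.
    return icml.replace(old, new)


sample = (
    '<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">\n'
    '  <Content>This is bold text</Content>\n'
    '</CharacterStyleRange>\n'
    '<CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">\n'
    '  <Content>Blockquotes</Content>\n'
    '</CharacterStyleRange>'
)
print(use_incopy_no_style(sample))
```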

I'm pretty new to Pandoc and ICML. I've been tinkering for about a week with leveraging the markup hierarchy found in Markdown to save time in InDesign by going through ICML first.

scott-joe commented 4 years ago

I also noticed that the <Br /> tags are on the outside of paragraph style blocks in the Pandoc output, while they're inside on the InCopy output. If I were building a conversion tool, I'd consider it more useful to leave the <Br /> tags unstyled, but InDesign/InCopy has an, IMO, stupid pattern of leaving trailing paragraphs with the previous paragraph's styling. The idea is probably that you'd remove them all and leave the spacing to styles, but you don't get that in most text documents, so a lot of them end up having empty lines you have to remove manually... Just something else I thought I'd bring up, as you can see the differences in the above samples.
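The placement difference can be checked mechanically. The sketch below counts `<Br />` elements that sit outside any style range versus those nested inside a `CharacterStyleRange`. The function name `br_placement` and the `<Root>` wrapper element are assumptions made for this example; the wrapper just lets a multi-element fragment parse as one document.

```python
import xml.etree.ElementTree as ET


def br_placement(icml_fragment):
    """Count <Br /> elements outside style ranges vs. inside CharacterStyleRange."""
    # Hypothetical wrapper element so the fragment has a single root.
    root = ET.fromstring(f"<Root>{icml_fragment}</Root>")
    return {
        # Direct children of the wrapper are siblings of ParagraphStyleRange.
        "outside": len(root.findall("Br")),
        # Br elements that are direct children of any CharacterStyleRange.
        "inside": len(root.findall(".//CharacterStyleRange/Br")),
    }


pandoc_style = """
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
    <Content>This is bold text</Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
<Br />
"""

incopy_style = """
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/$ID/NormalParagraphStyle">
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
    <Content>This is some bold text</Content>
    <Br />
  </CharacterStyleRange>
</ParagraphStyleRange>
"""

print(br_placement(pandoc_style))
print(br_placement(incopy_style))
```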

mb21 commented 4 years ago

Thanks for the feedback!

hm... I cannot reproduce the flowing boldness in InCopy 2020 (and neither in InDesign 2020 when placing the icml file). I used the first markdown snippet you posted to generate the attached file (which contains the same output you posted, with the addition of what's added by the template): foo.icml.zip

Are you sure you're not using a custom template? Check your ~/.pandoc/templates directory as well. (The comment, which seems to be the only change in the markdown you're reporting, shouldn't change the output in any way.)
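A quick way to run that check is sketched below: it looks for a `templates/default.icml` file in candidate Pandoc user data directories, since a file with that name overrides the built-in ICML template. The function name `find_custom_icml_templates` is hypothetical, `~/.pandoc` comes from the comment above, and `~/.local/share/pandoc` is an assumption about where newer Pandoc versions may keep user data on Linux; adjust the list for your system.

```python
from pathlib import Path


def find_custom_icml_templates(data_dirs):
    """Return every templates/default.icml found under the given directories."""
    hits = []
    for data_dir in data_dirs:
        candidate = Path(data_dir) / "templates" / "default.icml"
        if candidate.is_file():
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Candidate user data directories (the second one is an assumption).
    candidates = [
        Path.home() / ".pandoc",
        Path.home() / ".local" / "share" / "pandoc",
    ]
    for path in find_custom_icml_templates(candidates):
        print(f"custom ICML template found: {path}")
```

If a file turns up, comparing it against the output of `pandoc -D icml`, which prints the built-in default template, should show what the custom template changes.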


About the [No character style] vs. NormalCharacterStyle... might be different InCopy versions... I honestly don't remember.

> I also noticed that the <Br /> tags are on the outside of paragraph style blocks in the Pandoc output while they're inside on the InCopy output.

Yes, this was done in 1ead1f39ad71086253ff6cb30d4462be642b4901 to fix #2501... I know it's a bit of a hack, but it seems to work fine in practice.

scott-joe commented 4 years ago

Bold: You're right. There was a template in play. Thanks for looking into it. I'm more of a frontend dev guy, but I really love Pandoc as I got into development to make dealing with stuff like this easier :) Sorry for the bad post, but thanks again for looking into it.

CharacterStyle: It might have solved the issue in my broken situation, but it's clearly better to fix the real issue.

BR: 👍

Thanks again