sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

Additional vertical spacing in sphinxVerbatim environment (latex output) #6733

Closed skirpichev closed 4 years ago

skirpichev commented 5 years ago

In the diofant docs I use the xelatex engine and the following preamble:

\setmainfont{DejaVu Serif}
\setsansfont{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}

HTML output from Sphinx looks fine (see e.g. this, an example of solve() usage) after a slight CSS adaptation:

.rst-content div[class^="highlight"] pre {
    line-height: normal;
    font-family: 'DejaVu Sans Mono', 'DejaVuSansMonoBook', 'DejaVuSansMonoBookFallback', monospace;
}

But in the PDF output there is additional vertical spacing (like the one removed by the line-height rule in the HTML output). See the screenshot in the project issue https://github.com/diofant/diofant/issues/754.

tk0miya commented 5 years ago

CSS does not affect LaTeX output. You need to customize LaTeX output the LaTeX way (I don't know how). So I don't think this is a bug.

skirpichev commented 5 years ago

CSS does not affect LaTeX output.

I know)

You need to customize LaTeX output the LaTeX way (I don't know how). So I don't think this is a bug.

Hmm, apparently, verbatim output is broken.

BTW, it seems that the Unicode symbol (SCRIPT SMALL E) in the code block was replaced by e in the LaTeX output. Isn't this a bug?

tk0miya commented 4 years ago

About line spacing: the default setting in LaTeX is not zero, so I don't think it's a bug. Please modify it via your custom macro. You can modify line spacing by modifying \baselinestretch or \baselineskip. For example, the following setting modifies the line spacing of the whole document:

latex_elements = {
    'preamble': r'\renewcommand{\baselinestretch}{0}'
}

Hmm, apparently, verbatim output is broken.

Please give me more detail. How did you write the input text in reStructuredText? It would be very helpful if you filled in our issue template. Thanks,

skirpichev commented 4 years ago

You can modify line spacing by modifying \baselinestretch

Thank you! That way it looks much better now.

Yet, I see a problem with the above-mentioned case where "SCRIPT SMALL E" was replaced with "e" in the LaTeX output. See screenshot (with adapted baselinestretch).

Please let me know in detail. How did you write input text in reStructuredText?

Sorry, I expected this to be clear from the bug report. The docs page referenced above has an "edit on Github" button with full access to the intro.rst source. As you can see, in the dsolve() output example the "SCRIPT SMALL E" Unicode symbol was used for the base of natural logarithms. But Sphinx replaces this with "e" in diofant.tex. Apparently, this breaks verbatim formatting in this case.

skirpichev commented 4 years ago

Hmm, maybe the subscripts break the formatting in this case. See the screenshot of what happens if I replace C_1 and C_2 with Unicode α/β.

jfbu commented 4 years ago

Please explain how the math formulas are generated, as Sphinx does not use LaTeX math mode in code-blocks. Code-blocks pile up lines of identical heights with no extra interline white-space. But then the elements used for the formulas (for example the middle part of a parenthesis, which I gather is divided into three parts) must use the full height in the TeX hbox.

Don't expect me to go on a click-on-me chase to fetch your source; I have other duties. Thanks.

skirpichev commented 4 years ago

Please explain how the math formulas are generated

Why does it matter? This is just Unicode text, I can paste it right here, no problem:

Solve the differential equation `f'' - f = e^x`.

   >>> f = symbols('f', cls=Function)
   >>> dsolve(Eq(f(x).diff(x, 2) - f(x), exp(x)), f(x))
           x ⎛     x⎞    -x
   f(x) = ℯ ⋅⎜C₂ + ─⎟ + ℯ  ⋅C₁
             ⎝     2⎠

jfbu commented 4 years ago

I did not express myself precisely when I asked "how they are generated", I only meant how they end up being fetched to Sphinx document generation. So here you have Unicode text.

It is thus no surprise that Sphinx verbatim rendering has problems: LaTeX code-blocks put successive lines into horizontal boxes (so-called hboxes) of the same height and pile them one on top of the other. You thus need all your vertically extended glyphs to have exactly the same height and depth, and you need to tell LaTeX to use that.

The solution via \baselinestretch set to 0 means that a LaTeX fundamental layout parameter \baselineskip is set to zero, which will ruin paragraph building for running text. For code-blocks, it apparently helps with your problem because, as \baselineskip is zero, the so-called \strutbox has zero dimensions (it is reset at each font size change by LaTeX, and code-blocks use \small). This means that the \strut automatically added to each code-block line serves no purpose, and each line takes the height of its highest ascender and the depth of its deepest descender.

It is thus not a real solution to the problem as subscripts and superscripts for example may offset that.
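A rough model of this mechanism, written as a Python sketch rather than anything Sphinx or LaTeX actually runs. It assumes that each code-block line is a box whose height and depth are the maxima over the strut and the glyphs on that line, and that LaTeX's strut is 0.7 of \baselineskip high and 0.3 of \baselineskip deep. The function names and the glyph dimensions are made up for illustration.

```python
from typing import Sequence, Tuple

# (height above the baseline, depth below the baseline), both in pt
Extent = Tuple[float, float]


def strut_extent(baselineskip: float) -> Extent:
    """Model of LaTeX's strut: 70% of the baselineskip above, 30% below."""
    return (0.7 * baselineskip, 0.3 * baselineskip)


def line_extent(glyphs: Sequence[Extent], strut: Extent) -> Extent:
    """Height and depth of one line: maxima over the strut and all glyphs."""
    height = max([strut[0]] + [h for h, _ in glyphs])
    depth = max([strut[1]] + [d for _, d in glyphs])
    return (height, depth)


# Hypothetical glyph dimensions, chosen only to show the effect.
short_line = [(4.0, 0.0)]   # something like "+-"
tall_line = [(8.5, 2.3)]    # something like a tall bracket piece

default_strut = strut_extent(11.0)   # \baselineskip at \small in code-blocks
zero_strut = strut_extent(0.0)       # after \baselinestretch is set to 0

for name, line in (("short", short_line), ("tall", tall_line)):
    print(name, "with strut:", line_extent(line, default_strut))
    print(name, "zero strut:", line_extent(line, zero_strut))
```

With the zero strut, every line shrinks to its own contents, which is why the short line and the tall line end up with different extents.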

Sphinx does escape Unicode subscripts see https://github.com/sphinx-doc/sphinx/blob/618cc26c6463b1ad29a1568bfd123b016fc870c4/sphinx/util/texescape.py#L49-L68

This dates back to an era where Sphinx PDF output did not support xelatex. The LaTeX mark-up means that Unicode is replaced by TeX math.

As for the Unicode ℯ, yes, it is escaped by Sphinx at https://github.com/sphinx-doc/sphinx/blob/618cc26c6463b1ad29a1568bfd123b016fc870c4/sphinx/util/texescape.py#L47

Surely some revision of the legacy TeX escaping is in order for the xelatex engine. In the past it was even worse, as all Greek letters were escaped to TeX math mode mark-up.

For the time being you can monkey-patch the indicated parts of the Sphinx code.

skirpichev commented 4 years ago

\baselineskip is set to zero, which will ruin paragraph building for running text

Maybe it does make sense to restrict similar settings to the "sphinxVerbatim" environment?

(BTW, I think this problem alone is generic enough for this issue to be labeled as a bug, not a support question.)

Sphinx does escape Unicode subscripts see

Hmm, removing the escapes for the 1/2 subscripts does work for me. Would you accept a patch which drops these subscript/superscript escapes?

jfbu commented 4 years ago

I think we have two quite separate issues here: the escaping of Unicode characters for LaTeX, and the line spacing inside code-blocks.

About the first one, could you please open a specific issue with a title such as "LaTeX for PDF via xelatex should not escape Unicode like done for pdflatex engine", or one to your liking. A patch is welcome of course, but its effect should be limited to xelatex/lualatex, and a more complete examination of tex_replacements is needed than only the subscript/superscript escapes.

About the second one: yes, one can and should limit the changes to the contents of sphinxVerbatim. I see potential difficulties with glyph depths, but the fonts usable for such Unicode art could possibly fare well by simply letting TeX set the height (and depth) of each line to the actually encountered contents. However, it might be better to let TeX fit the baseline to the glyph. I will think about that; hopefully a custom LaTeX preamble in conf.py can achieve it via some LaTeX hack.

jfbu commented 4 years ago

I have heavily edited this post because I got confused both in my testing and actual glyph dimensions I reported.

First approach (does not quite work):

latex_engine = 'xelatex'

latex_elements = {
    'fontpkg': r'''
\setmainfont{DejaVu Serif}
\setsansfont{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}
''',
    'preamble': r'''
\fvset{formatcom=\def\strut{\vphantom{⎟}}}
''',
}

Does it work for you (with the sub/superscripts non escaped)? (edit: surely not quite)

It is a somewhat drastic LaTeX measure (though limited to sphinxVerbatim). A somewhat better one would be to redefine \sphinxVerbatimFormatLineWrap and \sphinxVerbatimFormatLineNoWrap and replace the occurrences of \strut therein, but that looks bulkier.

Edit: on further investigation, this appears to work because the dimensions of the glyph ⎟ (height plus depth 8.47705pt and 5.41846pt) exceed the locally used \baselineskip (11.0pt, set by \small, which sphinxVerbatim applies by default).

In fact the above does not quite work, because TeX locally uses a \baselineskip of 11.0pt in sphinxVerbatim, but the height and depth of the glyph ⎟ in DejaVu Sans Mono are 8.47705pt and 2.26758pt, for a total of 10.74463pt (TeX points).
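The arithmetic behind these figures can be checked with a few lines of Python. The three numbers are the ones quoted above; reading "does not quite work" as the glyph falling short of the 11.0pt line pitch is an interpretation, not something stated in the measurements themselves.

```python
# Dimensions quoted in the comment above, in TeX points.
BASELINESKIP_PT = 11.0      # \baselineskip inside sphinxVerbatim at \small
GLYPH_HEIGHT_PT = 8.47705   # height of the glyph in DejaVu Sans Mono
GLYPH_DEPTH_PT = 2.26758    # depth of the same glyph

glyph_total = GLYPH_HEIGHT_PT + GLYPH_DEPTH_PT
shortfall = BASELINESKIP_PT - glyph_total

print(f"glyph height + depth: {glyph_total:.5f}pt")   # 10.74463pt
print(f"short of baselineskip by: {shortfall:.5f}pt")  # 0.25537pt
```

So if successive lines sit one \baselineskip apart, the glyph covers all but about a quarter of a point of each line.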

Second approach. Same font set-up but this time we use this \fvset:

latex_elements = {
    'preamble': r'''
\fvset{formatcom=\baselineskip0pt\relax\let\strut\empty}
''',
}

This has the same effect as the \baselinestretch approach but is localized to sphinxVerbatim.

But if we have this in source

   ---
   +++

the output will give two narrow lines. This approach assigns to each line its minimal vertical space as determined by the contents (as seen by XeTeX from font data).

skirpichev commented 4 years ago

Can you try this in your conf.py ... \fvset{formatcom=\def\strut{\vphantom{⎟}}}

Yes, that does work (together with monkey-patching del sphinx.util.texescape.tex_replacements[30:]) and seems more targeted. Thank you!

\fvset{formatcom=\baselineskip0pt\relax\let\strut\empty}

This does work too. Not sure which is better.
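For readers who want to try the monkey-patch mentioned above, here is a minimal conf.py sketch. It filters by character instead of slicing at index 30, since that index depends on the Sphinx version. The helper name drop_script_escapes is hypothetical, the sample pairs are illustrative rather than copied from Sphinx, and it is an assumption that sphinx.util.texescape.tex_replacements is still a list of (character, LaTeX) pairs read after conf.py is loaded.

```python
# conf.py (sketch): stop escaping Unicode sub/superscript digits.
SCRIPT_DIGITS = set("⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉")


def drop_script_escapes(replacements):
    """Return the (char, latex) pairs that do not target a sub/superscript digit."""
    return [(char, tex) for char, tex in replacements if char not in SCRIPT_DIGITS]


try:
    from sphinx.util import texescape
except ImportError:  # lets the helper be tried without Sphinx installed
    texescape = None

table = getattr(texescape, "tex_replacements", None)
if isinstance(table, list):
    # Mutate in place so that Sphinx sees the shortened table later on.
    table[:] = drop_script_escapes(table)

# Illustrative pairs only; the real table lives in sphinx/util/texescape.py.
sample = [("$", r"\$"), ("₁", r"\(\sb{\text{1}}\)"), ("ℯ", "e")]
print(drop_script_escapes(sample))
```

Note that this leaves the escape for ℯ in place, which matches the outcome later in this thread where keeping "e" turned out to be acceptable.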

jfbu commented 4 years ago

Thanks for reporting back.

The first approach (with \vphantom) maintains a minimal line height + depth equal to the space occupied by ⎟. If the line contains higher ascenders or lower descenders it will be enlarged vertically to accommodate them, but if it contains only things such as +- which occupy less vertical space, it will nevertheless take the space that a ⎟ would need.

The second approach \fvset{formatcom=\baselineskip0pt\relax\let\strut\empty} on the other hand is a "cramped" style where all lines adopt the minimal necessary vertical space according to their glyph contents and the dimensions XeTeX associates with same from font file data.
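To make it easy to compare the two approaches on a real project, both \fvset lines can be kept in conf.py behind a switch. This is only a sketch: the names VERBATIM_SPACING and FVSET_BY_STYLE are hypothetical, while the font set-up and the two \fvset lines are the ones posted earlier in this thread.

```python
# conf.py (sketch): pick one of the two code-block spacing workarounds.
VERBATIM_SPACING = "vphantom"  # hypothetical switch: "vphantom" or "cramped"

FVSET_BY_STYLE = {
    # Every line gets at least the height and depth of the glyph in \vphantom.
    "vphantom": r"\fvset{formatcom=\def\strut{\vphantom{⎟}}}",
    # Every line gets only the vertical space its own glyphs need.
    "cramped": r"\fvset{formatcom=\baselineskip0pt\relax\let\strut\empty}",
}

latex_engine = "xelatex"

latex_elements = {
    "fontpkg": r"""
\setmainfont{DejaVu Serif}
\setsansfont{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}
""",
    "preamble": FVSET_BY_STYLE[VERBATIM_SPACING] + "\n",
}
```

Switching VERBATIM_SPACING and rebuilding the PDF gives a side-by-side comparison without editing the LaTeX by hand.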

I did not fully test the above two suggestions (e.g. do they affect line numbers if that feature is on?). As I said, there is a bulkier way to do the same by redefining some Sphinx LaTeX macros, which would be safer, but if it works like this, there is no reason to do something more complicated.

I am interested in further feedback in future about which one is best in your complete real life examples.

About the monkey-patch you currently must apply: in future Sphinx will adopt a better way to handle these Unicode code points when the LaTeX engine is xelatex, so you should not need it any more.

jfbu commented 4 years ago

By the way, don't you have a problem with SCRIPT SMALL E? (U+212F) I get this if I insert it directly in a sphinxVerbatim environment

Missing character: There is no ℯ in font DejaVu Sans Mono/OT:script=latn;langua
ge=DFLT;!

and the PDF shows a rectangle in its place.

skirpichev commented 4 years ago

I am interested in further feedback in future about which one is best in your complete real life examples.

I'll try the first option. It (especially after your explanations) looks like a minimal workaround. I see no visible issues with this approach right now.

By the way, don't you have a problem with SCRIPT SMALL E? (U+212F)

Well, this symbol is still replaced by "e" (yes, it's missing in the DejaVu fonts). I thought this was the reason the pretty-printed block lost its formatting, but it's not.

So, using e seems ok; maybe I should even use this symbol instead of "SCRIPT SMALL E" for Unicode pretty-printing... The DejaVu monospace fonts seem to be the only option that provides a maximal number of Unicode characters for math.

Ok, I'll close this (very helpful) issue as a support request and then open two new ones: 1) about line spacing in the code blocks and 2) about filtering some Unicode characters.

jfbu commented 4 years ago

@skirpichev At some places in my comment above I got confused during testing (I confused the width with the depth of the glyph ⎟), and I will edit them at some point later. At this stage the first option seems to work, but I don't fully understand why; it should not work so well. I will ping you once I have found time to disentangle my confusion and have polished my earlier comments.

jfbu commented 4 years ago

Rather than modifying my earlier comments yet again, I will try to clarify the situation as I understand it.

\fvset{formatcom=\baselineskip10pt\relax\let\strut\empty}

But, this method has not been fully tested. For people investigating this note that total height plus depth of is not the same as for for example. This all refers to Deja Vu Sans Mono in \small size. The dimensions use "TeX" points.