Closed skirpichev closed 4 years ago
CSS does not affect LaTeX output. You need to customize the LaTeX output the LaTeX way (I don't know how). So I don't think this is a bug.
CSS does not affect LaTeX output.
I know)
You need to customize the LaTeX output the LaTeX way (I don't know how). So I don't think this is a bug.
Hmm, apparently, verbatim output is broken.
BTW, it seems that the ℯ Unicode symbol (SCRIPT SMALL E) in the code block was replaced by e in the LaTeX output. Is this not a bug?
About line spacing: the default setting of LaTeX is not zero. I don't think it's a bug. Please modify it via your custom macro. You can modify line spacing by modifying \baselinestretch or \baselineskip.
For example, the following setting modifies the line spacing of the whole document:
latex_elements = {
    'preamble': r'\renewcommand{\baselinestretch}{0}',
}
Hmm, apparently, verbatim output is broken.
Please give us more detail. How did you write the input text in reStructuredText? Filling in our issue template is very helpful for us. Thanks.
You can modify line spacing by modifying \baselinestretch
Thank you! That way it looks much better now.
Yet I still see a problem in the case mentioned above, where "SCRIPT SMALL E" was replaced with "e" in the LaTeX output. See screenshot (with adapted baselinestretch):
Please let me know in detail. How did you write input text in reStructuredText?
Sorry, I expected that this was clear from the bug report. The docs page referenced above has an "edit on Github" button with full access to the intro.rst source. As you can see, in the dsolve() output example the "SCRIPT SMALL E" Unicode symbol was used for the base of natural logarithms. But sphinx-doc replaces this with "e" in diofant.tex. Apparently, this breaks verbatim formatting in this case.
Hmm, maybe subscripts do break formatting in this case. If I replace C_1 and C_2 with Unicode α/β:
Please explain how the math formulas are generated, as Sphinx does not use LaTeX math mode in code-blocks. Code-blocks pile up lines of identical height with no extra interline whitespace. But then the elements used for the formulas (for example the middle part of a parenthesis, which I gather is divided into three parts) must use the full height in the TeX hbox.
Don't expect me to go on a "click on me" chase to fetch your source; I have other duties. Thanks.
Please explain how the math formulas are generated
Why does it matter? This is just Unicode text; I can paste it right here, no problem:
Solve the differential equation `f'' - f = e^x`.
>>> f = symbols('f', cls=Function)
>>> dsolve(Eq(f(x).diff(x, 2) - f(x), exp(x)), f(x))
        x ⎛     x⎞    -x
f(x) = ℯ ⋅⎜C₂ + ─⎟ + ℯ  ⋅C₁
          ⎝     2⎠
I did not express myself precisely when I asked "how they are generated", I only meant how they end up being fetched to Sphinx document generation. So here you have Unicode text.
It is thus no surprise that Sphinx verbatim rendering has problems: LaTeX code-blocks put successive lines into horizontal so-called boxes of the same height and pile them one on top of the other. You thus need all your vertically extended glyphs to have exactly the same height and depth, and you need to tell LaTeX to use those dimensions.
The solution via \baselinestretch set to 0 means that the fundamental LaTeX layout parameter \baselineskip is set to zero, which will ruin paragraph building for running text. For code-blocks, it apparently helps with your problem because, as \baselineskip is zero, the so-called \strutbox has zero dimensions (it is reset at each font size change by LaTeX, and code-blocks use \small). This means that the automatically added \strut in each code-block line serves no purpose, and each line takes the height of its highest ascender and the depth of its deepest descender.
It is thus not a real solution to the problem as subscripts and superscripts for example may offset that.
Sphinx does escape Unicode subscripts see https://github.com/sphinx-doc/sphinx/blob/618cc26c6463b1ad29a1568bfd123b016fc870c4/sphinx/util/texescape.py#L49-L68
This dates back to an era where Sphinx PDF output did not support xelatex. The LaTeX mark-up means that Unicode is replaced by TeX math.
As for the Unicode ℯ, yes, it is escaped by Sphinx at https://github.com/sphinx-doc/sphinx/blob/618cc26c6463b1ad29a1568bfd123b016fc870c4/sphinx/util/texescape.py#L47
Surely some revision of legacy TeX escaping is in order for xelatex engine. In the past it was even worse as all Greek letters were escaped to TeX math mode mark-up.
For the time being, you can monkey-patch the indicated parts of the Sphinx code.
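As a rough illustration of what such a monkey-patch amounts to, the sketch below filters a list of (character, LaTeX replacement) pairs so that selected Unicode characters are no longer escaped. The helper name drop_escapes and the sample pairs are hypothetical stand-ins, not copied from Sphinx; it assumes that sphinx.util.texescape.tex_replacements at the linked commit is a list of such pairs.

```python
def drop_escapes(replacements, keep_literal):
    """Return a copy of `replacements` without the entries whose source
    character appears in `keep_literal` (hypothetical helper)."""
    return [(char, tex) for char, tex in replacements if char not in keep_literal]


# Stand-in data shaped like a list of (character, LaTeX replacement) pairs.
# The right-hand sides are illustrative only, not Sphinx's actual mark-up.
sample_replacements = [
    ('$', r'\$'),
    ('ℯ', r'e'),
    ('₁', r'\(\sb{\text{1}}\)'),
    ('₂', r'\(\sb{\text{2}}\)'),
]

patched = drop_escapes(sample_replacements, keep_literal={'ℯ', '₁', '₂'})

# Assumption: in conf.py the same filter could be applied in place to the
# real list, along the lines of
#     from sphinx.util import texescape
#     texescape.tex_replacements[:] = drop_escapes(
#         texescape.tex_replacements, {'ℯ', '₁', '₂'})
# Whether this takes effect depends on the Sphinx version in use.
```

Filtering by character rather than by list position avoids depending on the order of the entries, which may change between Sphinx releases.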
\baselineskip is set to zero, which will ruin paragraph building for running text
Maybe it does make sense to restrict similar settings to the "sphinxVerbatim" environment?
(BTW, I think this problem alone is generic enough for this issue to be labeled as a bug, not a support question.)
Sphinx does escape Unicode subscripts see
Hmm, removing the escapes for the 1/2 subscripts does work for me. Would you accept a patch that drops these subscript/superscript escapes?
I think we have two quite separate issues here:
the first one is the TeX escaping of Unicode characters such as SCRIPT SMALL E, which I agree is a bug when the latex engine is xelatex or lualatex,
the second one is more specific and a bit hard to categorize exactly: supporting multi-line glyph alignment via some sort of Unicode Art, which can only work with specific fonts.
About the first one, could you please open a specific issue with a title such as "LaTeX for PDF via xelatex should not escape Unicode like done for pdflatex engine" or one to your liking. A patch is welcome of course, but its effect should be limited to xelatex/lualatex, and a more complete examination of tex_replacements is needed than only the subscript/superscript escapes.
About the second one: yes, one can (and must) limit changes to the contents of sphinxVerbatim only. I see potential difficulties with glyph depths, but possibly the fonts usable for such Unicode Art could fare well by simply letting TeX set the height (and depth) of each line to the actual encountered contents. However, it might be better to let TeX fit the baseline to the ⎟ glyph. I will think about that; hopefully a custom LaTeX preamble in conf.py can achieve it via some LaTeX hack.
I have heavily edited this post because I got confused both in my testing and actual glyph dimensions I reported.
First approach (does not quite work):
latex_engine = 'xelatex'
latex_elements = {
'fontpkg': r'''
\setmainfont{DejaVu Serif}
\setsansfont{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}
''',
'preamble': r'''
\fvset{formatcom=\def\strut{\vphantom{⎟}}}
''',
}
Does it work for you (with the sub/superscripts non escaped)? (edit: surely not quite)
It is a somewhat drastic LaTeX measure (but it is limited to sphinxVerbatim). A somewhat better one would be to redefine \sphinxVerbatimFormatLineWrap and \sphinxVerbatimFormatLineNoWrap and replace the occurrences of \strut therein, but that looks bulkier.
Edit: on further investigation, this appears to work because the dimensions of the ⎟ glyph (height plus depth, 8.47705pt and 5.41846pt) exceed the locally used \baselineskip (11.0pt, set by the \small that sphinxVerbatim applies by default).
In fact the above does not quite work, because TeX locally uses a \baselineskip of 11.0pt in sphinxVerbatim, but the height and depth of the ⎟ glyph in DejaVu Sans Mono are 8.47705pt and 2.26758pt, for a total of 10.74463pt (TeX points).
Second approach. Same font set-up, but this time we use this \fvset:
latex_elements = {
'preamble': r'''
\fvset{formatcom=\baselineskip0pt\relax\let\strut\empty}
''',
}
This has the same effect as the \baselinestretch approach but is localized to sphinxVerbatim.
But if we have this in source
---
+++
the output will give two narrow lines. This approach assigns to each line its minimal vertical space as determined by the contents (as seen by XeTeX from font data).
Can you try this in your conf.py ... \fvset{formatcom=\def\strut{\vphantom{⎟}}}
Yes, that does work (together with monkey-patching del sphinx.util.texescape.tex_replacements[30:]) and seems more targeted. Thank you!
\fvset{formatcom=\baselineskip0pt\relax\let\strut\empty}
This does work too. Not sure which is better.
Thanks for reporting back.
The first approach (with \vphantom) maintains a minimal line height + depth equal to the space occupied by ⎟. If the line contains higher ascenders or lower descenders, it will be enlarged vertically to accommodate them, but if it contains only things such as +-, which occupy less vertical space, it will nevertheless take up the space that a ⎟ would need.
The second approach, \fvset{formatcom=\baselineskip0pt\relax\let\strut\empty}, on the other hand, is a "cramped" style where every line takes the minimal vertical space required by its glyph contents, according to the dimensions XeTeX reads for them from the font file data.
I did not fully test the two suggestions above (e.g. do they affect line numbers if that feature is on?). As I said, there is a bulkier way to do the same by redefining some Sphinx LaTeX macros, which would be safer, but if it works like this, there is no reason to do something more complicated.
I am interested in further feedback in future about which one is best in your complete real life examples.
About the monkey-patch you currently must do, in future Sphinx will adopt a better way to handle these Unicode codepoints when latex engine is xelatex, so you should not need it in future.
By the way, don't you have a problem with SCRIPT SMALL E? (U+212F) I get this if I insert it directly in a sphinxVerbatim environment
Missing character: There is no ℯ in font DejaVu Sans Mono/OT:script=latn;language=DFLT;!
and the PDF shows a rectangle in its place.
I am interested in further feedback in future about which one is best in your complete real life examples.
I'll try the first option. It (especially after your explanations) looks like a minimal workaround. I see no visible issues with this approach right now.
By the way, don't you have a problem with SCRIPT SMALL E? (U+212F)
Well, this symbol is still replaced by "e" (yes, it's missing in the DejaVu fonts). I thought this was the reason that the pretty-printed block lost its formatting, but it's not.
So, using e seems ok; maybe I should even use this symbol instead of "SCRIPT SMALL E" for Unicode pretty-printing... The DejaVu monospace fonts seem to be the only option that provides the maximal number of Unicode characters for math.
Ok, I'll close this (very helpful) issue as a support request and then open two: 1) about line spacing in the code blocks and 2) about filtering some Unicode characters.
@skirpichev At some locations in my comment above, I got confused about some things during testing (I confused the width with the depth of the glyph ⎜), and I will edit them at some point later. At this stage the first option seems to work, but I don't fully understand why; it should not work so well. I will ping you when I have found time to disentangle my confusion and have polished my earlier comments.
Rather than modifying now again my earlier comments I try to clarify the situation as I understand it.
under default context, sphinxVerbatim tries to achieve constant distance between baselines equal to 11pt.
the reason extra vertical whitespace appears in the OP's case is that, for example, the ⎜ (or the ⎟, which is another one...) has a height of 8.47705pt. But each line contains a \strut which inserts an invisible ascender of 7.7pt and a descender of 3.3pt. As 8.47705pt + 3.3pt > 11pt, a gap was visible.
the method using \fvset{formatcom=\def\strut{\vphantom{⎟}}} replaces the default \strut with one modelled on ⎟. But as its depth is 2.26758pt, the total height+depth is less than 11pt and a small gap should remain. I believe PDF viewer anti-aliasing contributes to making it less visible. How it looks depends on the zoom level in Skim.app on Mac OS X. Besides, what matters is the depth of the last line plus the height of the next line, so it is not only about the single glyph ⎟.
the method I would recommend is to set \baselineskip to a value less than the total height plus depth of glyphs such as ⎟. For example, one can set it to 0pt, but then some lines will get cramped. Thus perhaps 10pt. But it is also necessary to remove the effect, described above, of the \strut with its large depth of 3.3pt, else we are back to the initial condition. Thus something like
\fvset{formatcom=\baselineskip10pt\relax\let\strut\empty}
But this method has not been fully tested. For people investigating this, note that the total height plus depth of ⎟ is not the same as that of ⎠, for example. This all refers to DejaVu Sans Mono at \small size. The dimensions use "TeX" points.
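The arithmetic behind these points can be checked with a short sketch. It only uses the dimensions quoted in this thread (TeX points, DejaVu Sans Mono at \small) and a simplified model in which the distance between two baselines is the depth of the upper line plus the height of the lower line, padded up to \baselineskip when smaller. The function and variable names are hypothetical, and the model ignores any other interline glue TeX may add.

```python
from decimal import Decimal

# Dimensions quoted in the thread (TeX points).
BASELINESKIP = Decimal("11.0")
STRUT_HEIGHT = Decimal("7.7")
STRUT_DEPTH = Decimal("3.3")
GLYPH_HEIGHT = Decimal("8.47705")  # height of the ⎟ glyph
GLYPH_DEPTH = Decimal("2.26758")   # depth of the ⎟ glyph


def excess(depth_above, height_below, baselineskip=BASELINESKIP):
    """Depth of the upper line plus height of the lower line, minus
    baselineskip. Positive: the lines are pushed further apart than
    baselineskip. Negative: TeX pads the distance up to baselineskip."""
    return depth_above + height_below - baselineskip


# Default \strut: each line is at least 7.7pt high and 3.3pt deep, and the
# glyph raises the height to 8.47705pt.
default_excess = excess(max(STRUT_DEPTH, GLYPH_DEPTH),
                        max(STRUT_HEIGHT, GLYPH_HEIGHT))

# \strut redefined as \vphantom{⎟}: each line takes the glyph's own
# height and depth.
vphantom_excess = excess(GLYPH_DEPTH, GLYPH_HEIGHT)

print(default_excess)   # 0.77705: baselines end up more than 11pt apart
print(vphantom_excess)  # -0.25537: total is 10.74463pt, short of 11pt
```

Under this model the \vphantom variant still leaves about a quarter of a point between stacked glyphs, which is consistent with the remark that a small gap should remain, whereas a \baselineskip of 10pt with the \strut removed lies below the glyph's 10.74463pt total.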
In the diofant docs I use xelatex engine and following preamble:
The HTML output from Sphinx looks fine (see e.g. this, an example of solve() usage) after a slight CSS adaptation. But in the PDF output there is additional vertical spacing (like the one removed by the line-height directive in the HTML output). See the screenshot from the project issue https://github.com/diofant/diofant/issues/754.