Open ziyuang opened 2 weeks ago
It looks like obsidian-clipper uses @mozilla/readability, but I don't see huge problems in Firefox's Reader View (which uses the same library):
The problem is in the conversion to Markdown, not in Readability.
So it looks for the <math> node and converts it to a LaTeX expression with mathml-to-latex:
const mathElement = assistiveMml.querySelector('math');
if (!mathElement) {
  return content;
}
let latex;
try {
  latex = MathMLToLaTeX.convert(mathElement.outerHTML);
} catch (error) {
  console.error('Error converting MathML to LaTeX:', error);
  return content;
}
For example, the <math> node for the first equation in the page looks like this:
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<mrow class="MJX-TeXAtom-ORD">
<mfrac>
<mrow> <mi>d</mi> <mi>P</mi> </mrow> <mrow> <mi>d</mi> <mi>μ</mi> </mrow>
</mfrac>
</mrow>
<mo>∈</mo>
<msub>
<mi>L</mi> <mrow class="MJX-TeXAtom-ORD"> <mn>1</mn> </mrow>
</msub>
<mo stretchy="false">(</mo> <mi>μ</mi> <mo stretchy="false">)</mo>
</math>
I would use this as a plan B, because sometimes the corresponding LaTeX expression appears in a nearby <script> node. In this case, it is:
<script type="math/tex; mode=display" id="MathJax-Element-6">
{ \frac{dP }{d \mu } } \in L _ {1} ( \mu )
</script>
Oh, does Readability strip off the script block already?
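If the script nodes are still available at conversion time (which is the open question here), the "plan A" of reading the LaTeX source directly could look roughly like the sketch below. `extractTexScripts` and `toMarkdownMath` are hypothetical helpers, not part of obsidian-clipper, and the regex over an HTML string is only a stand-in for a proper DOM query so that the snippet runs without a browser. It only covers MathJax 2 style `math/tex` script tags with a double-quoted `type` attribute.

```javascript
// Hypothetical helper: collect LaTeX sources from MathJax 2 style
// <script type="math/tex"> and <script type="math/tex; mode=display"> nodes.
// A regex is used here only so the sketch runs without a DOM.
function extractTexScripts(html) {
  const pattern =
    /<script\b[^>]*\btype\s*=\s*"math\/tex(;\s*mode=display)?"[^>]*>([\s\S]*?)<\/script>/gi;
  const results = [];
  for (const match of html.matchAll(pattern)) {
    results.push({
      display: Boolean(match[1]), // true when "mode=display" is present
      latex: match[2].trim(),
    });
  }
  return results;
}

// Hypothetical helper: wrap the LaTeX in Markdown math delimiters
// ($...$ for inline math, $$...$$ for display math).
function toMarkdownMath(entry) {
  return entry.display
    ? "$$\n" + entry.latex + "\n$$"
    : "$" + entry.latex + "$";
}
```

The MathML conversion with mathml-to-latex would then only be needed when no such script node is found.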
On this page (using MathJax 3), there is a related, but slightly different, issue. Here, the math expressions are ignored entirely.
Also, the figures are broken. For example, an image <img src="DDPM.png" style="width: 100%;" class="center"> is converted to ![](https://lilianweng.github.io/DDPM.png), but in fact it should be ![](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/DDPM.png).
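For reference, resolving the relative `src` against the full page URL with the standard `URL` constructor gives the expected result. This is a minimal sketch: `resolveImageSrc` is a hypothetical helper, and the page URL is an assumption inferred from the expected image URL above. Resolving against the site root instead would reproduce the broken link, though I have not confirmed that this is what the clipper actually does.

```javascript
// Hypothetical helper: resolve a possibly relative image src against a base URL.
// Falls back to the original src if it cannot be parsed.
function resolveImageSrc(src, baseUrl) {
  try {
    return new URL(src, baseUrl).href;
  } catch {
    return src;
  }
}

// Assumed page URL (note the trailing slash, which matters for resolution).
const pageUrl = "https://lilianweng.github.io/posts/2021-07-11-diffusion-models/";
// Site root, used here only to show how the broken link could arise.
const siteRoot = "https://lilianweng.github.io/";

console.log(resolveImageSrc("DDPM.png", pageUrl));
console.log(resolveImageSrc("DDPM.png", siteRoot));
```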
I also tried mathml-to-latex's playground (better to change v1.3.0 to v1.4.2). The <math> blocks in the page are convertible to LaTeX.
Maybe something upstream (Readability?) is off.
Version (please complete the following information):
Describe the bug Obsidian-clipper doesn't extract LaTeX expressions from a webpage well.
To reproduce
Expected behavior The LaTeX expressions are saved and the math content is correctly rendered. For the block between b) and c) in the page, this will be
or
Actual behavior The expression below is saved:
or
The main problem is that \left{ should have been \left\{. There are also minor issues.
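As a possible stopgap, the converted LaTeX could be post-processed to escape the bare brace. This is only a sketch: `escapeBareDelimiterBraces` is a hypothetical helper, and handling `\right}` as well is my assumption (only `\left{` shows up in the output above). The proper fix probably belongs in the MathML to LaTeX conversion itself.

```javascript
// Hypothetical helper: \left and \right must be followed by a delimiter,
// so a bare { or } right after them is never a valid group and can be escaped.
// Already escaped forms (\left\{ and \right\}) are left untouched, as are
// longer commands such as \leftarrow.
function escapeBareDelimiterBraces(latex) {
  return latex
    .replace(/\\left\s*\{/g, "\\left\\{")
    .replace(/\\right\s*\}/g, "\\right\\}");
}
```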
Your template file default-clipper.json obsidian-web-clipper-settings.json