opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
https://opendatalab.com/OpenSourceTools?tool=extract
GNU Affero General Public License v3.0
19.31k stars 1.38k forks

Formula recognized as a level-1 heading #403

Open liqiankun1111 opened 3 months ago

liqiankun1111 commented 3 months ago

Description of the bug

Original PDF: 论文标注1)Gaseous species as reaction tracers in the solvothermal synthesis of the zinc oxide terephthalate MOF-5.pdf

The extracted Markdown fragment is shown below; the formula was given a level-1 heading marker:

Because there is almost no peak overlap in the decoupled  $^{13}\mathrm{C}$   NMR measurements, we used these signals for the quantita- tive determination of amine and N-nitro so diethyl amine con- centrations. The solvent signal was used as an internal standard. The spectra were taken using inverse-gated proton decoupling, a   $30^{\circ}$   flip angle, and an averaging of 744 scans with a scanning rate of 20 s/scan. The results were confirmed by the corre- sponding    $^1\mathrm{H}$   NMR analysis. Because N-nitrosodiethylamine must result from the nitrate decomposition, it is of interest to compare its amount to the total nitrate loss. As can be seen from Table 1, the total nitrate loss during the MOF-5 synthesis is almost identical with the amount of N-nitro so diethyl amine formed. This observation fits to the fact that only a fairly small amount of nitrogen-containing species can be found in the gas- phase. Thus, one can conclude that most of the nitrogen resulting from the nitrate decomposition will ultimately stay in the liquid- phase and be bound by amine. It also means that out of the reactions Scheme   $\mathrm{2a-c}$  , 2a is most likely the dominating one, since   ${\mathrm{HNO}}_{2}$   is well-known to react with amines to N- nitrosoamines according to  

#  $\mathrm{HNet}_{2}+\mathrm{HNO}_{2}\longrightarrow\mathrm{Et}_{2}\mathrm{NNO}+\mathrm{H}_{2}\mathrm{O}$  

The most striking result taken from Table 1 is the reduction of the water content during the  $24\textrm{h}$   reaction time. The final concentration is only two-thirds of its initial value. At the same time, the relatively high concentration of all amine products allows us to estimate how much solvent was hydrolyzed (see Discussion). The loss of nitrate is relatively low and also does not match the production of all amine type reaction products.  

In addition, the references section of the paper is not rendered as a list in the parsed Markdown.

How to reproduce the bug

Parsing the PDF above reproduces the issue.

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.6.x

Device mode

cuda

myhloli commented 3 months ago

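It is very likely that the yellow highlight background affected the vision model's judgment, so this formula was detected as a title block. If possible, please try again with a copy of the PDF that has no yellow highlighting.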
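For Markdown that has already been generated, one possible stopgap is to post-process the output and demote any heading whose entire text is a single inline formula. The following is only a minimal sketch under that assumption: `demote_math_headings` is a hypothetical helper, not part of MinerU or `magic-pdf`, and it does not address the underlying layout detection. It also assumes your Markdown renderer treats `$$...$$` on its own line as display math.

```python
import re

# Hypothetical helper, not part of MinerU.
# Matches an ATX heading line ("#" to "######") whose whole text is one
# inline math span, e.g. "#  $a+b$  ". Headings with any other text,
# or with more than one "$...$" span, are left alone.
_MATH_ONLY_HEADING = re.compile(
    r"^#{1,6}[ \t]+\$([^$\n]+)\$[ \t]*$",
    re.MULTILINE,
)


def demote_math_headings(markdown: str) -> str:
    """Rewrite headings that contain only a formula as display math."""
    # A function is used as the replacement so that backslashes in the
    # LaTeX source are inserted literally and not parsed as escapes.
    return _MATH_ONLY_HEADING.sub(
        lambda m: "$$" + m.group(1).strip() + "$$",
        markdown,
    )


if __name__ == "__main__":
    sample = (
        "according to\n\n"
        "#  $\\mathrm{H}_{2}+\\mathrm{O}_{2}$  \n\n"
        "# Results\n"
    )
    print(demote_math_headings(sample))
```

Note that a genuine heading consisting of nothing but a formula would also be demoted, so the result should be reviewed before it is used downstream.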