inspirehep / inspire-next

The INSPIRE repo.
https://inspirehep.net
GNU General Public License v3.0

DESYSpider: Abstract with <math> gets truncated #3482

Open ksachs opened 6 years ago

ksachs commented 6 years ago

In the abstract (and the title?), information containing <math> gets truncated. Example: https://labs.inspirehep.net/holdingpen/1086582, which contains the abstract:

Applying advances in exact computations of supersymmetric gauge theories, we study the structure of correlation functions in two-dimensional <math altimg="si1.gif" display="inline" overflow="scroll"><mi mathvariant="script">N</mi><mo>=</mo><mo stretchy="false">(</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo stretchy="false">)</mo></math> Abelian and non-Abelian gauge theories...
StellaCh commented 6 years ago

The team investigated this bug, and it is not on the INSPIRE side: the abstract has unescaped tags:

<subfield code="a">Applying advances in exact computations of supersymmetric gauge theories, we study the structure of correlation functions in two-dimensional <math altimg="si1.gif" display="inline" overflow="scroll"><mi mathvariant="script">N</mi><mo>=</mo><mo stretchy="false">(</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo stretchy="false">)</mo></math> Abelian and non-Abelian gauge theories. We determine universal relations among correlation functions, which yield differential equations governing the dependence of the gauge theory ground state on the Fayet–Iliopoulos parameters of the gauge theory. For gauge theories with a non-trivial infrared <math altimg="si1.gif" display="inline" overflow="scroll"><mi mathvariant="script">N</mi><mo>=</mo><mo stretchy="false">(</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo stretchy="false">)</mo></math> superconformal fixed point, these differential equations become the Picard–Fuchs operators governing the moduli-dependent vacuum ground state in a Hilbert space interpretation. For gauge theories with geometric target spaces, a quadratic expression in the Givental I -function generates the analyzed correlators. This gives a geometric interpretation for the correlators, their relations, and the differential equations. For classes of Calabi–Yau target spaces, such as threefolds with up to two Kähler moduli and fourfolds with a single Kähler modulus, we give general and universally applicable expressions for Picard–Fuchs operators in terms of correlators. We illustrate our results with representative examples of two-dimensional <math altimg="si1.gif" display="inline" overflow="scroll"><mi mathvariant="script">N</mi><mo>=</mo><mo stretchy="false">(</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo stretchy="false">)</mo></math> gauge theories.</subfield>

Once you fix them on your side, the abstract will appear correctly in the holdingpen.
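To illustrate what "unescaped tags" means for a parser, here is a minimal sketch using Python's standard-library ElementTree, not the actual INSPIRE or dojson code. Raw MathML inside a subfield becomes child elements, so a reader that keeps only the subfield's own text node (an assumption about how the truncation arises) stops at the first <math>. Escaping the tags turns them into character data, and the full value survives. The shortened sample strings and the helper `naive_subfield_value` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, shortened sample of the MathML found in the abstract.
MATHML = '<math><mi>N</mi><mo>=</mo><mn>2</mn></math>'

# Raw MathML: <math> is parsed as a child element of <subfield>.
RAW = '<subfield code="a">in two-dimensional ' + MATHML + ' gauge theories.</subfield>'

# Escaped MathML: the tags are plain character data (&lt;math&gt;...).
ESCAPED = '<subfield code="a">in two-dimensional ' + escape(MATHML) + ' gauge theories.</subfield>'


def naive_subfield_value(xml_snippet):
    """Hypothetical reader: return only the subfield's own text node.

    ElementTree's .text holds just the text before the first child element,
    so everything from <math> onwards is lost when the MathML is unescaped.
    """
    return ET.fromstring(xml_snippet).text


print(repr(naive_subfield_value(RAW)))
# 'in two-dimensional '
print(repr(naive_subfield_value(ESCAPED)))
# 'in two-dimensional <math><mi>N</mi><mo>=</mo><mn>2</mn></math> gauge theories.'
```

The escaped form keeps the MathML as literal text in the value, which is what a text-only consumer can pass through unchanged.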

ksachs commented 6 years ago

Legacy is dealing with this.

We parse the XML through create_records, so our normal upload is already clean.

However, even a batchupload with the MathML included directly works fine. Example on test (inspirevm16.cern.ch):

Task #1196844 Input file '/opt/cds-invenio/var/tmp-shared/batchupload_sachs_20180702101904_bCqr9U', input mode 'insert'. (I don't have permission to see that file, but I assume the MathML is still in it.) Record 1673841 has no MathML.

We thought the DESY spider would accept essentially the same XML structure that could be harvested on legacy.

michamos commented 6 years ago

It seems to be truncated already after create_record:

In [1]: from dojson.contrib.marc21.utils import create_record

In [2]: create_record('''<datafield tag="245"><subfield code="a">Applying advances in exact computations of supersymmetric gauge theories, we study the structure of correlation functions in two-dimensional <math altimg="si1.gif" display="inline" overflow="scroll"><mi mathvariant="script">N</mi><mo>=</mo><mo stretchy="false">(</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo stretchy="false">)</mo></math> Abelian and non-Abelian gauge theories. We determine universal relations among correlation functions, which yield differential equations governing the dependence of the gauge theory ground state on the Fayet–Iliopoulos parameters of the gauge theory. For gauge theories with a non-trivial infrared <math altimg="si1.gif" display="inline" overflow="scroll"><mi mathvariant="script">N</mi><mo>=</mo><mo stretchy="false">(</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo stretchy="false">)</mo></math> superconformal fixed point, these differential equations become the Picard–Fuchs operators governing the moduli-dependent vacuum ground state in a Hilbert space interpretation. For gauge theories with geometric target spaces, a quadratic expression in the Givental I -function generates the analyzed correlators. This gives a geometric interpretation for the correlators, their relations, and the differential equations. For classes of Calabi–Yau target spaces, such as threefolds with up to two Kähler moduli and fourfolds with a single Kähler modulus, we give general and universally applicable expressions for Picard–Fuchs operators in terms of correlators. We illustrate our results with representative examples of two-dimensional <math altimg="si1.gif" display="inline" overflow="scroll"><mi mathvariant="script">N</mi><mo>=</mo><mo stretchy="false">(</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo stretchy="false">)</mo></math> gauge theories.</subfield></datafield>''')
Out[2]: 
GroupableOrderedDict([('__order__', ('245!!',)),
                      ('245!!',
                       GroupableOrderedDict([('__order__', ('a',)),
                                             ('a',
                                              'Applying advances in exact computations of supersymmetric gauge theories, we study the structure of correlation functions in two-dimensional ')]))])
ksachs commented 6 years ago

We will run the XML files for upload to labs through legacy create_records, so this is not blocking, but it should be solved: MathML will come from the publishers, and labs will have to deal with it.
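One way labs could deal with raw MathML on the consumer side is to keep a subfield's inner markup instead of only its leading text. The sketch below uses Python's standard-library ElementTree as a stand-in; it is not the dojson `create_record` API, and the helper `subfield_inner_xml` and the sample record are hypothetical. It assumes the input has no XML namespaces (with a namespaced MARCXML collection, `find` would need the namespace and `tostring` would emit `ns0:` prefixes).

```python
import xml.etree.ElementTree as ET


def subfield_inner_xml(subfield):
    """Return everything between <subfield> and </subfield>, child markup included.

    ET.tostring() serializes each child together with its tail text, so joining
    the leading text with the serialized children rebuilds the full content.
    """
    parts = [subfield.text or '']
    for child in subfield:
        parts.append(ET.tostring(child, encoding='unicode'))
    return ''.join(parts)


# Hypothetical, shortened record with raw MathML in the abstract subfield.
record = ET.fromstring(
    '<datafield tag="520"><subfield code="a">in two-dimensional '
    '<math><mi>N</mi><mo>=</mo><mn>2</mn></math> gauge theories.</subfield></datafield>'
)
abstract = record.find('subfield[@code="a"]')
print(subfield_inner_xml(abstract))
# in two-dimensional <math><mi>N</mi><mo>=</mo><mn>2</mn></math> gauge theories.
```

Keeping the markup rather than stripping it means the MathML can still be rendered or converted later; stripping tags with `itertext()` would be the lossy alternative.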

michamos commented 6 years ago

@ksachs not sure I understand what you mean about "we will run xml-files for upload to labs via legacy create_records". Wouldn't that prevent you from using labs completely for publisher harvests?

ksachs commented 6 years ago

We parse the XML with a stand-alone Python program that uses our local installation of inspire-legacy, i.e. misusing Invenio as an XML parser. The modified XML (after deleting online-first articles etc.) is written to a file and put on the FTP server, to be harvested by labs.

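import codecs
from invenio.bibrecord import create_records, record_xml_output
# .... (elided: xmlfile is opened here)
xmlrecords = xmlfile.read()
# create_records returns a list of (record, status, errors) tuples
recs = create_records(xmlrecords, verbose=1)
xmlfile.close()

newxmlfile = codecs.EncodedFile(codecs.open(...., mode='wb'), 'utf8')
newxmlfile.write('<?xml version="1.0" encoding="UTF-8"?>\n<collection>\n')
for recordtuple in recs:
    record = recordtuple[0]
    # ... modify the record (e.g. delete online-first articles)
    newxmlfile.write(record_xml_output(record))
newxmlfile.write('</collection>\n')
newxmlfile.close()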
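If the files put on the FTP server should carry escaped MathML (the form StellaCh described as working for labs), the pre-processing step could flatten child markup inside every subfield before writing. This is a hedged sketch with Python's standard-library ElementTree standing in for Invenio's bibrecord; the function `flatten_subfield_markup` and the sample collection are hypothetical, and namespaces are assumed absent.

```python
import xml.etree.ElementTree as ET


def flatten_subfield_markup(root):
    """Turn child markup inside every <subfield> into escaped character data.

    Each child (with its tail text) is serialized and appended to the subfield's
    text; the children are then removed, so on output ElementTree writes the
    former tags as &lt;...&gt; instead of nested elements.
    """
    # Materialize the list first: we mutate subfields while walking the tree.
    for subfield in list(root.iter('subfield')):
        children = list(subfield)
        if not children:
            continue
        parts = [subfield.text or '']
        parts.extend(ET.tostring(child, encoding='unicode') for child in children)
        for child in children:
            subfield.remove(child)
        subfield.text = ''.join(parts)
    return root


collection = ET.fromstring(
    '<collection><record><datafield tag="520"><subfield code="a">in two-dimensional '
    '<math><mi>N</mi></math> gauge theories.</subfield></datafield></record></collection>'
)
print(ET.tostring(flatten_subfield_markup(collection), encoding='unicode'))
# <collection><record><datafield tag="520"><subfield code="a">in two-dimensional &lt;math&gt;&lt;mi&gt;N&lt;/mi&gt;&lt;/math&gt; gauge theories.</subfield></datafield></record></collection>
```

Re-parsing the written file then yields the complete abstract as a single text value, MathML included as literal text.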