kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0

Wrong recognition of Title as Abstract section #785

Open · MedKhem opened this issue 3 years ago

MedKhem commented 3 years ago

We couldn't properly extract the title of the attached Italian paper from the TEI output, because it was recognized as part of the abstract section. Attachment: 14Tornetta.pdf
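This kind of misrecognition can be spotted on the client side by reading the main title and the abstract from the TEI header. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the TEI namespace and element paths that appear in GROBID's output in this thread. The helper names `header_fields` and `looks_misrecognized` and the trimmed sample are hypothetical and for illustration only. The sample is not the actual failing output.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in GROBID's output
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def header_fields(tei_xml: str) -> dict:
    """Return the main title and the abstract text found in a TEI header (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    title_el = root.find(
        "./tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[@type='main']", TEI_NS
    )
    abstract_el = root.find("./tei:teiHeader/tei:profileDesc/tei:abstract", TEI_NS)
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""
    # Collapse whitespace across the <p> elements of the abstract
    abstract = " ".join("".join(abstract_el.itertext()).split()) if abstract_el is not None else ""
    return {"title": title, "abstract": abstract}


def looks_misrecognized(fields: dict) -> bool:
    """Heuristic: an empty title next to a non-empty abstract suggests the title was absorbed."""
    return not fields["title"] and bool(fields["abstract"])


# Trimmed, hypothetical sample: empty title, title text ending up in the abstract
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title level="a" type="main"/></titleStmt></fileDesc>
    <profileDesc><abstract><p>La spasticità: il trattamento farmacologico.</p></abstract></profileDesc>
  </teiHeader>
</TEI>"""
```

Calling `looks_misrecognized(header_fields(SAMPLE))` flags the sample. A real check would need more than this single heuristic, because some documents legitimately have no abstract.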

kermitt2 commented 3 years ago

Interestingly (well, maybe not that much), with the BidLSTM-CRF-FEATURES header model it works: the title is extracted correctly.

(Just note that there is no particular treatment of the English title or of a subtitle yet; it's something still to be done.)

<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve"
    xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
    xmlns:xlink="http://www.w3.org/1999/xlink">
    <teiHeader xml:lang="en">
        <fileDesc>
            <titleStmt>
                <title level="a" type="main">La spasticità: il trattamento farmacologico. L&apos;uso della tossina botulinica e trattamento riabilitativo specifico * The spasticity: farmacological treatment. The use of the botulin toxin and specific rehabilitative treatment</title>
            </titleStmt>
            <publicationStmt>
                <publisher/>
                <availability status="unknown">
                    <licence/>
                </availability>
            </publicationStmt>
            <sourceDesc>
                <biblStruct>
                    <analytic>
                        <author role="corresp">
                            <persName>
                                <forename type="first">Lorella</forename>
                                <surname>Tornetta</surname>
                            </persName>
                            <email>lorella.tornetta@torton.net</email>
                        </author>
                        <author>
                            <persName>
                                <forename type="first">M</forename>
                                <surname>Martielli</surname>
                            </persName>
                        </author>
                        <author>
                            <persName>
                                <forename type="first">R</forename>
                                <surname>Cartello</surname>
                            </persName>
                        </author>
                        <author>
                            <persName>
                                <forename type="first">M</forename>
                                <surname>Melillo</surname>
                            </persName>
                        </author>
                        <author>
                            <persName>
                                <forename type="first">L</forename>
                                <surname>Obino</surname>
                            </persName>
                        </author>
                        <author>
                            <persName>
                                <forename type="first">L</forename>
                                <surname>Clarici</surname>
                            </persName>
                        </author>
                        <author>
                            <persName>
                                <forename type="first">T</forename>
                                <surname>Borro</surname>
                            </persName>
                        </author>
                        <author>
                            <persName>
                                <forename type="first">B</forename>
                                <surname>Bassi</surname>
                            </persName>
                        </author>
                        <author>
                            <persName>
                                <forename type="first">R</forename>
                                <surname>Rigardetto</surname>
                            </persName>
                        </author>
                        <author>
                            <persName>
                                <roleName>M.-S</roleName>
                                <forename type="first">O</forename>
                                <forename type="middle">I R</forename>
                                <surname>Anna</surname>
                            </persName>
                        </author>
                        <author>
                            <affiliation key="aff0">
                                <orgName type="department">Dipartimento a Direzione</orgName>
                                <orgName type="institution">Universitaria di NPI</orgName>
                                <address>
                                    <region>A.S.O</region>
                                </address>
                            </affiliation>
                        </author>
                        <author>
                            <affiliation key="aff1">
                                <orgName type="department">PAROLE CHIAVE. -Spasticità -Tossina</orgName>
                                <orgName type="institution">Università di Torino</orgName>
                            </affiliation>
                        </author>
                        <author>
                            <affiliation key="aff2">
                                <orgName type="department">Dipartimento di NPI</orgName>
                                <address>
                                    <region>A.S.O. O.I</region>
                                </address>
                            </affiliation>
                        </author>
                        <author>
                            <affiliation key="aff3">
                                <orgName type="laboratory">Comunicazione svolta al Corso Satellite su &quot;Riabilitazione e Trattamento Farmacologico nei disturbi neuropsichici del bambino&quot;, a cura della Sezione di Riabilitazione della SINPIA. Napoli</orgName>
                                <address>
                                    <addrLine>7-10</addrLine>
                                    <postCode>2005</postCode>
                                    <settlement>Dicembre</settlement>
                                </address>
                            </affiliation>
                        </author>
                        <title level="a" type="main">La spasticità: il trattamento farmacologico. L&apos;uso della tossina botulinica e trattamento riabilitativo specifico * The spasticity: farmacological treatment. The use of the botulin toxin and specific rehabilitative treatment</title>
                    </analytic>
                    <monogr>
                        <imprint>
                            <date/>
                        </imprint>
                    </monogr>
                    <idno type="MD5">67F5852E1599BBBBF6CE183894D0BCA1</idno>
                </biblStruct>
            </sourceDesc>
        </fileDesc>
        <encodingDesc>
            <appInfo>
                <application version="0.7.0-SNAPSHOT" ident="GROBID" when="2021-06-29T13:00+0000">
                    <desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
                    <ref target="https://github.com/kermitt2/grobid"/>
                </application>
            </appInfo>
        </encodingDesc>
        <profileDesc>
            <textClass>
                <keywords>botulinica -Trattamento riabilitativo -Paralisi cerebrale infantile (PCI) KEY WORDS. -Spasticity -Botulin toxin -Rehabilitative treatment -Cerebral palsy</keywords>
            </textClass>
            <abstract>
                <p>Introduction. The spasticity is a complex movement disorder in which several neurophysiological circuits are involved. This complexity added to the rheological muscle&apos;s modifications make it difficult to understand the functional meaning of the symptom. In cerebral palsy, in fact, spasticity can also be seen as positive factor and used in a functional manner. Before choosing a therapeutic strategy it is important to understand the functional meaning of spasticity for that child. Nowadays pharmacological treatments include systemic, intratecal and local approaches. Methods and Materials. Our experience is focused on the use of the botulin toxin. We explain clinical assessment and protocol for post-inoculation on rehabilitative treatment. Results. Results vary depending on the severity of the disability and on the age of patients, on a correct identification of muscles to inoculate, on an adequate dose of toxin and most of all on post-inoculation rehabilitative treatment. Conclusions. In our experience post-inoculation rehabilitative treatment can influence the outcome especially from a functional point of view. Nevertheless, there aren&apos;t enough statistical evidence yet either to confirm or to refute advantages of using BT-A in cerebral palsy.</p>
            </abstract>
        </profileDesc>
    </teiHeader>
    <text xml:lang="en"></text>
</TEI>
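In the output above, GROBID places the English title in the same `<title>` element as the Italian one, because there is no particular treatment of it yet. A client could split the two titles in a post-processing step. The sketch below is a heuristic only. It assumes, as in this particular PDF, that an isolated asterisk (apparently a footnote marker) separates the original title from its English translation. This is not something GROBID does, and `split_bilingual_title` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple


def split_bilingual_title(title: str) -> Tuple[str, Optional[str]]:
    """Split 'original * translation' on the first asterisk surrounded by whitespace.

    Heuristic only: assumes the asterisk separates the original-language title
    from its English translation, as in the TEI output above.
    """
    parts = re.split(r"\s+\*\s+", title, maxsplit=1)
    if len(parts) == 2:
        return parts[0].strip(), parts[1].strip()
    # No separator found: keep the whole string as the original title
    return title.strip(), None
```

The heuristic fails on titles that legitimately contain an asterisk, so a more robust approach would rely on layout or language cues instead.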
MedKhem commented 3 years ago

Does the BidLSTM-CRF-FEATURES model also use the layout features?

kermitt2 commented 3 years ago

> Does the BidLSTM-CRF-FEATURES model also use the layout features?

Yes, as its name suggests. Actually, without layout features the BidLSTM-CRF header model is not effective at all: around 10 points lower in F1-score, if I remember correctly.