kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0

Abstract Being Cutoff #473

Closed DavidBegert closed 2 years ago

DavidBegert commented 5 years ago

Hi there, here is an example PDF for which Grobid, for some reason, cuts off the abstract after the first sentence. Any idea why? PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3803163/pdf/nihms-516461.pdf

Thanks! :)

Vitaliy-1 commented 5 years ago

Although the abstract is recognized correctly for this PDF when running createTraining for the header and segmentation models, HeaderParser.resultExtraction() labels the first token after the end of that sentence ("Its") as I-<intro>, stops parsing, and returns the result.
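
To make the failure mode concrete, here is a toy sketch of how a single stray label can truncate a field. It is not Grobid's actual HeaderParser code: the function `extract_field`, the token list, and the labels after the first sentence are all hypothetical, and the sketch only assumes that extraction stops at the first token carrying a different label.

```python
# Toy illustration only: NOT Grobid's HeaderParser implementation.
# Assumption: extraction of a field stops at the first token whose label differs.

def extract_field(labeled_tokens, field):
    """Collect the tokens labeled `field`, stopping at the first other label."""
    collected = []
    started = False
    for token, label in labeled_tokens:
        # Strip the "I-" prefix that marks the first token of a labeled span.
        name = label[2:] if label.startswith("I-") else label
        if name == field:
            started = True
            collected.append(token)
        elif started:
            break  # a different label ends the field, even if it is a mislabel
    return " ".join(collected)

# Hypothetical labeling: "Its" is wrongly tagged as the start of <intro>.
tokens = [
    ("Abatacept", "I-<abstract>"),
    ("delayed", "<abstract>"),
    ("progression.", "<abstract>"),
    ("Its", "I-<intro>"),
    ("use", "<intro>"),
    ("is", "<intro>"),
    ("expanding.", "<intro>"),
]

truncated = extract_field(tokens, "<abstract>")
print(truncated)
```

Under that assumption, everything after the mislabeled token is lost from the abstract, which matches the "cut off after the first sentence" symptom.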

Vitaliy-1 commented 5 years ago

Just want to add that I'm using a segmentation model trained on 500+ documents, so it should be better than the default at discriminating between the front matter and the body. Maybe it's somehow related to: https://github.com/kermitt2/grobid/issues/430#issuecomment-519027255

kermitt2 commented 4 years ago

It's working fine now after the update of the header process & segmentation model.

        <profileDesc>
            <textClass>
                <keywords>
                    <term>Abatacept</term>
                    <term>type 1 diabetes mellitus</term>
                    <term>tetanus vaccine</term>
                    <term>influenza vaccine</term>
                </keywords>
            </textClass>
            <abstract>
                <p>Abatacept delayed progression of type 1 diabetes (T1D) when administered soon after diagnosis. Its use in T1D is expanding to prevention trials and, therefore, it is important to fully characterize its immunosuppressive effect. We compared antibody responses to trivalent inactivated influenza vaccine (TIIV) administered during 2 consecutive seasons and to tetanus toxoid (TT) vaccine administered after 24 months of treatment in115 early onset T1D subjects randomly assigned to 24 months of Abatacept (N=71) or placebo (N=34). Anti-influenza titers before TIIV were similar between the 2 treatment groups and both groups had significant increases after vaccination. Although the magnitude of antibody responses against some influenza serotypes was significantly lower (p&lt;0.05) in Abatacept compared with placebo recipients, no differences were observed in the proportion of subjects with protective titers against influenza after vaccination. The magnitude of antibody responses against TT also tended to be lower (p=0.06) in Abatacept compared with placebo recipients, without affecting the proportion of subjects who achieved protective titers. We conclude that Abatacept moderately decreases the magnitude of antibody responses to recall vaccination. Further studies are needed to assess its effect on primary immunization.</p>
            </abstract>
        </profileDesc>
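
For anyone who wants to check for this kind of truncation in their own output, a minimal sketch along these lines may help. It parses a shortened, namespace-free copy of the fragment above with Python's standard library. The helper `abstract_text` and the "more than one sentence" heuristic are assumptions for illustration. Full Grobid TEI documents normally declare the TEI namespace, so the element path would need adjusting there.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the fragment above, without a namespace (as pasted).
fragment = """<profileDesc>
  <abstract>
    <p>Abatacept delayed progression of type 1 diabetes (T1D) when administered soon after diagnosis. Its use in T1D is expanding to prevention trials.</p>
  </abstract>
</profileDesc>"""

def abstract_text(xml_string):
    """Return the text of all <p> elements under <abstract>, joined by spaces."""
    root = ET.fromstring(xml_string)
    paragraphs = root.findall(".//abstract/p")
    return " ".join("".join(p.itertext()).strip() for p in paragraphs)

text = abstract_text(fragment)

# Crude heuristic (an assumption): an abstract with no sentence boundary
# inside it may have been cut off after its first sentence.
looks_truncated = text.count(". ") == 0
print(looks_truncated, len(text))
```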