kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0
3.5k stars, 449 forks

Grobid fails to extract abstract? #319

Closed joelkuiper closed 1 year ago

joelkuiper commented 6 years ago

This document produces no abstract (i.e. the text is simply not there in the XML). Is there anything we can do on our end to fix this?

https://doi.org/10.1161/CIRCULATIONAHA.108.839639

21076158.pdf

joelkuiper commented 6 years ago

A similar case, where entire pieces of text are missing: Hemodynamic Patterns of Age-Related Changes in Blood Pressure | Circulation.pdf

kermitt2 commented 4 years ago

Testing after the update of the header processing and model:

            <abstract>
                <div
                    xmlns="http://www.tei-c.org/ns/1.0">
                    <p>Background-We previously demonstrated that treatment with antiarrhythmic drugs (AADs) during the first 6 weeks after atrial fibrillation (AF) ablation reduces the incidence of clinically significant atrial arrhythmias and need for cardioversion or hospitalization for arrhythmia management. Whether early rhythm suppression decreases longer-term arrhythmia recurrence is unknown. We now report the 6-month follow-up data from this study.</p>
                </div>
                <div
                    xmlns="http://www.tei-c.org/ns/1.0">
                    <head>Methods and Results-The Antiarrhythmics After Ablation of Atrial Fibrillation study prospectively randomized patients</head>
                    <p>with paroxysmal AF undergoing ablation to either receive (AAD group) or not receive (no-AAD group) AAD treatment for the first 6 weeks after ablation; all patients received atrioventricular nodal blockers. Physicians were encouraged to stop the AADs after the 6-week treatment period. All patients underwent 4 weeks of transtelephonic monitoring to document asymptomatic AF and an evaluation at 6 weeks and 6 months. A total of 110 patients (71% men) aged 55Ϯ9 years were randomized, with 53 to AAD and 57 to no AAD. At 6 months, there was no difference in freedom from AF between the early AAD and no-AAD groups (38/53 [72%]  versus 39/57 [68%]; Pϭ0.84). Lack of early AF recurrence during the initial 6-week period was the only independent predictor of 6-month freedom from AF (64/76 [84%] without early recurrence versus 13/34 [38%] with early recurrence; Pϭ0.0001). Conclusions-Although short-term use of AADs after AF ablation decreases early recurrence of atrial arrhythmias, early use of AADs does not prevent arrhythmia recurrence at 6 months. Early AF recurrence on or off AADs during the initial 6-week blanking period is a strong independent predictor of long-term AF recurrence. Clinical Trial Registration-URL: http://www.clinicaltrials.gov. Unique identifier: NCT00408200.</p>
                </div>
            </abstract>
            <abstract>
                <div
                    xmlns="http://www.tei-c.org/ns/1.0">
                    <p>Background We attempted to characterize age-related changes in blood pressure in both normotensive and untreated hypertensive subjects in a population-based cohort from the original Framingham Heart Study and to infer underlying hemodynamic mechanisms.</p>
                </div>
                <div
                    xmlns="http://www.tei-c.org/ns/1.0">
                    <head>Methods and Results</head>
                    <p>A total of 2036 participants were divided into four groups according to their systolic blood pressure (SBP) at biennial examination 10, 11, or 12. After excluding subjects receiving antihypertensive drug therapy, up to 30 years of data on normotensive and untreated hypertensive subjects from biennial examinations 2 through 16 were used. Regressions of blood pressure versus age within individual subjects produced slope and curvature estimates that were compared with the use of ANOVA among the four SBP groups. There was a linear rise in SBP from age 30 through 84 years and concurrent increases in diastolic blood pressure (DBP) and mean arterial pressure (MAP); after age 50 to 60 years, DBP declined, pulse pressure (PP) rose steeply, and MAP reached an asymptote. Neither the fall in DBP nor the rise in PP was influenced significantly by removal of subsequent deaths and subjects with nonfatal myocardial infarction or heart failure. Age-related linear increases in SBP, PP, and MAP, as well as the early rise and late fall in DBP, were greatest for subjects with the highest baseline SBP; this represents a divergent rather than parallel tracking pattern.</p>
                    <p>Conclusions The late fall in DBP after age 60 years, associated with a continual rise in SBP, cannot be explained by "burned out" diastolic hypertension or by "selective survivorship" but is consistent with increased large artery stiffness. Higher SBP, left untreated, may accelerate large artery stiffness and thus perpetuate a vicious cycle.</p>
                </div>
            </abstract>

So it looks good now, although the structuring of the abstracts into sections and paragraphs still needs improvement. I plan to create a model specifically for structuring abstracts, but in the meantime I am simply re-using the fulltext model on the abstract chunk, mainly to correctly extract the bibliographical references in the abstract (and associate them properly with the bibliographical section).
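
For anyone who wants to check their own output for the missing-abstract problem reported above, here is a minimal sketch using only the Python standard library. It is not part of Grobid: the function names `extract_abstracts` and `has_abstract_text` and the `SAMPLE` document are made up for illustration, and it assumes the TEI has the same `<abstract>` / `<div>` / `<head>` / `<p>` layout as the output pasted above.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_abstracts(tei_xml):
    """Return one list of sections per <abstract> element found.

    Each section is a dict with an optional 'head' and a list of
    non-empty 'paragraphs'. Tags are matched by local name so the
    function works with or without the TEI namespace declared.
    """
    root = ET.fromstring(tei_xml)
    abstracts = []
    for elem in root.iter():
        if local_name(elem.tag) != "abstract":
            continue
        # Assumption: sections are direct <div> children; if there are
        # none, treat the <abstract> element itself as a single section.
        containers = [c for c in elem if local_name(c.tag) == "div"] or [elem]
        sections = []
        for container in containers:
            head = None
            paragraphs = []
            for child in container:
                # Collapse runs of whitespace left over from extraction.
                text = " ".join("".join(child.itertext()).split())
                name = local_name(child.tag)
                if name == "head":
                    head = text
                elif name == "p" and text:
                    paragraphs.append(text)
            sections.append({"head": head, "paragraphs": paragraphs})
        abstracts.append(sections)
    return abstracts


def has_abstract_text(tei_xml):
    """True if any <abstract> contains at least one non-empty paragraph."""
    return any(
        section["paragraphs"]
        for abstract in extract_abstracts(tei_xml)
        for section in abstract
    )


# Hypothetical, shortened TEI document for illustration only.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc>'
    '<abstract>'
    '<div xmlns="http://www.tei-c.org/ns/1.0"><p>Background text.</p></div>'
    '<div xmlns="http://www.tei-c.org/ns/1.0">'
    '<head>Methods and Results</head><p>Some   results.</p></div>'
    '</abstract>'
    '</profileDesc></teiHeader></TEI>'
)

if __name__ == "__main__":
    for sections in extract_abstracts(SAMPLE):
        for section in sections:
            print(section["head"], section["paragraphs"])
```

Running `has_abstract_text` over a batch of TEI files is a quick way to flag documents where the abstract came out empty, as in the original report. Note that a mis-split heading such as the "Methods and Results-..." `<head>` above ends up in `head` rather than `paragraphs`, so join both fields if you need the full abstract text.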