wellcometrust / covid19

Covid QA system for kaggle challenge

Refine cutting text into 3 sentences #15

Closed lizgzil closed 4 years ago

lizgzil commented 4 years ago

It doesn't happen very often, but sometimes there are no full stops in the text, so it doesn't split neatly into sentences. Also, if there are full stops inside a sentence, as in "e.g.", this doesn't work very well either.

We should think of a better way at some point. @nsorros says: "in the future we could experiment with scispacy that should be better in splitting sentences on that domain."

e.g.

"paper_id": "0775644af19e13cb5a4214be4db3f8ad75acd2d4",

snippet: " \u2022 A very recent study suggests that digestive symptoms such as loss of appetite and diarrhea are also common [8] \u2022 In contrast, upper airway catarrh syndrome, such as running nose or sneezing, are rare in COVID-19 patients [9] \u2022 According to the WHO, the three most common symptoms for COVID-19 are: fever, tiredness, and dry cough [10] \u2022 In addition to respiratory symptoms, some COVID-19 patients may exhibit GI symptoms [8] , suggesting unexplained diarrhea and may also be considered for screening \u2022 Recently, the American Academy of Otolaryngology -Head and Neck Surgery also recommended adding \"sudden loss of taste or smell\" as one of the symptoms for screening of COVID-19 [11] 4. How long does it usually take from infection to recovery? \u2022 The median time from symptom to recovery is 22 days in survivors and median duration of viral shedding is 20 days (longest 37 days) [12] \u2022 This information is useful for the strategic planning of cancer treatment during the COVID-19 outbreak: Delaying cancer treatment for COVD-19 positive cancer patients may be feasible for certain cancers, if expected delay is about 4 weeks \u2022 COVID-19 virus can spread via respiratory mucus or saliva droplets (coughing and talking), contact with bodily fluids (e.g"

lizgzil commented 4 years ago

(\u2022 is the Unicode escape for a bullet point.)
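Until something like scispacy is in place, a rough stopgap could treat bullets as sentence boundaries and protect known abbreviations before splitting. This is only a sketch, not the code in this repo: `split_sentences`, `first_sentences`, and the `ABBREVIATIONS` list are hypothetical names, and it assumes the snippet has already been JSON-decoded so that \u2022 is a real bullet character.

```python
import re

# Hypothetical abbreviation list; extend it for the corpus at hand.
ABBREVIATIONS = ("e.g.", "i.e.", "et al.", "vs.", "Fig.")
# Stands in for protected full stops; assumed never to occur in the text.
_PLACEHOLDER = "\x00"


def split_sentences(text):
    """Split text into sentences on bullets and on . ! ? followed by whitespace."""
    # Hide the full stops of known abbreviations so they are not split on.
    for abbr in ABBREVIATIONS:
        text = text.replace(abbr, abbr.replace(".", _PLACEHOLDER))

    sentences = []
    # Bullets are hard boundaries, even when no full stop is present.
    for chunk in text.split("\u2022"):
        # Split after sentence-ending punctuation that is followed by whitespace.
        for part in re.split(r"(?<=[.!?])\s+", chunk):
            part = part.replace(_PLACEHOLDER, ".").strip()
            if part:
                sentences.append(part)
    return sentences


def first_sentences(text, n=3):
    """Return the first n sentences joined into one snippet."""
    return " ".join(split_sentences(text)[:n])
```

This still has gaps: numbered headings such as "4." end up as their own sentence, and any abbreviation missing from the list is split as before, so a domain-aware splitter remains the better long-term option.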

nsorros commented 4 years ago

I am happy to replace it with scispacy. I ran this example with it and the results are slightly better.