lfoppiano / grobid-quantities

GROBID extension for identifying and normalizing physical quantities.
https://grobid-quantities.readthedocs.io
Apache License 2.0

One atomic quantity expressed with two different values #58

Closed: everzeni closed this issue 6 years ago

everzeni commented 6 years ago

In the sentence:

an 81-year-old male athlete was able to finish a 100-km ultra-marathon in a time of 19 h 44 min

What to do with 19 h 44 min?

It's definitely one atomic quantity, but expressed with different units.

Patrice suggested that we could introduce a new type of list meaning "atomic quantity expressed as a composition of values".
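To make the proposal concrete, here is a minimal sketch of what such a composite list might carry: the raw text, its (value, unit) parts, and a single normalized value obtained by converting every part to one base unit and summing. The names `CompositeQuantity` and `parse_composite_duration`, the unit table, and the choice of seconds as the base unit are all hypothetical, chosen only for illustration; this is not grobid-quantities code.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical conversion table (assumption): unit symbol -> seconds
UNIT_SECONDS = {"h": 3600, "min": 60, "s": 1}

# One number followed by one unit symbol, e.g. "19 h" or "44 min"
PART_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(h|min|s)\b")


@dataclass
class CompositeQuantity:
    """One atomic quantity expressed as a composition of (value, unit) parts."""
    raw: str
    parts: List[Tuple[float, str]]

    def normalized_seconds(self) -> float:
        # Convert each part to the base unit (seconds) and add them up
        return sum(value * UNIT_SECONDS[unit] for value, unit in self.parts)


def parse_composite_duration(text: str) -> CompositeQuantity:
    """Collect every number+unit part found in the text (hypothetical helper)."""
    parts = [(float(v), u) for v, u in PART_RE.findall(text)]
    if not parts:
        raise ValueError(f"no duration parts found in {text!r}")
    return CompositeQuantity(raw=text, parts=parts)


q = parse_composite_duration("19 h 44 min")
print(q.parts)                 # [(19.0, 'h'), (44.0, 'min')]
print(q.normalized_seconds())  # 71040.0
```

Keeping the parts alongside the normalized value preserves what was actually written while still exposing one atomic quantity to downstream users.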

everzeni commented 6 years ago

Related:

The current Hawaii Ironman triathlon record is 8:54:202 for females and 8:04:08 for males

kermitt2 commented 6 years ago

I would say the case 19 h 44 min corresponds to "Unit embedded in numerical value" in the annotation guidelines, a case not supported in the annotation (see #49). At some point we'll need to find a solution :D

However, 8:54:202 is a standard time value. I think it's not the same case, so it would be:

The current Hawaii Ironman triathlon record is <measure type="value"><time when="8:54:202">8:54:202</time></measure> for females and <measure type="value"><time when="8:04:08">8:04:08</time></measure> for males

See the example on the TEI P5 documentation page for <time>.

everzeni commented 6 years ago

I checked the TEI page, but I don't see how 8:54:202 is a standard time value, unless you read it as "8 hours 54 min 202 secs after the date/time of departure of the Ironman on the day the record was broken". In that case, a lot of the Time measures we annotated are in fact standard time values.

kermitt2 commented 6 years ago

In #48 we said that <time> is used to express an amount of time not linked to a date, and <date> for "referential" time (a date).

everzeni commented 6 years ago

The example was "relax between 20:00 and 22:00": we meant not linked to a precise or known date, but it's still anchored in a day. The record for the Ironman competition is different, I think: it's a duration, and we annotated plenty of those as numbers.
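If the record values are treated as durations, as argued here, a normalizer would read them as elapsed hours, minutes, and seconds rather than as a clock time. The sketch below uses Python's standard `datetime.timedelta`; the function name `parse_clock_duration` is hypothetical, and both the H:MM:SS layout and the rule that minutes and seconds must be below 60 are assumptions. Under those assumptions 8:04:08 normalizes cleanly, while 8:54:202 as written does not fit the layout and is rejected instead of guessed at.

```python
from datetime import timedelta


def parse_clock_duration(text: str) -> timedelta:
    """Read an 'H:MM:SS' string as an elapsed duration, not a time of day."""
    fields = text.split(":")
    if len(fields) != 3 or not all(f.isdigit() for f in fields):
        raise ValueError(f"expected H:MM:SS, got {text!r}")
    hours, minutes, seconds = (int(f) for f in fields)
    # Under the H:MM:SS assumption, minutes and seconds must stay below 60
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"field out of range in {text!r}")
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


print(parse_clock_duration("8:04:08").total_seconds())  # 29048.0
try:
    parse_clock_duration("8:54:202")
except ValueError as err:
    print(err)  # field out of range in '8:54:202'
```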

kermitt2 commented 6 years ago

OK, I see. Then we are indeed in the same case: one atomic value expressed with different values and units :/