ixc / python-edtf

MIT License

Implement exponential year precision #12

Closed: cogat closed this issue 2 months ago

cogat commented 7 years ago

EDTFs of the form y17101e4p3 mean "Some year between 171000000 and 171999999, estimated to be 171010000 ('p3' indicates a precision of 3 significant digits.)"

At the moment, the p value is ignored, and lower_ and upper_ values are identical, being just the base times 10 to the exponent. The lower and upper bounds should vary by the indicated precision.
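A minimal sketch of the arithmetic being asked for, assuming the bounds come from keeping the leading significant digits and filling the remaining positions with 0s (lower) or 9s (upper). The name `significant_bounds` is hypothetical and is not part of python-edtf.

```python
from __future__ import annotations


def significant_bounds(year: int, significant_digits: int) -> tuple[int, int]:
    """Return the (lowest, highest) year sharing the leading significant digits.

    Hypothetical helper for illustration; assumes a non-negative year.
    """
    total_digits = len(str(year))
    # Number of trailing digits that are not significant.
    dropped = max(total_digits - significant_digits, 0)
    scale = 10 ** dropped
    lower = (year // scale) * scale
    upper = lower + scale - 1
    return lower, upper


base, exponent = 17101, 4
year = base * 10 ** exponent  # 171010000, the value currently used for both bounds
print(significant_bounds(year, 3))  # (171000000, 171999999)
```

With a precision of 3 this reproduces the range quoted above, whereas ignoring the precision collapses both bounds to 171010000.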

ColeDCrawford commented 5 months ago

@aweakley just wanted to check on this test: https://github.com/ixc/python-edtf/blob/613ccf5d95bb58ccee50327b0e8896d8789aa52e/edtf/parser/tests.py#L183-L185

The year should be 'Y17101E4S3' in EDTF, or 171010000. Should the bounds be '171000000-01-01' (just 3 digits) to '171999999-12-31', given that the precision is only 3 significant digits? The EDTF docs say:

Example 2 ‘Y171010000S3’ some year between 171000000 and 171999999 estimated to be 171010000

I think that this should be the same whether E is used to shorten the date or not?

aweakley commented 5 months ago

I agree, I think it should be the same with or without the E.

ColeDCrawford commented 5 months ago

Great. And those bounds also make sense?

aweakley commented 5 months ago

I think so, but I'm just wondering about this bit: "...estimated to be 171010000".

ColeDCrawford commented 5 months ago

ExponentialYear.year should definitely return that. I'm just not sure how the "estimated" part should be expressed in the bounds. lower_fuzzy() should probably return 171000000-01-01, upper_fuzzy() should return 171999999-12-31. I guess the question is whether lower/upper_strict should take the full year 171010000 into consideration, or just the significant digits (171....).

I don't see any exponential, long or significant digit examples using date qualifiers at least ...

aweakley commented 5 months ago

I think this implies that lower/upper_strict should take account of the full year: https://en.wikipedia.org/wiki/Significant_figures

For instance, if a length measurement yields 114.8 mm, using a ruler with the smallest interval between marks at 1 mm, the first three digits (1, 1, and 4, representing 114 mm) are certain and constitute significant figures. Further, digits that are uncertain yet meaningful are also included in the significant figures. In this example, the last digit (8, contributing 0.8 mm) is likewise considered significant despite its uncertainty.[1] Therefore, this measurement contains four significant figures.

ColeDCrawford commented 5 months ago

That makes sense, but the S in EDTF directly specifies the number of digits to treat as significant, right? If we don't make use of that information, then there is no difference between the lower bound for 'Y17101E4S3', 'Y17101E4S4', or 'Y17101E4S5'.
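To make the comparison concrete, here is a rough sketch of what the fuzzy bounds would look like if S is honored, assuming the truncate-and-fill reading of the EDTF examples. `bounds_for` is a hypothetical helper, not library code.

```python
from __future__ import annotations


def bounds_for(year: int, significant_digits: int) -> tuple[int, int]:
    # Hypothetical helper: keep the leading digits, fill the rest with 0s / 9s.
    scale = 10 ** max(len(str(year)) - significant_digits, 0)
    lower = year // scale * scale
    return lower, lower + scale - 1


year = 17101 * 10 ** 4  # the year written as Y17101E4
for digits in (3, 4, 5):
    print(f"S{digits}:", bounds_for(year, digits))
# S3: (171000000, 171999999)
# S4: (171000000, 171099999)
# S5: (171010000, 171019999)
```

Under this reading S3 and S4 happen to share a lower bound, because the fourth digit of the year is 0, but the upper bound is different in all three cases.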

Some of the other EDTF examples:

The definition for significant digits is: "A year (expressed in any of the three allowable forms: four-digit, 'Y' prefix, or exponential) may be followed by 'S', followed by a positive integer indicating the number of significant digits."

That means it's not just ExponentialYear that needs to support significant digits, but also LongYear and Date ...
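As a sketch of what "all three forms" means in practice, the following standalone parser accepts a four-digit year, a 'Y'-prefixed year, or an exponential year, each with an optional 'S' suffix. The pattern is deliberately simplified and the names `YEAR_PATTERN`, `ParsedYear` and `parse_year` are hypothetical; this is not how the python-edtf grammar is written.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for illustration only; the real EDTF grammar has more rules
# (for example, the 'Y' form is meant for years with more than four digits).
YEAR_PATTERN = re.compile(
    r"^(?:Y(?P<base>-?\d+)(?:E(?P<exponent>\d+))?|(?P<plain>-?\d{4}))"
    r"(?:S(?P<significant>[1-9]\d*))?$"
)


class ParsedYear(NamedTuple):
    year: int
    significant_digits: Optional[int]


def parse_year(text: str) -> ParsedYear:
    match = YEAR_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognised year expression: {text!r}")
    if match["plain"] is not None:
        year = int(match["plain"])
    else:
        # A missing exponent means the 'Y' form was written out in full.
        year = int(match["base"]) * 10 ** int(match["exponent"] or 0)
    significant = match["significant"]
    return ParsedYear(year, int(significant) if significant else None)


for text in ("1950S2", "Y171010000S3", "Y17101E4S3", "Y3388E2S3"):
    print(text, parse_year(text))
```

Because the exponential and long forms reduce to the same `(year, significant_digits)` pair, any bounds computed from that pair come out the same with or without the E.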

aweakley commented 5 months ago

This is really clear to me: '1950S2', "some year between 1900 and 1999". I just don't know what we're supposed to do with "estimated to be 1950".

I was a bit surprised by the Wikipedia article really and I wonder how far we're supposed to go if we follow that logic? What about the year 123456789S1 - surely all those digits can't be assumed to be significant when the S part tells us they're not?

Reading the article's reference here: https://chem.libretexts.org/Bookshelves/General_Chemistry/Chem1_(Lower)/04%3A_The_Basics_of_Chemistry/4.06%3A_Significant_Figures_and_Rounding it gets more confusing, because they say something different to what the EDTF standard says:

So, what is a significant digit? According to the usual definition, it is all the numerals in a measured quantity (counting from the left) whose values are considered as known exactly, plus one more whose value could be one more or one less:

In “157900” (four significant digits), the left most three digits are known exactly, but the fourth digit, “9” could well be “8” if the “true value” is within the implied range of 157850 to 157950. In “158000” (three significant digits), the left most two digits are known exactly, while the third digit could be either “7” or “8” if the true value is within the implied range of 157500 to 158500.
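For comparison, the rounding convention in that quote can be sketched as a range centred on the stated value, with the last significant digit off by at most half a unit. `implied_range` is a hypothetical name, and the sketch assumes a positive integer with no more significant digits than it has digits.

```python
from __future__ import annotations


def implied_range(value: int, significant_digits: int) -> tuple[int, int]:
    """Range implied by the rounding convention quoted above.

    Hypothetical helper: the last significant digit may be off by half a unit.
    """
    unit = 10 ** (len(str(value)) - significant_digits)
    half = unit // 2
    return value - half, value + half


print(implied_range(157900, 4))     # (157850, 157950)
print(implied_range(158000, 3))     # (157500, 158500)
print(implied_range(171010000, 3))  # (170510000, 171510000)
```

The last line shows the mismatch: this convention gives 170510000 to 171510000 for the year in question, while the EDTF example gives 171000000 to 171999999, so the standard appears to truncate rather than round.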

What do you think about adding a new estimated() method to dates that have a significant-digits indicator? That way we could implement what the EDTF standard says.

ColeDCrawford commented 5 months ago

What would estimated() return for each of these examples? Just want to see how it would differ from the ExponentialYear.year property

aweakley commented 5 months ago

I think it would be the year but without the significance notation: 171010000, 1950 or 338800, so that would match the text description in the standard. So ExponentialYear._precise_year()?

ColeDCrawford commented 5 months ago

I have some WIP on this that I'll post soon. Just to confirm so I finish updating the tests - this is what we're looking for?

>>> from edtf.parser.grammar import parse_edtf as parse
>>> normal_year = parse("1950S2")
>>> normal_year
Date: '1950S2'
>>> normal_year.estimated()
1950
>>> normal_year.lower_fuzzy()
time.struct_time(tm_year=1900, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> normal_year.upper_fuzzy()
time.struct_time(tm_year=1999, tm_mon=12, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> normal_year.lower_strict()
time.struct_time(tm_year=1950, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> normal_year.upper_strict()
time.struct_time(tm_year=1950, tm_mon=12, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> long_year = parse("Y171010000S3")
>>> long_year
LongYear: 'Y171010000S3'
>>> long_year.estimated()
171010000
>>> long_year.lower_strict()
time.struct_time(tm_year=171010000, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> long_year.upper_strict()
time.struct_time(tm_year=171010000, tm_mon=12, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> long_year.upper_fuzzy()[:3]
(171999999, 12, 31)
>>> long_year.lower_fuzzy()[:3]
(171000000, 1, 1)
>>> exp_year = parse("Y3388E2S3")
>>> exp_year
ExponentialYear: 'Y3388E2S3'
>>> exp_year.estimated()
338800
>>> exp_year.upper_strict()
time.struct_time(tm_year=338800, tm_mon=12, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> exp_year.lower_strict()[:3]
(338800, 1, 1)
>>> exp_year.upper_fuzzy()[:3]
(338999, 12, 31)
>>> exp_year.lower_fuzzy()[:3]
(338000, 1, 1)

aweakley commented 2 months ago

This is resolved by #56