ThrawnCA opened this issue 11 years ago
By the way, we have manually patched messytables/types.py on our system to swap the guessing_weight values of IntegerType and DecimalType, so Latitude and Longitude now display correctly. The underlying issue still stands, though.
I'm not sure I understand the problem correctly: int('13.223') will raise a ValueError, and thus the integer type will not be chosen. Does the problem still appear when you use strict=True?
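A quick plain-Python check suggests how both observations can be true at once: int() rejects the string '13.223', but silently truncates a float. If numeric spreadsheet cells reach the integer cast as floats (xlrd, for instance, returns numeric XLS cells as Python floats), the ValueError never fires. try_int is a hypothetical helper for this illustration, not part of messytables.

```python
def try_int(value):
    """Return int(value), or None if int() raises ValueError (hypothetical helper)."""
    try:
        return int(value)
    except ValueError:
        return None


print(try_int('13.223'))   # None: the string form is rejected
print(try_int(13.223))     # 13: the float is truncated, decimals lost
print(try_int(-27.4698))   # -27: truncation is toward zero
```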
If you download the resource linked in the issue description and upload it into another CKAN instance with a datastorer running (or link to it from another CKAN instance), the datastorer interprets the Latitude and Longitude fields as type Integer, dropping all decimal places.
I believe that 'strict=True' is the default and is being used.
Hmm, I can't look into the details at the moment, but from the source code of the integer type I would expect such values to be rejected. We should try to find a minimal failing example that uses only messytables, but I don't have time at the moment. @rossjones, could you look into this?
Fields with decimal places can still be parsed as integers, so both Decimal and Integer achieve perfect scores in type_guess. However, Integer has the higher default weight, so it wins the tie and the decimal places are dropped.
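The tie-break can be illustrated with a simplified, hypothetical re-implementation of strict weighted type guessing (not messytables' actual type_guess). The weights 6 and 3 are made-up placeholders where only their ordering matters, and cast_integer assumes a naive int() cast applied to float cells.

```python
from decimal import Decimal, InvalidOperation


def cast_integer(value):
    # Naive integer cast: int() truncates a float such as -27.4698 to -27
    # instead of raising, so fractional cells still "succeed".
    return int(value)


def cast_decimal(value):
    # Going through str() avoids binary-float noise, e.g. Decimal('-27.4698').
    return Decimal(str(value))


# Hypothetical weights: only the ordering (Integer above Decimal) matters.
CANDIDATES = {
    'Integer': (cast_integer, 6),
    'Decimal': (cast_decimal, 3),
}


def guess_type(cells, candidates=CANDIDATES):
    """Strict weighted guess: a type that fails on any cell is dropped;
    each surviving type scores its weight once per cell, highest wins."""
    scores = {}
    for name, (cast, weight) in candidates.items():
        try:
            for cell in cells:
                cast(cell)
        except (ValueError, TypeError, InvalidOperation):
            continue  # strict mode: one failed cell eliminates the type
        scores[name] = weight * len(cells)
    return max(scores, key=scores.get) if scores else 'String'


latitudes = [-27.4698, -16.9186, -19.2590]
print(guess_type(latitudes))  # Integer

# Swapping the weights flips the tie-break in Decimal's favour.
swapped = {'Integer': (cast_integer, 3), 'Decimal': (cast_decimal, 6)}
print(guess_type(latitudes, swapped))  # Decimal
```

Swapping the weights, as in the local patch described earlier in the thread, fixes Latitude and Longitude in this sketch, but a column of whole numbers would then also be guessed as Decimal, since both casts still succeed on it.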
This is a problem in data such as
https://staging.data.qld.gov.au/storage/f/2013-09-11T04%3A22%3A59.234Z/qscd-datafile.xls
where Latitude and Longitude are rounded off, which makes them almost useless because fractions of a degree matter a great deal (a tenth of a degree of latitude is roughly 11 km).
Should Decimal have a higher default weight? Or, to keep Integer meaningful, should there be some way to tell whether a field actually had decimal places?
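For the second option, one possible approach, sketched here as a hypothetical cast rather than anything messytables ships, is to reject floats that carry a fractional part, using Python's float.is_integer(). A column of whole numbers stored as floats (e.g. 2013.0) would still qualify as Integer, while Latitude and Longitude would fall through to Decimal.

```python
def cast_integer_strict(value):
    """Cast to int without silently dropping a fractional part (hypothetical)."""
    if isinstance(value, float):
        # Numeric spreadsheet cells may arrive as floats: 2013.0 is a
        # whole number, but -27.4698 is not and must not become -27.
        if not value.is_integer():
            raise ValueError(f'value has a fractional part: {value!r}')
        return int(value)
    # Ints and strings keep int()'s usual behaviour ('13.223' still fails).
    return int(value)


print(cast_integer_strict(2013.0))  # 2013
print(cast_integer_strict('17'))    # 17

for cell in (-27.4698, '13.223'):
    try:
        cast_integer_strict(cell)
    except ValueError as exc:
        print('rejected:', exc)
```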