zazuko / cube-creator

A tool to create RDF cubes from CSV files
GNU Affero General Public License v3.0

Zeros at the end of a number are not recorded in LINDAS #1336

Open ortnever opened 1 year ago

ortnever commented 1 year ago

@l00mi: this is fairly "urgent", because once the data is in LINDAS we would also like to check whether it appears correctly in Visualize (full precision issue).

Zeros at the end of a number are not recorded in LINDAS, whereas they are imported correctly in the cube creator and appear in the Cube Designer.

It is important that these zeros at the end of a number are saved, because with the number of decimal places we also make a statement about the "uncertainty".

Is this something that could be corrected quickly?

For the tests, you can use the following cube: https://environment.ld.admin.ch/foen/ubd000502_sad_01/7 on Lindas INT.

l00mi commented 1 year ago

@ortnever it is not possible to keep the trailing 0 in LINDAS. That is a technical limitation of how data with a numeric datatype is stored.

I am not sure why the 1.380 is actually shown in cube-creator. @tpluscode, might it be that it is still handled as a string at this point, and only converted to xsd:decimal once uploaded to LINDAS?

So I think the fix would need to build on the idea we already had of adding formatting information per dimension. Unfortunately, that is not something quick.

tpluscode commented 1 year ago

Literals are stored at face value when produced in transformation.

For example, for observation /1/CO2/2020, in the Cube Creator's database you would find "34.350"^^xsd:decimal. When the cube gets published, in Lindas that becomes "34.35"^^xsd:decimal.

I do not understand when exactly this happens. I don't think it's the publishing pipeline itself, given that in a local environment (Fuseki) the trailing zero is preserved. I do not see any Stardog configuration difference between the Cube Creator and Lindas databases which would explain that.

Bottom line, though: trailing zeros, as well as leading zeros, do not change the value of a literal and can thus be dropped. I agree that a format annotation for Visualize to use would be the most appropriate course of action.
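To illustrate the point about value equality, here is a minimal Python sketch. It assumes that normalization amounts to rewriting the lexical form the way the XSD canonical representation of a decimal does; `canonical_decimal` is a hypothetical helper, and nothing here reflects how Stardog or the publishing pipeline actually behaves.

```python
from decimal import Decimal


def canonical_decimal(lexical: str) -> str:
    """Hypothetical helper: rewrite a decimal lexical form without
    leading zeros or redundant trailing zeros, keeping one fraction digit."""
    text = format(Decimal(lexical), "f")  # plain notation, no exponent
    if "." not in text:
        return text + ".0"
    text = text.rstrip("0")
    # Keep at least one digit after the decimal point.
    return text + "0" if text.endswith(".") else text


# Both lexical forms denote the same value...
same_value = Decimal("34.350") == Decimal("34.35")
# ...so a store that normalizes literals may keep only one of them.
normalized = canonical_decimal("34.350")
```

Under these assumptions `same_value` is true and `normalized` is `34.35`, which is why the number of decimal places has to travel as separate metadata if it carries meaning.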

tpluscode commented 1 year ago

> a format annotation to use by Visualize would be the most appropriate course of action

ping @ptbrowne What do you think about a display format annotation on dimension metadata?

ptbrowne commented 1 year ago

I think display can change according to the need of the users. I think what should be in the data should be the resolution at which the data has been recorded, and then the UI has the choice of the formatting.

Right now in Visualize, we use the concept of resolution to format integers without a comma. See code here. This resolution variable could be taken dynamically from the dimension and passed to the front-end so that it can make appropriate choices.

I think it'd be important to choose the right vocabulary here: https://www.crossco.com/resources/articles/metrology-terminology-accuracy-precision-resolution/. I would have thought that QUDT would provide us with this kind of vocabulary; I took a quick look but could not find anything.

Also related is how currencies are formatted: Docs, Code

l00mi commented 1 year ago

compare to https://gitlab.ldbar.ch/bafu/umweltdatenkiosk-planning/-/issues/332#note_9360

ortnever commented 1 year ago

Thanks @l00mi! I was just looking for that issue ;-)

l00mi commented 1 year ago

So I had a talk to a machine:

How do you refer to the display of the number of zeros after the comma of a number?

The number of zeros after the comma (or period) of a number is commonly referred to as the "number of decimal places". For example, in the number 3.14159, there are five decimal places. Alternatively, the number of decimal places can also be referred to as the "precision" or "significant figures" of the number. These terms refer to the level of detail or accuracy that the number represents. For example, a number with three decimal places (e.g. 3.141) implies a higher level of precision than a number with only one decimal place (e.g. 3.1). It's important to note that the number of decimal places, precision, and significant figures all refer to the same concept, but they may be used in different contexts or with different conventions.

Is it the same as qudt:precision?

No, QUDT (Quantities, Units, Dimensions, and Data Types Ontology) is a standardized ontology for describing physical quantities, units of measurement, and their relationships. The QUDT specification includes a property called "qudt:precision", which is used to represent the number of significant figures or decimal places in a measured value. QUDT's "qudt:precision" property is similar in meaning to the concept of precision discussed earlier, but it is a specific property defined within the QUDT ontology, which is designed to provide a consistent way to represent precision in measurements within a broader semantic framework. In summary, while both "precision" and "qudt:precision" relate to the level of detail or accuracy in a measurement, "qudt:precision" is a specific property defined within the QUDT ontology to represent this information in a standardized way.

That would be an example of its use:

ex:LengthMeasurement rdf:type qudt:QuantityValue ;
    qudt:numericValue "5.372"^^xsd:decimal ;
    qudt:unit ex:Meter ;
    qudt:precision "3"^^xsd:nonNegativeInteger .

While the machine was very vague, I guess it is safe to use qudt:precision for this use case, where the number is the count of digits after the comma.
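As a rough sketch of how such an annotation could be used: the digit count can be read off the original lexical form before the store normalizes it, and re-applied when displaying. The helper names `decimal_places` and `restore` are made up for illustration, and whether qudt:precision is the right property to carry the count is still the open question above.

```python
from decimal import Decimal


def decimal_places(lexical: str) -> int:
    """Hypothetical helper: count the digits after the decimal point
    in the lexical form as it was delivered."""
    exponent = Decimal(lexical).as_tuple().exponent
    return max(0, -exponent)


def restore(normalized: str, places: int) -> str:
    """Hypothetical helper: pad (or round) a normalized value back to
    the annotated number of decimal places."""
    return format(Decimal(normalized), f".{places}f")


places = decimal_places("34.350")    # taken from the source data
shown = restore("34.35", places)     # applied to the value as stored
```

Here `places` is 3 and `shown` is `34.350`, so the trailing zero survives as metadata even though the stored literal lost it.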

l00mi commented 1 year ago

@ptbrowne so it's now up to you to say whether this will be enough, or whether we should go further and try something more powerful along the lines of good ol' 'sprintf()', e.g. https://locutus.io/php/strings/sprintf/

ortnever commented 1 year ago

@ptbrowne : I add an example of a use case below, because I don't understand the whole discussion and I would like to be sure that it meets our needs.

Columns A to E are the delivered data. We would like Visualize to display the numbers in column D rounded to the precision indicated in column E. Column G is of course not part of the source data and is just there to show what we would like Visualize to display.

@Thomas: does this use case address other needs and is it general enough?

image
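The use case above could be sketched roughly as follows in Python. The rows are invented placeholders rather than the values from the screenshot, `display_value` is a hypothetical function, and rounding half up is an assumption, since the thread does not say which rounding rule is wanted.

```python
from decimal import Decimal, ROUND_HALF_UP


def display_value(value: str, precision: int) -> str:
    """Hypothetical helper: round a value to `precision` decimal places
    and keep trailing zeros in the result."""
    quantum = Decimal(1).scaleb(-precision)  # precision 3 gives 0.001
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    return format(rounded, "f")


# Invented (value, precision) pairs standing in for columns D and E.
rows = [("1.38", 3), ("12.3456", 2), ("7", 0)]
displayed = [display_value(value, precision) for value, precision in rows]
```

With these placeholder rows, `displayed` is `["1.380", "12.35", "7"]`: each value gets its own number of decimal places, which is the per-value behaviour asked for.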

ptbrowne commented 1 year ago

Thanks for providing an example Veronique. From my end what I see here is

ortnever commented 1 year ago

exactly. But it's only one use case. Maybe there are other ones in relation with this issue.

Rdataflow commented 1 year ago

I think there are two generic cases as of now:

l00mi commented 1 year ago

So the precision per value can be solved more quickly, since we can reuse the structure we already have for the standard error.

Best to create an issue for each of the two use cases, and then prioritize them individually.