nvkp / turtle

Golang package for parsing and serializing the Turtle format used for representing RDF data
MIT License

Incorrect parsing of floating-point values containing an exponent #18

Closed jonnyschaefer closed 1 week ago

jonnyschaefer commented 1 week ago

Hello.

Thank you for the latest fixes. I have the feeling that this Go package is becoming one of the few sane Go Turtle parsers that work on real-world Turtle files.

While trying it on https://qudt.org/vocab/unit/ I noticed that it does not handle floating-point values with an exponent correctly. According to https://www.w3.org/TR/turtle/#abbrev, forms such as 42E3, 1e0, -2.3E-12, and +.3e+2 are also valid float representations.
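To make the accepted forms concrete, here is a minimal sketch that classifies a numeric lexeme using patterns transcribed from the INTEGER, DECIMAL, and DOUBLE terminals of the Turtle 1.1 grammar. The helper name `classifyNumber` is hypothetical and is not part of this package's API; it only illustrates which lexemes the grammar accepts.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns transcribed from the Turtle 1.1 grammar terminals:
//   INTEGER  ::= [+-]? [0-9]+
//   DECIMAL  ::= [+-]? [0-9]* '.' [0-9]+
//   DOUBLE   ::= [+-]? ([0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT)
//   EXPONENT ::= [eE] [+-]? [0-9]+
var (
	integerRe = regexp.MustCompile(`^[+-]?[0-9]+$`)
	decimalRe = regexp.MustCompile(`^[+-]?[0-9]*\.[0-9]+$`)
	doubleRe  = regexp.MustCompile(`^[+-]?(?:[0-9]+\.[0-9]*[eE][+-]?[0-9]+|\.[0-9]+[eE][+-]?[0-9]+|[0-9]+[eE][+-]?[0-9]+)$`)
)

// classifyNumber is a hypothetical helper (an assumption for illustration).
// It returns the XSD datatype Turtle assigns to the lexeme, or "" if the
// lexeme is not a valid Turtle numeric literal.
func classifyNumber(lexeme string) string {
	switch {
	case doubleRe.MatchString(lexeme):
		return "xsd:double"
	case decimalRe.MatchString(lexeme):
		return "xsd:decimal"
	case integerRe.MatchString(lexeme):
		return "xsd:integer"
	default:
		return ""
	}
}

func main() {
	for _, s := range []string{"42E3", "1e0", "-2.3E-12", "+.3e+2", "1.0", "1.0E0", "42"} {
		fmt.Printf("%-9s %s\n", s, classifyNumber(s))
	}
}
```

Under these patterns, all four forms listed above, as well as 1.0E0, classify as xsd:double, while 1.0 is an xsd:decimal and 42 an xsd:integer.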

Example:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix qkdv: <http://qudt.org/vocab/dimensionvector/> .
@prefix quantitykind: <http://qudt.org/vocab/quantitykind/> .
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix si-unit: <https://si-digital-framework.org/SI/units/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sou: <http://qudt.org/vocab/sou/> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

unit:A
  a qudt:Unit ;
  dcterms:description """$\\textit{Ampere}$, often shortened to $\\text{amp}$, 
is the SI unit of electric current and is one of the seven SI base units defined as:

$$\\text{A} \\equiv \\frac{\\textit{C}}{\\textit{s}} 
\\equiv \\frac{\\textit{coulomb}}{\\textit{second}} 
\\equiv \\frac{\\text{joule}}{\\text{weber}}$$

Note that SI supports only the use of symbols and deprecates the use of any abbreviations for units.
"""^^qudt:LatexString ;
  qudt:applicableSystem sou:CGS-EMU ;
  qudt:applicableSystem sou:CGS-GAUSS ;
  qudt:applicableSystem sou:PLANCK ;
  qudt:applicableSystem sou:SI ;
  qudt:conversionMultiplier 1.0 ;
  qudt:conversionMultiplierSN 1.0E0 ;
  qudt:dbpediaMatch "http://dbpedia.org/resource/Ampere"^^xsd:anyURI ;
  qudt:hasDimensionVector qkdv:A0E1L0I0M0H0T0D0 ;
  qudt:hasQuantityKind quantitykind:CurrentLinkage ;
  qudt:hasQuantityKind quantitykind:DisplacementCurrent ;
  qudt:hasQuantityKind quantitykind:ElectricCurrent ;
  qudt:hasQuantityKind quantitykind:ElectricCurrentPhasor ;
  qudt:hasQuantityKind quantitykind:MagneticTension ;
  qudt:hasQuantityKind quantitykind:MagnetomotiveForce ;
  qudt:hasQuantityKind quantitykind:TotalCurrent ;
  qudt:iec61360Code "0112/2///62720#UAA101" ;
  qudt:iec61360Code "0112/2///62720#UAD717" ;
  qudt:informativeReference "http://en.wikipedia.org/wiki/Ampere?oldid=494026699"^^xsd:anyURI ;
  qudt:omUnit <http://www.ontology-of-units-of-measure.org/resource/om-2/ampere> ;
  qudt:siExactMatch si-unit:ampere ;
  qudt:symbol "A" ;
  qudt:ucumCode "A"^^qudt:UCUMcs ;
  qudt:udunitsCode "A" ;
  qudt:uneceCommonCode "AMP" ;
  rdfs:isDefinedBy <http://qudt.org/vocab/unit> ;
  rdfs:label "Ampere"@de ;
  rdfs:label "amper"@hu ;
  rdfs:label "amper"@pl ;
  rdfs:label "amper"@ro ;
  rdfs:label "amper"@sl ;
  rdfs:label "amper"@tr ;
  rdfs:label "ampere"@en ;
  rdfs:label "ampere"@it ;
  rdfs:label "ampere"@ms ;
  rdfs:label "ampere"@pt ;
  rdfs:label "amperio"@es ;
  rdfs:label "amperium"@la ;
  rdfs:label "ampère"@fr ;
  rdfs:label "ampér"@cs ;
  rdfs:label "αμπέρ"@el ;
  rdfs:label "ампер"@bg ;
  rdfs:label "ампер"@ru ;
  rdfs:label "אמפר"@he ;
  rdfs:label "آمپر"@fa ;
  rdfs:label "أمبير"@ar ;
  rdfs:label "एम्पीयर"@hi ;
  rdfs:label "アンペア"@ja ;
  rdfs:label "安培"@zh ;
  skos:altLabel "amp" ;
.

This returns triples like the following, where 1.0E0 is split into 1.0 and E0:

<http://qudt.org/vocab/unit/A> <http://qudt.org/schema/qudt/conversionMultiplierSN> <1.0>
<E0> <http://qudt.org/schema/qudt/dbpediaMatch> <http://dbpedia.org/resource/Ampere>

nvkp commented 1 week ago

I will fix this bug so that the parser takes all possible number formats into account. However, the TTL you provided seems to be invalid: there is a '.' right after the ';'.

unit:BTU_IT-FT
  a qudt:DerivedUnit ;
  a qudt:Unit ;
  dcterms:description "${\\bf BTU_{IT} \\, Foot}$ is an Imperial unit for $\\textit{Thermal Energy Length}$ expressed as $Btu-ft$."^^qudt:LatexString ;
  qudt:applicableSystem sou:IMPERIAL ;
  qudt:applicableSystem sou:USCS ;
  qudt:conversionMultiplier 321.581024 ;
  qudt:conversionMultiplierSN 3.21581024E2 ;
  qudt:definedUnitOfSystem sou:IMPERIAL ;
  qudt:definedUnitOfSystem sou:USCS ;
  qudt:expression "$Btu-ft$"^^qudt:LatexString ;
  qudt:hasDimensionVector qkdv:A0E0L3I0M1H0T-2D0 ;
  qudt:hasQuantityKind quantitykind:ThermalEnergyLength ;
  qudt:symbol "Btu{IT}·ft" ;
  qudt:ucumCode "[Btu_IT].[ft_i]"^^qudt:UCUMcs ;
  rdfs:isDefinedBy <http://qudt.org/vocab/unit> ;
  rdfs:label "BTU Foot"@en ;
.

nvkp commented 1 week ago

Anyway, I made the parser handle this situation as well, and support for all the float formats is implemented in version v.1.1.4.
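For readers curious about how the split into 1.0 and E0 can be avoided, the following is a rough sketch of a prefix scanner that consumes the longest numeric literal at the start of the input. It is not the code from the fix; `scanNumber` and `numberPrefixRe` are hypothetical names, and the pattern is an assumption based on the Turtle 1.1 grammar terminals.

```go
package main

import (
	"fmt"
	"regexp"
)

// numberPrefixRe matches a Turtle numeric literal at the start of the input.
// Go's regexp prefers the leftmost alternative, so the three DOUBLE forms
// are listed before DECIMAL and INTEGER; otherwise "1.0E0" would stop at "1.0".
var numberPrefixRe = regexp.MustCompile(
	`^[+-]?(?:[0-9]+\.[0-9]*[eE][+-]?[0-9]+|\.[0-9]+[eE][+-]?[0-9]+|[0-9]+[eE][+-]?[0-9]+|[0-9]*\.[0-9]+|[0-9]+)`)

// scanNumber is a hypothetical helper (an assumption for illustration).
// It returns the numeric token at the start of input and the unread remainder.
// ok is false when the input does not start with a number.
func scanNumber(input string) (token, rest string, ok bool) {
	m := numberPrefixRe.FindString(input)
	if m == "" {
		return "", input, false
	}
	return m, input[len(m):], true
}

func main() {
	// Object positions taken from the examples in this thread.
	for _, in := range []string{"1.0E0 ;", "3.21581024E2 ;", "1.0 ;"} {
		tok, rest, ok := scanNumber(in)
		fmt.Printf("input=%q token=%q rest=%q ok=%v\n", in, tok, rest, ok)
	}
}
```

With this ordering, the exponent stays attached to the mantissa, so the scanner hands the whole of 1.0E0 to the literal handling instead of leaving E0 behind to be read as the next term.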