inbo / ipt-dcat

📃 Data Catalog Vocabulary (DCAT) functionality for the IPT
MIT License

Is a line break valid in DCAT? #58

Closed peterdesmet closed 8 years ago

peterdesmet commented 8 years ago

Hi @pietercolpaert, is a line break in <dct:description> valid in DCAT (see example below)? The DCAT validator does not complain, but the line break apparently blocks import into the VODP.

<http://data.inbo.be/ipt/resource?r=invasive-chinese-mitten-crab-occurrences#Dataset>
a dcat:Dataset ;
dct:title "Chinese mitten crab (Eriocheir sinensis) occurrences in Flanders" ;
dct:description "The Research Institute for Nature and Forest has a long tradition in the monitoring of non-native species in Flanders (northern Belgium). Although the species is long established in the Flemish region, observational data collected in this dataset date back to early 2000. Data originates from several research projects and monitoring initiatives running at the institute. At current (September 2015) the dataset contains 6000+ occurrence records of mitten crabs. Due to the different origin of the records not all the fields in the dataset could be completed. This dataset anticipates the future data demands of the EU regulation No 1143/2014 on the prevention and management of the introduction and spread of invasive alien species.

Native to the Pacific coasts of China and Korea, mitten crabs arrived in European waters in the early 1900s. In Belgium, the species was first recorded in 1933 (Wouters 2002, Boets 2013). Chinese mitten crabs are believed to have been introduced via ship's ballast water and, possibly, intentionally released to establish fisheries (Marquard 1926; Peters et al. 1933; Gollasch 2006). Since its arrival in Germany, the Chinese mitten crab has rapidly invaded coastal and inland waters throughout Europe. The species was first observed in Belgium in 1933 in the Zeeschelde near Antwerp and is found nowadays in the main rivers and canals of the Scheldt basin. Since 2012 Chinese mitten crabs are also found in the canals near the Belgian coastline. The seasonal upstream migration of mitten crabs has become a well known phenomenon in Flanders (e.g. www.youtube.com/watch?v=CSBqwufl3pA). Despite this, the impact of mitten crabs is poorly studied in the region." ;
dcat:keyword "Occurrence" , "alien species" , "non-native species" , "invasive" , "GBIF" , "occurrence" , "Crustacea" , "Grapsoidea" ;
dcat:theme <http://eurovoc.europa.eu/5463> ;

pietercolpaert commented 8 years ago

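It's not a DCAT issue; it's an issue with the serialization format, Turtle.

The Turtle specification gives two ways to put a newline in a string literal: escape it as \n inside a regular double-quoted string, or wrap the string in triple double quotes ("""), which may contain literal line breaks. A literal line break inside a regular double-quoted string, as in your example, is not valid. The following two triples are identical: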

@prefix : <http://example.org/stuff/1.0/> .

:a :b "The first line\nThe second line\n  more" .

:a :b """The first line
The second line
  more""" .

The validator probably doesn't complain because the Turtle library we use is forgiving, whereas the Flemish open data portal is stricter.

A quick fix is to apply one of the two options above to every string value when serialising to Turtle.
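
To illustrate the first option (escaping newlines as \n), here is a minimal Python sketch. It is not the IPT's actual code: the function name turtle_string_literal is hypothetical, and the sketch only handles the backslash, double quote, newline, carriage return and tab escapes that Turtle defines for string literals.

```python
def turtle_string_literal(value: str) -> str:
    """Return `value` as a single-line, double-quoted Turtle string literal.

    Hypothetical helper for illustration only. It escapes the characters
    that may not appear raw inside a regular "..." Turtle string.
    """
    escapes = {
        "\\": "\\\\",  # backslash must be escaped first-class, not doubled later
        '"': '\\"',    # the delimiter itself
        "\n": "\\n",   # line feed (the cause of this issue)
        "\r": "\\r",   # carriage return
        "\t": "\\t",   # tab
    }
    # Escape character by character so an already-escaped backslash
    # is never processed a second time.
    return '"' + "".join(escapes.get(ch, ch) for ch in value) + '"'


description = "The first line\nThe second line\n  more"
print(":a :b " + turtle_string_literal(description) + " .")
```

Run on the three-line example, this prints the single-line form of the triple shown above. Escaping per character in one pass avoids the double-escaping that chained replace calls can cause if they are applied in the wrong order.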

peterdesmet commented 8 years ago

Thanks @pietercolpaert!

peterdesmet commented 8 years ago

This issue will be tackled in https://github.com/gbif/ipt/issues/1231. Closing this one.