w3c / csvw

Documents produced by the CSV on the Web Working Group

Integration tests - test148 clarification #895

Closed (drexem closed this 3 months ago)

drexem commented 5 months ago

Hi,

Could you please explain why integration test 148 from the test suite is marked as a negative validation test? What causes the error here? Is the embedded metadata not compatible with the JSON-LD descriptor? Which column exactly causes the error?

Also, regarding test 112: the specification says, "If the supplied value is an array, any items in that array that are not strings must be ignored." By this logic, we should just ignore the 1 in the titles property "titles": ["GID", 1] and proceed with just "GID". I see the same problem with test 110, which contradicts this statement: "If the supplied value is an object, any properties that are not valid language codes as defined by [[BCP47](https://www.w3.org/TR/2015/REC-tabular-metadata-20151217/#bib-BCP47)] must be ignored, as must any properties whose value is not a string or an array, and any items that are not strings within array values of these properties."
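To make the quoted rules concrete, here is a minimal sketch of normalizing a natural language property such as titles into a map from language code to a list of strings, dropping the parts the spec says to ignore and collecting a warning for each one instead of rejecting the metadata. The function name `normalize_natural_language` is hypothetical, the `default_lang` parameter is assumed to come from the metadata's default language (falling back to "und"), and the language-code check is a rough regex approximation, not a full BCP47 validator.

```python
import re
from typing import Any, Dict, List, Tuple

# Rough approximation of a BCP47 tag: a 2-8 letter primary subtag followed by
# optional 1-8 character alphanumeric subtags. Not a complete BCP47 validator.
_LANG_TAG = re.compile(r"^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$")


def normalize_natural_language(
    value: Any, default_lang: str = "und"
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (normalized value, warnings) for a natural language property."""
    warnings: List[str] = []
    if isinstance(value, str):
        # A single string is tagged with the default language.
        return {default_lang: [value]}, warnings
    if isinstance(value, list):
        # Non-string items in an array are ignored.
        strings = [v for v in value if isinstance(v, str)]
        if len(strings) != len(value):
            warnings.append(f"ignored non-string items in {value!r}")
        return ({default_lang: strings} if strings else {}), warnings
    if isinstance(value, dict):
        result: Dict[str, List[str]] = {}
        for lang, v in value.items():
            if not _LANG_TAG.match(lang):
                # Properties that are not valid language codes are ignored.
                warnings.append(f"ignored invalid language code {lang!r}")
                continue
            if isinstance(v, str):
                result[lang] = [v]
            elif isinstance(v, list):
                # Non-string items inside array values are ignored.
                strings = [s for s in v if isinstance(s, str)]
                if len(strings) != len(v):
                    warnings.append(f"ignored non-string items under {lang!r}")
                if strings:
                    result[lang] = strings
            else:
                # Values that are neither a string nor an array are ignored.
                warnings.append(f"ignored non-string value under {lang!r}")
        return result, warnings
    warnings.append(f"ignored invalid natural language value {value!r}")
    return {}, warnings
```

Under this sketch, `normalize_natural_language(["GID", 1])` yields `{"und": ["GID"]}` plus one warning, so the table can still be processed with the title "GID".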

Thanks in advance for your help.

gkellogg commented 5 months ago

Validation test 148 fails because the metadata sets the default language to "de", while the title for column 2 is explicitly defined for "en"; so even though the value is "On Street", it is in the wrong language.

Validation test 112 will pass, but should issue a warning because of the invalid property. My implementation says "natural_language has invalid property 'titles' (["GID", 1]): expected a valid natural language property". Test 110 also passes with the same warning.

drexem commented 5 months ago

Thanks @gkellogg,

but I still do not understand why test 148 should fail. The embedded metadata will have a compatible table schema, because every title in the embedded metadata will have the language "und", and so it will match the "en" version "On Street". So what is the problem with the default language being set to "de"? Should every titles property on a column description have a title in the default language? Other than the schema compatibility check, we do not use the titles property in any other check, or am I wrong?

Thanks!

gkellogg commented 5 months ago

The language for each column is extracted from the metadata description using the process described in Tabular Data Model §6.1 Creating Annotated Tables. In particular, step 3.6 of the part specific to starting with a metadata file says to use the metadata established in TM to add annotations to T, which includes the lang inherited property. This sets the language of each column to "de". That is why it fails to match.
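Inherited properties such as lang can be set on a table group, table, schema, or column description, and a value set on a more specific description takes precedence; if no description sets lang, it defaults to "und". Below is a minimal sketch of that lookup, with descriptions modeled as plain dicts (an assumption for illustration; `resolve_inherited` is a hypothetical helper). Note that the titles language map is a separate property and plays no part in resolving lang.

```python
from typing import Any, Mapping, Optional


def resolve_inherited(
    prop: str,
    column: Optional[Mapping[str, Any]],
    schema: Optional[Mapping[str, Any]],
    table: Optional[Mapping[str, Any]],
    group: Optional[Mapping[str, Any]],
    default: Any = None,
) -> Any:
    """Return the effective value of an inherited property for a column.

    Descriptions are checked from most specific (column) to least specific
    (table group); the first one that sets the property wins.
    """
    for description in (column, schema, table, group):
        if description is not None and prop in description:
            return description[prop]
    return default


# Hypothetical descriptions: the column's titles are tagged "en",
# but lang itself is only set at the table level.
column = {"titles": {"en": "On Street"}}
table = {"lang": "de"}
effective_lang = resolve_inherited("lang", column, {}, table, None, default="und")
```

Here `effective_lang` is "de": the "en" key inside titles does not override the inherited lang, which matches the behavior described above.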

I realize that this may be unintuitive, and the column description could be thought to override that language assignment, but this is not how the processing actually works. Also, the spec is now eight years old and there's not much prospect of going back to address things, but there is an issue list of change requests. Please consider creating such an issue with a rationale and use case so that a future group may reconsider this.

Personally, I think there are a number of ways in which the strict interpretation of the metadata with the data is inflexible, but the specification evolved at a time when there were a number of concerns about data integrity and when specs such as JSON-LD (which is the basis of the metadata format) were not mature. A newer system would, I think, better leverage the flexibility of JSON-LD to enhance and interpret data in CSV and other tables.

drexem commented 5 months ago

@gkellogg So, if I understand correctly, the lang inherited property overrides the languages in the natural language titles property associated with this column? But if that is the case, the titles extracted from the tabular data will still have all their languages set to "und", and the titles extracted from the metadata file will all have the language "de". And because "und" matches any language, there should be a non-empty intersection between these two titles properties.
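The intersection check being discussed can be sketched as follows. The Tabular Metadata compatibility rules say that two columns' titles are compatible if there is a non-empty, case-sensitive intersection of their values, where matching values must also have matching languages: "und" matches any language, and two tags match if they are equal when truncated to the length of the shorter tag. This sketch assumes titles are already normalized to a map from language to a list of strings, and reads "truncated" as dropping trailing subtags (with tags compared case-insensitively); the helper names are hypothetical.

```python
from typing import Dict, List


def languages_match(a: str, b: str) -> bool:
    """True if two language tags match under the truncation rule."""
    if a.lower() == "und" or b.lower() == "und":
        return True  # "und" matches any language
    sa, sb = a.lower().split("-"), b.lower().split("-")
    n = min(len(sa), len(sb))
    # Compare only as many subtags as the shorter tag has.
    return sa[:n] == sb[:n]


def titles_compatible(t1: Dict[str, List[str]], t2: Dict[str, List[str]]) -> bool:
    """True if some title value appears on both sides with matching languages."""
    for lang1, values1 in t1.items():
        for lang2, values2 in t2.items():
            # Values are compared case-sensitively.
            if languages_match(lang1, lang2) and set(values1) & set(values2):
                return True
    return False
```

With this check, `{"und": ["On Street"]}` against `{"en": ["On Street"]}` is compatible, while `{"de": ["On Street"]}` against the same "en" titles is not; which of the two situations applies depends on which language the processor assigns to the titles on each side.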

Is this even an issue with the embedded metadata, or is it just an issue in the metadata file, so that the metadata file on its own produces this error?

Sorry for my perhaps trivial questions, but I am a little bit lost here. This is the only test my implementation does not pass, and I cannot wrap my head around why, or what validation rule I should even add based on the specification.

Thanks for your help!