zazuko / cube-creator

A tool to create RDF cubes from CSV files
GNU Affero General Public License v3.0

unexpectedly cubes with `schema:version` == 1 have `schema:datePublished` twice #1502

Closed Rdataflow closed 4 months ago

Rdataflow commented 5 months ago

bug: upon publishing cube version 1, schema:datePublished unexpectedly occurs twice (https://s.zazuko.com/uaQE2x). Oddly, a dateTime value occurs there.

fix: new cubes with version 1 have only one schema:datePublished, using a date

note: this issue only occurs with version 1

see also https://github.com/zazuko/cube-creator/blob/4a9a139effed1304dfa0991ccd9b1cca77ae624f/cli/lib/metadata.ts#L205

@CDiGallo as we just discussed. cc @ortnever (republishing works as an ugly workaround)

tpluscode commented 5 months ago

To sum up, you pointed to a line of code which adds a timestamp to schema:datePublished. The same value is set for schema:dateModified and dcterms:modified of the currently published version, and for schema:expires of previous versions.

The problematic scenario, related to the source code line referenced above, happens in cases when the publisher has provided a date in the cube metadata. Hence, two values.

First step to fix would be to only set schema:datePublished if there is no value in the meta.
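A minimal sketch of that first step, not the actual cube-creator code. It assumes a simplified, hypothetical metadata model (a `Map` from predicate IRI to literals) instead of the RDF library used in `cli/lib/metadata.ts`; the names `Metadata`, `Literal` and `setDefaultDatePublished` are made up for illustration. The timestamp is kept as a dateTime here, matching the behaviour described above.

```typescript
// Hypothetical, simplified model of cube metadata: predicate IRI -> literal values.
type Literal = { value: string; datatype: string }
type Metadata = Map<string, Literal[]>

const SCHEMA_DATE_PUBLISHED = 'http://schema.org/datePublished'
const XSD_DATE_TIME = 'http://www.w3.org/2001/XMLSchema#dateTime'

// Adds the publish timestamp as schema:datePublished only when the
// publisher has not already provided a value in the cube metadata.
function setDefaultDatePublished(meta: Metadata, timestamp: Date): Metadata {
  const existing = meta.get(SCHEMA_DATE_PUBLISHED) ?? []
  if (existing.length > 0) {
    return meta // publisher-provided value wins, nothing is added
  }

  const result = new Map(meta) // copy so the input is not mutated
  result.set(SCHEMA_DATE_PUBLISHED, [
    { value: timestamp.toISOString(), datatype: XSD_DATE_TIME },
  ])
  return result
}
```

With a guard like this, a cube whose metadata already carries a date ends up with exactly one schema:datePublished after publishing.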

Rdataflow commented 5 months ago

💯 NB: I wasn't aware there exists the possibility to NOT provide the datePublished. nit: the current shapes, as well as the generally used practice, use a Date for datePublished, not a timestamp (aka dateTime)

giacomociti commented 5 months ago

I understand we should use the date part of the timestamp variable as a default value for schema:datePublished. Only when revision = 1 or always?

Rdataflow commented 5 months ago

@giacomociti on version == 1 there occur two schema:datePublished values (at least when a manual date is given in cube-creator), while only one of them is needed. And true, it must be of type xsd:date
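For illustration only, the "date part of the timestamp" could be derived as below. This is a sketch, not the project's code: the helper name `toXsdDate` is hypothetical, and taking the UTC calendar date (rather than the local one) is an assumption.

```typescript
const XSD_DATE = 'http://www.w3.org/2001/XMLSchema#date'

// Returns the UTC calendar date of a timestamp in xsd:date lexical form (YYYY-MM-DD).
// Assumption: UTC is the intended reference; a local-time date could differ near midnight.
function toXsdDate(timestamp: Date): { value: string; datatype: string } {
  // toISOString() yields e.g. "2024-03-05T23:30:00.000Z"; the first 10 characters are the date.
  return { value: timestamp.toISOString().slice(0, 10), datatype: XSD_DATE }
}
```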

that's what I understand until now. Tom may give you more info on the situation 👍

giacomociti commented 5 months ago

self-reminder: the datePublished logic was introduced in commit 99dd04e0280ee0f3c0813159d37d9b4d2349f891

CDiGallo commented 5 months ago

@tpluscode in this query https://s.zazuko.com/25fyGL1 one can see that the bug is not confined to version 1

CDiGallo commented 5 months ago

or it might be two bugs

giacomociti commented 5 months ago

likely it's a separate issue. A DESCRIBE of any of the cubes with version > 1 shows multiple values not only for datePublished but for most other properties

Rdataflow commented 5 months ago

@giacomociti for the separate issue: just guessing based on a recent odd experience... a cube creator project was unexpectedly reset to version 1 - even though there were already cubes online with exactly the same IRIs.

thus the question:

goal:

giacomociti commented 4 months ago

it's worth investigating if a cube can be republished with the same version, but I would first merge the changes so far

Rdataflow commented 4 months ago

Another very particular situation to check: what happens if publish is run twice at the same time?

giacomociti commented 4 months ago

> Another very particular situation to check: what happens if publish is run twice at the same time?

that may likely be the cause of the unexpected multiple values for many properties :)

Rdataflow commented 4 months ago

unable to test currently due to a transformation error: https://test.cube-creator.lindas.admin.ch/app/cube-projects/cube-project!!ubd000503-anteile-der-treibhausgasemissionen-aus-treibstoffen-im-sektor-verkehr-ae3aa1720r8/materialize/jobs/cube-project!!ubd000503-anteile-der-treibhausgasemissionen-aus-treibstoffen-im-sektor-verkehr-ae3aa1720r8!!jobs!!JMcpEx8n4wR67npMG8Tu-