w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

How to model dct:temporal for continuously evolving Datasets? #1403

Open init-dcat-ap-de opened 2 years ago

init-dcat-ap-de commented 2 years ago

In GovDataOfficial/DCAT-AP.de#17 we discussed a real use case for which, to my surprise, there is no obvious answer. I escalated the issue to https://github.com/SEMICeu/DCAT-AP/issues/201, where we concluded that we don't have a (structured) solution.

Use Case: There is a Dataset which is updated constantly (dcterms:accrualPeriodicity) with a resolution of one hour (dcat:temporalResolution). But you can only get the data of the last 10 days. (Something that's probably pretty common for sensor data.) A sliding window is what we need to be able to describe.

How would you model this? Neither xsd:date nor dcterms:PeriodOfTime allows this. We could maybe get it to work with xsd:duration:

_:ds  a dcat:Dataset ;
  dcterms:accrualPeriodicity <http://publications.europa.eu/resource/authority/frequency/UPDATE_CONT> ;
  dcat:temporalResolution "PT1H"^^xsd:duration ;
  dcterms:temporal "P10D"^^xsd:duration .

But that would not be allowed, because a duration is not a valid value for dcterms:temporal. And it would only be implicit that you get the last 10 days.
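
To make the intended semantics concrete, here is a minimal sketch of how a consumer could turn such a sliding window into a concrete interval at the moment the metadata is read. The function name `resolve_sliding_window` and the example timestamp are hypothetical and not part of DCAT or OWL-Time; the sketch assumes the window always ends "now".

```python
from datetime import datetime, timedelta, timezone


def resolve_sliding_window(window, now):
    """Return (start, end) of a window of length `window` that ends at `now`."""
    return now - window, now


# Hypothetical moment at which a consumer reads the metadata.
read_at = datetime(2024, 3, 14, 12, 0, tzinfo=timezone.utc)

# "P10D" from the example above, expressed as a timedelta.
start, end = resolve_sliding_window(timedelta(days=10), read_at)
print(start.isoformat(), end.isoformat())
```

The point of the sketch is that only the duration is stored in the metadata; the start and end are derived on demand and never go stale.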

Does anyone have an idea?

dr-shorthair commented 2 years ago

Correct. The range of dcterms:temporal is expected to be an interval, not a duration.

Maybe https://www.w3.org/TR/owl-time/#time:hasDuration ?

init-dcat-ap-de commented 2 years ago

So something like this?

_:ds a dcat:Dataset ;
  time:hasDurationDescription [
    a time:DurationDescription ;
    time:days "10"^^xsd:decimal ;
  ] .

Any plans to make it "official"? Does anyone else have this use case?

andrea-perego commented 2 years ago

@init-dcat-ap-de , it should rather be

_:ds a dcat:Dataset ;
  dcterms:temporal [ a dcterms:PeriodOfTime ;
    time:hasDurationDescription [
      a time:DurationDescription ;
      time:days "10"^^xsd:decimal ;
    ]
  ]
.

or

_:ds a dcat:Dataset ;
  dcterms:temporal [ a dcterms:PeriodOfTime ;
    time:hasDuration [
      a time:Duration ;
      time:numericDuration "10"^^xsd:decimal ;
      time:unitType time:unitDay 
    ]
  ]
.

But I understand you also need a way to specify that the time interval ends "today".

@dr-shorthair , any suggestion on this?

dr-shorthair commented 2 years ago

OWL-Time only provides a few individuals: for some durations (the standard second, minute, hour etc) and days-of-the-week. It might be useful to have a specific individual temporal-entity for 'now', which could then appear as the end or beginning of a time-interval with a specified duration, e.g. the last week:

ex:PrecedingWeek
  rdf:type owlTime:ProperInterval ;
  rdfs:label "Interval of specified duration ending now" ;
  owlTime:hasEnd owlTime:Now ;
  owlTime:hasTemporalDuration owlTime:unitWeek ;
.

or the last ten days

ex:LastTenDays
  rdf:type owlTime:ProperInterval ;
  rdfs:label "Interval of specified duration ending now" ;
  owlTime:hasEnd owlTime:Now ;
  owlTime:hasDuration [
      rdf:type owlTime:Duration ;
      owlTime:numericDuration 10.0 ;
      owlTime:unitType owlTime:unitDay ;
    ] ;
.

where

owlTime:Now
  rdf:type owlTime:Instant ;
  rdfs:label "Non-specific temporal entity denoting 'now'" ;
.

Maybe raise an issue in https://github.com/w3c/sdw/issues

init-dcat-ap-de commented 2 years ago

@dr-shorthair & @andrea-perego Yes, a time:Now would be interesting to indicate the endDate of the temporal resolution. But maybe it is not necessary, because we also have dcterms:accrualPeriodicity, which might be enough to say what we want:

_:ds a dcat:Dataset ;
  dcterms:accrualPeriodicity <http://publications.europa.eu/resource/authority/frequency/UPDATE_CONT>  ;
  dcterms:temporal [ 
    a dcterms:PeriodOfTime ;
    time:hasDuration [
      a time:Duration ;
      time:numericDuration "10"^^xsd:decimal ;
      time:unitType time:unitDay 
    ]
  ] 
.

If the dcterms:accrualPeriodicity is not continuous (UPDATE_CONT) but, e.g., MONTHLY, a time:Now as end date would not be correct.

init-dcat-ap-de commented 2 years ago

To make the use of time:Duration clearer, it would be good to create a new property (a quick draft):

dcat:timeWindow
Range: time:Duration
Domain: dcterms:PeriodOfTime
Definition: If the period of time is a sliding window, it cannot be defined by a start date and an end date. This property allows it to be defined with a time:Duration.
Usage Note: The use of this property is only necessary if the start date of the data is shifting. Therefore it should be used together with dcterms:accrualPeriodicity.

agreiner commented 2 years ago

If the dataset is that dynamic, shouldn't it be possible to update the timestamps in the metadata whenever it is accessed? It's hard to imagine a desire to use start and end times unless they contain actual times. I'm very uncomfortable with the notion of assigning "now" to anything, as the meaning becomes very unclear to the consumer.

init-dcat-ap-de commented 2 years ago

We would still need information in the metadata to indicate, for which datasets the timestamps have to be updated and how they have to be updated.

jze commented 2 years ago

@agreiner wrote: "If the dataset is that dynamic, shouldn't it be possible to update the timestamps in the metadata whenever it is accessed? It's hard to imagine a desire to use start and end times unless they contain actual times. I'm very uncomfortable with the notion of assigning "now" to anything, as the meaning becomes very unclear to the consumer."

The problem occurs when metadata is forwarded to other portals. In Germany the federal states operate their own open data portals. Once a day or once a week the metadata is harvested by the national open data portal. This in turn is harvested by the European portal. So it takes a few days for the metadata to arrive at the European portal. In some cases municipal portals are also involved, adding to the delay.

In that case it is important not to have to specify fixed times. Without "floating" time data, we would often (or nearly always) have incorrect temporal metadata. For example, the end date in the European data portal would already be a week old, while the federal state portal has already updated its end date.

GKStGovData commented 2 years ago

Some of our data suppliers need to define the time reference with a fixed start date and a floating end date (which does not have to be today). These needs should be considered in the solution.

tomkralidis commented 2 years ago

Interesting discussion (having seen this as part of wmo-im/wcmp2#11). We have use cases in weather for rolling window retention. Would one need to consider this use case in the context of updating dct:temporal, or is it really a function of the data access mechanism? Or both/something else?

An example use case is an organization that has been producing hourly observations since 2009-07-11, with a rolling window of 90 days made available from some data access endpoint. From the discovery perspective, I would still see the temporal extent as 2009-07-11/... It's the data access mechanism that provides the last 90 days (data beyond 90 days could be archived or made available through some other arrangement).
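
The distinction drawn above, between the temporal extent used for discovery and the window served by the access endpoint, can be sketched as follows. This is only an illustration: the function name `access_window`, the use of `None` for an open end, and the example "today" are assumptions, not something defined by DCAT or by the WMO profile.

```python
from datetime import date, timedelta


def access_window(first_observation, retention, today):
    """Interval retrievable from the endpoint on `today`.

    The window covers the last `retention` period, but never starts
    before the first observation was produced.
    """
    start = max(first_observation, today - retention)
    return start, today


first_observation = date(2009, 7, 11)

# Discovery view: fixed start, open end (None stands for "still ongoing").
discovery_extent = (first_observation, None)

# Access view: rolling 90-day window, evaluated on a hypothetical day.
window = access_window(first_observation, timedelta(days=90), date(2024, 3, 14))
print(discovery_extent, window)
```

Under this reading, the catalogue record would not need to change as the window rolls forward; only the access mechanism knows about the 90-day retention.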

Thoughts?

Haigutus commented 2 years ago

Hi, I am not so familiar with the semantic web, but we are trying to provide metadata for schedules and models that cover different periods of time and are sent with a specific periodicity. Any recommendations on how to do this with DCAT?

There are two needs:

  1. Define process/publication event recurrence - here we would like to use cron syntax - https://crontab.guru/
  2. Define process/publication temporal coverage in a dynamic way

Example of dynamic period definition

NB! Time Zone context must be added in business process itself

| Name | Description | Reference Time | Period Start | Period Duration |
| -- | -- | -- | -- | -- |
| ID | Process running continuously covering given day | currentDayStart | P0D | P1D |
| H-8 | Process intraday covering 8 hours ahead | currentHourStart | PT1H | PT8H |
| D-1 | Process that runs day before the targeted day | currentDayStart | P1D | P1D |
| D-2 | Process that runs two days before the targeted day | currentDayStart | P2D | P1D |
| D-7 | Process that runs day before the targeted window of 7 days | currentDayStart | P1D | P7D |
| W-0 | Process that runs in current week and covering current week | currentWeekStart | P0W | P1W |
| W-1 | Process that runs in current week and covering next week | currentWeekStart | P1W | P1W |
| M-1 | Process that runs in current month and covering next month | currentMonthStart | P1M | P1M |
| Y-1 | Process that runs in current year and covering next year | currentYearStart | P1Y | P1Y |
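
One possible reading of the table above is that each row resolves to the interval starting at "Reference Time + Period Start" and lasting "Period Duration". The sketch below works under that assumption. The names `REFERENCE_TIMES` and `resolve_period` are hypothetical, the week is assumed to start on Monday, UTC is used only as a placeholder time zone, and month and year rows are left out because `timedelta` cannot express calendar months or years.

```python
from datetime import datetime, timedelta, timezone


def _midnight(t):
    """Truncate a datetime to the start of its day."""
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


# Hypothetical resolvers for the "Reference Time" column.
REFERENCE_TIMES = {
    "currentHourStart": lambda now: now.replace(minute=0, second=0, microsecond=0),
    "currentDayStart": _midnight,
    # Assumption: ISO weeks, i.e. the week starts on Monday.
    "currentWeekStart": lambda now: _midnight(now - timedelta(days=now.weekday())),
}


def resolve_period(now, reference, start_offset, duration):
    """Return (start, end) for one table row, evaluated at `now`."""
    start = REFERENCE_TIMES[reference](now) + start_offset
    return start, start + duration


now = datetime(2024, 3, 14, 9, 30, tzinfo=timezone.utc)  # a Thursday

# Rows D-1, H-8 and W-1 from the table.
d1 = resolve_period(now, "currentDayStart", timedelta(days=1), timedelta(days=1))
h8 = resolve_period(now, "currentHourStart", timedelta(hours=1), timedelta(hours=8))
w1 = resolve_period(now, "currentWeekStart", timedelta(weeks=1), timedelta(weeks=1))
print(d1, h8, w1, sep="\n")
```

Because the time zone has to come from the business process, a real implementation would pass a zone-aware `now` in that zone rather than UTC.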

davebrowning commented 1 year ago

Project/Milestone modified.

Explanation: As DCAT v3 moves through review and hopefully ratification, we want to make sure that open issues and feedback that have yet to be completely addressed are properly recorded and tagged/assigned in GitHub, both to clarify their status and to help review and prioritise them as a source of improvements and new requirements in future DCAT versions.