ioos / service-monitor

A web based catalog of IOOS services and datasets
http://catalog.ioos.us

Clarify and document the many to many relationship between data sets and data services #432

Open dpsnowden opened 9 years ago

dpsnowden commented 9 years ago

The relationship between data sets and services is unclear. It is clearly many to many, which is fine, but how it is implemented across different configurations of THREDDS, ERDDAP, 52North, etc. is unclear and uneven. Let's clarify this once and for all and determine the changes needed for the next round of database modifications.

![Many to Many](http://yuml.me/diagram/scruffy/class/%2F%2F Cool Class Diagram, [DataSet]0..--0..[DataService]) [Editable URL for the image](http://yuml.me/diagram/scruffy/class/edit/%2F%2F Cool Class Diagram, [DataSet]0..--0..[DataService])


Example 1: A gridded data set hosted on TDS

For example, take one model hosted on a THREDDS server that has OPeNDAP, ncISO, and ncWMS enabled. The configuration in the following image should result in a catalog count of DataSet +=1 and DataService +=3, shouldn't it? I think the current behavior is DataSet +=1 and DataService +=2 because we don't count ncISO as a service. Is this desirable?

![One dataset, 3 data services](http://yuml.me/diagram/scruffy/class/%2F%2F Cool Class Diagram, , [<> MyModelRun]--[<> OPeNDAP], [<> MyModelRun]--[<> ncWMS], [<> MyModelRun]--[<> ncISO]) [Editable URL](http://yuml.me/diagram/scruffy/class/edit/%2F%2F Cool Class Diagram, , [<> MyModelRun]--[<> OPeNDAP], [<> MyModelRun]--[<> ncWMS], [<> MyModelRun]--[<> ncISO])

Example 2: i52N SOS with several ObservationOfferings

Another case is an i52N SOS service that has, say, 4 Observation Offerings. This should be DataSet +=4 and DataService +=1. I think this desired behavior is what is currently implemented.

[Image: i52N]
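The counts in Examples 1 and 2 can be sketched as a tally over dataset-to-service links. This is only an illustration under stated assumptions: the `Link` record, the `tally` function, the `ignored_types` parameter, and all URLs are hypothetical and are not taken from the catalog code.

```python
from collections import namedtuple

# Hypothetical link record: one row per (dataset, service endpoint) pairing.
Link = namedtuple("Link", ["dataset", "service_type", "service_url"])

def tally(links, ignored_types=()):
    """Count distinct datasets and distinct service endpoints in `links`."""
    kept = [l for l in links if l.service_type not in ignored_types]
    datasets = {l.dataset for l in kept}
    services = {(l.service_type, l.service_url) for l in kept}
    return len(datasets), len(services)

# Example 1: one model run exposed through three TDS services (made-up URLs).
example_1 = [
    Link("MyModelRun", "OPeNDAP", "https://tds.example.org/dodsC/model.nc"),
    Link("MyModelRun", "ncWMS", "https://tds.example.org/wms/model.nc"),
    Link("MyModelRun", "ncISO", "https://tds.example.org/iso/model.nc"),
]

# Example 2: one SOS endpoint serving four observation offerings.
sos_url = "https://sos.example.org/sos"
example_2 = [Link(f"offering-{i}", "SOS", sos_url) for i in range(1, 5)]

print(tally(example_1))                           # (1, 3)
print(tally(example_1, ignored_types={"ncISO"}))  # (1, 2)
print(tally(example_2))                           # (4, 1)
```

Keeping datasets and service endpoints as separate sets means neither side of the many-to-many relationship dictates the count of the other.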

Example 3: in situ data sets hosted on TDS with ncSOS enabled.

In this case we've paid special attention to ncSOS, so I think it gets special treatment. Desired behavior is DataSet +=1 and DataService +=3. Current behavior (I think) is DataSet +=2 and DataService +=2. I think the reasoning is that a) ncISO is not counted, so there are only 2 services, and b) the OPeNDAP URL and the ncSOS URL are each counted as a DataSet (hence 2).

![ncSOS](http://yuml.me/diagram/scruffy/class/%2F%2F Cool Class Diagram, , [<> Station 1]--[<> OPeNDAP], [<> Station 1]--[<> ncSOS], [<> Station 1]--[<> ncISO])

[Editable URL](http://yuml.me/diagram/scruffy/class/edit/%2F%2F Cool Class Diagram, , [<> Station 1]--[<> OPeNDAP], [<> Station 1]--[<> ncSOS], [<> Station 1]--[<> ncISO])
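To make the suspected discrepancy in Example 3 concrete, here is a hedged sketch contrasting two ways of keying a dataset. It assumes, as guessed above, that the current behavior treats each access URL as its own dataset and skips ncISO. The record layout, the URLs, and both function names are hypothetical.

```python
# Hypothetical harvest records for one station on a TDS (made-up URLs).
records = [
    {"station": "Station 1", "type": "OPeNDAP",
     "url": "https://tds.example.org/dodsC/station1.nc"},
    {"station": "Station 1", "type": "ncSOS",
     "url": "https://tds.example.org/sos/station1.nc"},
    {"station": "Station 1", "type": "ncISO",
     "url": "https://tds.example.org/iso/station1.nc"},
]

def suspected_current_counts(records):
    """Key datasets by access URL and skip ncISO (the behavior guessed at above)."""
    kept = [r for r in records if r["type"] != "ncISO"]
    datasets = {r["url"] for r in kept}
    services = {(r["type"], r["url"]) for r in kept}
    return len(datasets), len(services)

def desired_counts(records):
    """Key datasets by the underlying station and count every service."""
    datasets = {r["station"] for r in records}
    services = {(r["type"], r["url"]) for r in records}
    return len(datasets), len(services)

print(suspected_current_counts(records))  # (2, 2)
print(desired_counts(records))            # (1, 3)
```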

Discussion? This confusion and the special treatment of ncSOS are the heart of the issue here.

dpsnowden commented 9 years ago

Below is a comment from @lukecampbell sent to me in email. I'm including it here to keep it with this discussion.
Currently...

Catalog 3.3 was released this morning.

I added the endpoints for statistics. I think the most useful to everyone will be these two:

  • http://catalog.ioos.us/csv/metrics/datasets_by_ra?stop=1
  • http://catalog.ioos.us/csv/metrics/services_by_type?stop=1

I want to point out something about the counts, and their wide differences:

  • The front page dataset and service counts look at unique datasets that are active
  • The "inventory" page looks at datasets that are active and have an active service directly associated with them
  • The reports count everything, including duplicates
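The three counting rules above can be illustrated with a small sketch. The record fields (`uid`, `active`, `services`) and the sample rows are assumptions chosen for illustration, not the catalog's actual schema. Whether the inventory page collapses duplicates is not stated, so this sketch does not collapse them.

```python
# Hypothetical dataset rows; "services" holds the active flag of each
# service directly associated with the dataset.
rows = [
    {"uid": "a", "active": True,  "services": [True]},
    {"uid": "a", "active": True,  "services": [True]},   # duplicate of the first
    {"uid": "b", "active": True,  "services": [False]},  # only an inactive service
    {"uid": "c", "active": False, "services": [True]},   # inactive dataset
    {"uid": "d", "active": True,  "services": []},       # no associated service
]

def front_page_count(rows):
    """Unique (by uid) datasets that are active."""
    return len({r["uid"] for r in rows if r["active"]})

def inventory_count(rows):
    """Active datasets with at least one active, directly associated service.

    Assumption: duplicates are NOT collapsed here.
    """
    return sum(1 for r in rows if r["active"] and any(r["services"]))

def report_count(rows):
    """Everything, including duplicates and inactive rows."""
    return len(rows)

print(front_page_count(rows))  # 3
print(inventory_count(rows))   # 2
print(report_count(rows))      # 5
```

Even on five rows the three rules give three different totals, which is consistent with the wide differences described above.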

A unique dataset is defined as one with a unique "uid". For DAP services, the uid is the URL to which the data resolves. For SOS, the uid is the IOOS URN.
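A minimal sketch of that uid rule, assuming each harvested record carries a service type plus a URL or URN. The function name, its parameters, the example URL, and the example URN are all hypothetical.

```python
def dataset_uid(service_type, url=None, urn=None):
    """Return the identifier that makes a dataset unique.

    Per the rule above: DAP datasets are keyed by the URL the data
    resolves to, SOS datasets by the IOOS URN.
    """
    if service_type == "DAP":
        return url
    if service_type == "SOS":
        return urn
    raise ValueError(f"no uid rule sketched for service type {service_type!r}")

# The same station harvested twice, once per service type (made-up values).
dap_uid = dataset_uid("DAP", url="https://tds.example.org/dodsC/station1.nc")
sos_uid = dataset_uid("SOS", urn="urn:ioos:station:example:station1")

# Deduplicating by uid keeps both entries because the identifiers differ.
unique_uids = {dap_uid, sos_uid}
print(len(unique_uids))  # 2
```

Under these assumptions, a station reachable through both DAP and SOS ends up with two distinct uids, which may be one source of the double counting described in Example 3.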