tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

Test whether it is possible for datasets to link to the profile only #186

Closed: peterdesmet closed this issue 2 years ago

peterdesmet commented 3 years ago

Camtrap DP datasets currently have to link to the 4 files of the standard separately. We should investigate whether it is possible to link to the profile only, which in turn links to the Table Schemas.

peterdesmet commented 2 years ago

Now that I have written an implementation (frictionless-r), I know it will look for a schema property for every resource in datapackage.json. There is no shortcut around that: I don't see how an implementation would find a resource schema via the profile.
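To make the point concrete, here is a minimal Python sketch of how a Frictionless-style reader might resolve table schemas. It is not the frictionless-r code: the helper `resource_schemas` and the example package are hypothetical, and the profile URL is an assumption that follows the pattern of the table schema URLs in this thread. The reader only consults each resource's own `schema` property, so a package-level `profile` gives it nothing to fall back on.

```python
from typing import Any


def resource_schemas(package: dict[str, Any]) -> dict[str, Any]:
    """Map each resource name to its `schema` property (a URL or an inline object).

    Hypothetical helper: the package-level `profile` is never consulted here.
    """
    schemas = {}
    for resource in package.get("resources", []):
        if "schema" not in resource:
            # No shortcut via the profile: a missing schema is simply missing.
            raise ValueError(f"resource {resource.get('name')!r} has no schema property")
        schemas[resource["name"]] = resource["schema"]
    return schemas


# Hypothetical package that links to the profile only (profile URL assumed).
package = {
    "profile": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/camtrap-dp-profile.json",
    "resources": [{"name": "deployments", "path": "deployments.csv"}],
}

try:
    resource_schemas(package)
except ValueError as err:
    print(err)  # resource 'deployments' has no schema property
```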

What is possible in the profile is to dictate which schema should be included for resources. We already do that here:

https://github.com/tdwg/camtrap-dp/blob/51ab483af944212bc734930e578032b0e8461c1c/camtrap-dp-profile.json#L57-L71

This basically requires every schema to have a name property, and that name should be either deployments, media or observations. The advantage is that one can still link to a schema (with a name) or include a schema verbosely (with a name). The latter is good for testing. It doesn't test:

  1. That the right schema is used for the right resource, but that's not too bad, because validation will fail in many other ways if you do so.
  2. That the same version is used for all schemas and the profile.
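The name requirement and its two blind spots can be sketched in Python. This is a hypothetical re-implementation of the idea, not the JSON Schema fragment at the linked lines of the profile; the function `schema_name_ok` is illustrative, and it assumes the schema has already been resolved to an object, whether it was linked or included inline.

```python
ALLOWED_SCHEMA_NAMES = {"deployments", "media", "observations"}


def schema_name_ok(schema: dict) -> bool:
    """Return True if a (resolved) table schema carries an allowed `name`."""
    return schema.get("name") in ALLOWED_SCHEMA_NAMES


print(schema_name_ok({"name": "media", "fields": []}))  # True
print(schema_name_ok({"fields": []}))  # False: no name

# Blind spot 1: the media resource uses the observations schema, yet the check passes.
media_resource = {"name": "media", "schema": {"name": "observations", "fields": []}}
print(schema_name_ok(media_resource["schema"]))  # True

# Blind spot 2: nothing in the check records which version the schema comes from.
```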

Now, that last part could be tested by adding this to the profile:

```json
"schema": {
  "description": "URL of the used Camtrap DP Table Schema version (e.g. `https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0/deployments-table-schema.json`).",
  "enum": [
    "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/deployments-table-schema.json",
    "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/media-table-schema.json",
    "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/observations-table-schema.json"
  ]
}
```
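The effect of that `enum` can be mimicked in a few lines of Python. This is a sketch, not how a JSON Schema validator is implemented: `schema_in_enum` is a hypothetical helper, and the `0.1.5` URL stands in for any other release.

```python
from typing import Any

# Allowed schema URLs, copied from the enum above (all pinned to 0.1.6).
ALLOWED_SCHEMAS = {
    "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/deployments-table-schema.json",
    "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/media-table-schema.json",
    "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/observations-table-schema.json",
}


def schema_in_enum(schema: Any) -> bool:
    """Mimic a JSON Schema `enum`: the value must equal one of the listed URLs."""
    return isinstance(schema, str) and schema in ALLOWED_SCHEMAS


resources = [
    {"name": "deployments",
     "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/deployments-table-schema.json"},
    # A schema from another (hypothetical) release is rejected.
    {"name": "media",
     "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.5/media-table-schema.json"},
    # An inline schema object never equals one of the URL strings.
    {"name": "observations", "schema": {"name": "observations", "fields": []}},
]
for resource in resources:
    print(resource["name"], schema_in_enum(resource["schema"]))
# deployments True
# media False
# observations False
```

Because every allowed URL is pinned to the release the profile belongs to, a mix of versions fails the check. The last case shows that, in this sketch, a verbosely included schema no longer validates.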

Advantage:

Disadvantages:

But the bottom line (as far as I see it) is that producers of Camtrap DP packages need to include 4 links.
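For illustration, the four links can be written out as a descriptor skeleton in Python. The resource paths are made up, the profile URL is assumed to follow the same pattern as the schema URLs, and `standard_links` is a hypothetical helper that just counts what a producer has to maintain.

```python
# Hypothetical descriptor skeleton: resource paths are illustrative, and the
# profile URL is assumed to follow the same pattern as the schema URLs.
BASE = "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6"

datapackage = {
    "profile": f"{BASE}/camtrap-dp-profile.json",
    "resources": [
        {
            "name": name,
            "path": f"{name}.csv",
            "schema": f"{BASE}/{name}-table-schema.json",
        }
        for name in ("deployments", "media", "observations")
    ],
}


def standard_links(package: dict) -> list[str]:
    """Collect the profile link plus every linked (string) table schema."""
    links = [package["profile"]]
    links += [r["schema"] for r in package["resources"] if isinstance(r["schema"], str)]
    return links


print(len(standard_links(datapackage)))  # 4
```

Switching to another release means updating all four URLs together.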

niconoe commented 2 years ago

I quite like that approach because of the added tightness/robustness in datasets.

I am also wondering how exactly it affects our tests, but I'm pretty sure it'll add a layer of hacks on top of the existing layer of hacks :) I therefore feel it's some kind of orange flag, but I can't yet pinpoint exactly how we ended up in a situation where some tests that should be simple are not. I'll revisit soon, hopefully with a clearer mind.

peterdesmet commented 2 years ago

A more robust approach enforcing a single version number is suggested in #196.

Closing this issue. The basic answer to the question in the title, "Test whether it is possible for datasets to link to the profile only", is "No" (explanation in https://github.com/tdwg/camtrap-dp/issues/186#issuecomment-1021968305).