opendatateam / udata-ckan

CKAN integration for udata

Remote id usage for resources is problematic #217

Open abulte opened 2 years ago

abulte commented 2 years ago

We're currently using the remote resource id as our own resource id: https://github.com/opendatateam/udata-ckan/blob/a7c5f4e67311152b066d697fc8899d5941b1f6d4/udata_ckan/harvesters.py#L250

This can be problematic if the id is not unique on the remote portal. It should not happen on the CKAN side, but it will break in at least the following case:

Possible solutions:

  1. stop relying on the remote resource id altogether and instead use a new attribute resource.extras.harvest:remote_id to map the remote resource to the local one https://github.com/opendatateam/udata-ckan/blob/a7c5f4e67311152b066d697fc8899d5941b1f6d4/udata_ckan/harvesters.py#L245. The local resource will have an auto-generated resource id, which should be unique ➡️ this is nice, but it needs quite a few code changes and a migration
  2. protect the harvesting process against conflicting IDs: raise an error for a given dataset if it contains a resource id that already exists ➡️ easier to implement but requires a manual action (dataset deletion) to fix the situation
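
To make the two options concrete, here is a minimal, self-contained sketch. It does not use the actual udata or udata-ckan code: `Resource`, `Dataset`, `upsert_resource`, `ResourceIdConflict` and `check_no_conflict` are hypothetical stand-ins. Only the `harvest:remote_id` extras key comes from the proposal above. For option 2, "existing" is assumed to mean an id already used by a resource of another local dataset.

```python
import uuid
from dataclasses import dataclass, field

# Extras key proposed in option 1.
REMOTE_ID_KEY = "harvest:remote_id"


@dataclass
class Resource:
    # Hypothetical stand-in for a local resource: the local id is always
    # auto-generated and never copied from the remote portal.
    title: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    extras: dict = field(default_factory=dict)


@dataclass
class Dataset:
    # Hypothetical stand-in for a harvested dataset.
    resources: list = field(default_factory=list)


def upsert_resource(dataset, remote_id, title):
    """Option 1: match on extras['harvest:remote_id'], not on the local id."""
    for resource in dataset.resources:
        if resource.extras.get(REMOTE_ID_KEY) == remote_id:
            resource.title = title  # update the already-mapped resource
            return resource
    resource = Resource(title=title, extras={REMOTE_ID_KEY: remote_id})
    dataset.resources.append(resource)
    return resource


class ResourceIdConflict(Exception):
    """Raised when a harvested dataset carries an already-used resource id."""


def check_no_conflict(remote_ids, ids_in_other_datasets):
    """Option 2: refuse a dataset whose resource ids are already in use.

    Assumption: `ids_in_other_datasets` holds the ids of resources that
    belong to other local datasets.
    """
    seen = set()
    for remote_id in remote_ids:
        if remote_id in ids_in_other_datasets or remote_id in seen:
            raise ResourceIdConflict(f"resource id already in use: {remote_id}")
        seen.add(remote_id)
```

With option 1, harvesting the same remote id twice into one dataset updates a single local resource, and two remote resources that share an id in different datasets still get distinct local ids. With option 2, the conflict is only detected, so the offending dataset has to be cleaned up by hand, as noted above.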