opendatateam / udata

Customizable and skinnable social platform dedicated to open data.
http://udata.readthedocs.org
GNU Affero General Public License v3.0

Handle curly brackets (?) in DCAT harvester #2375

Open abulte opened 4 years ago

abulte commented 4 years ago

The DCAT harvester issues warnings when checking URLs, apparently because it does not fully recognize .re as a valid TLD, cf. https://sentry.data.gouv.fr/etalab/next-datagouvfr/issues/1866174/.

Forcing an upgrade of the tlds package might be enough: https://github.com/opendatateam/udata/blob/master/requirements/install.pip#L57.

This comes from the DCAT harvester / rdflib serialisation. The harvester is here https://next.data.gouv.fr/fr/admin/harvester/5c05102f8b4c413fbdd08846.

It's only a warning and every dataset seems to be correctly harvested for this platform. Still, it's polluting the error logs so it would be a good idea to get rid of the warnings. Maybe an update of the rdf toolkit will help.

Apparently this is not linked to the .re TLD, but maybe to the {} in the query string, cf. https://sentry.data.gouv.fr/etalab/next-datagouvfr/issues/1871221/, which breaks on http://opendata.chalons-agglo.fr/datasets/a71cfba1816c439d8b35ac568aeccd0b_13.kml?outSR={"latestWkid":2154,"wkid":102110}
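If the curly brackets are indeed the trigger, one possible workaround would be to percent-encode the query string before the URL reaches the RDF layer. The snippet below is only a sketch using the standard library: `escape_query` and `SAFE_QUERY_CHARS` are hypothetical names, not part of udata or rdflib, and whether the remote server still accepts the encoded form is an assumption that would need checking.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in the query string (hypothetical choice).
# "%" is kept so that already-encoded sequences are not encoded twice.
SAFE_QUERY_CHARS = "=&%:,/?@!$'()*+;"


def escape_query(url: str) -> str:
    """Percent-encode characters such as {, } and " in the query string only."""
    parts = urlsplit(url)
    query = quote(parts.query, safe=SAFE_QUERY_CHARS)
    return urlunsplit(parts._replace(query=query))


raw_url = (
    "http://opendata.chalons-agglo.fr/datasets/"
    'a71cfba1816c439d8b35ac568aeccd0b_13.kml?outSR={"latestWkid":2154,"wkid":102110}'
)
print(escape_query(raw_url))
```

Because "%" is treated as safe, applying the function twice gives the same result as applying it once, at the cost of leaving a stray literal "%" unescaped.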

abulte commented 4 years ago

For reference, the warning seems legitimate: https://meta.stackexchange.com/a/79060
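To see which characters in a harvested URL would be expected to trip such a check, a small helper can list them. This is a sketch under assumptions: `INVALID_URI_CHARS` is an illustrative set of characters that RFC 3986 does not allow unescaped, not necessarily the exact list the validator uses, and `invalid_uri_chars` is a hypothetical name.

```python
from typing import List

# Illustrative set of characters not allowed unescaped in a URI (RFC 3986).
# Assumption: the actual validator may use a different list.
INVALID_URI_CHARS = set('<>" {}|\\^`')


def invalid_uri_chars(url: str) -> List[str]:
    """Return the distinct disallowed characters in url, in order of appearance."""
    seen: List[str] = []
    for char in url:
        if char in INVALID_URI_CHARS and char not in seen:
            seen.append(char)
    return seen


sample = 'http://example.org/data.kml?outSR={"wkid":102110}'
print(invalid_uri_chars(sample))
```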

abulte commented 4 years ago

TODO: close this in a while if nobody has had a brilliant idea in the meantime.