machinelearningZH / ogd_ai-analyzer

Automagically do a deep check of metadata quality of your DCAT OGD offering.
MIT License

Difficulty of adapting the analysis to other metadata standards, such as those of Austria or Germany? #1

Closed tifa365 closed 2 months ago

tifa365 commented 3 months ago

First of all, a big thank you for the 3 new AI projects and for open-sourcing the complete code. I also think it was a very smart idea to use the Streamlit library for prototyping before diving too deep into more complicated frontends that are difficult to iterate on.

I am really no expert on this topic, but how difficult would it be to adapt the code to check against DCAT-AP.de or OGD metadata standards?

rnckp commented 3 months ago

Thanks for your positive feedback!

I imagine that adapting the analysis to the German DCAT-AP standard wouldn't be difficult. Many properties are the same or very similar.

Do you have a specific data catalog (e.g. Berlin, Munich etc.) in mind that you would like to try? Can you imagine collaborating on this?

tifa365 commented 3 months ago

I can definitely imagine collaborating and adapting this for Berlin. I am just not very familiar with the standard yet and would need time to get used to the format. Is there any material you could recommend for getting up to speed?

tifa365 commented 2 months ago

I've forked the repo and started to adapt some of the code in 01_mdv_eda.ipynb and utils.py (DCAT_CLASS_DATASET, DCAT_CLASS_DISTRIBUTION) to the DCAT-AP.de standard for Berlin. It was a bit complicated because I couldn't find a version of the standard in clean JSON, only in JSON-LD. Also, unsurprisingly, there are some differences between the standards that one has to figure out. I think this might be off to a good start, but a lot of double-checking will be needed later.

https://github.com/tifa365/dcat_ap_de_ai-analyzer/blob/main/01_mdv_eda.ipynb
https://github.com/tifa365/dcat_ap_de_ai-analyzer/blob/main/utils.py
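
To illustrate the JSON-LD hurdle mentioned above, here is a minimal sketch of flattening a JSON-LD "@graph" into plain Python dictionaries. It is not the code from either repository. The function names and the sample document are hypothetical, and the real DCAT-AP.de JSON-LD file may be structured differently (for example, it may use full IRIs instead of prefixed names).

```python
import json


def _unwrap(value):
    """Reduce JSON-LD value objects like {"@value": "x"} or {"@id": "y"} to plain values."""
    if isinstance(value, list):
        return [_unwrap(v) for v in value]
    if isinstance(value, dict):
        if "@value" in value:
            return value["@value"]
        if set(value) == {"@id"}:
            return value["@id"]
    return value


def flatten_jsonld_graph(doc):
    """Return {node_id: {property: plain_value}} for each identified node in the graph."""
    # A document without "@graph" is treated as a single node.
    nodes = doc.get("@graph", [doc])
    flat = {}
    for node in nodes:
        node_id = node.get("@id")
        if node_id is None:
            continue  # skip nodes without an identifier
        flat[node_id] = {k: _unwrap(v) for k, v in node.items() if k != "@id"}
    return flat


# Hypothetical sample, only meant to show the shape of the input and output.
sample = json.loads("""
{
  "@graph": [
    {
      "@id": "dcat:Dataset",
      "@type": "rdfs:Class",
      "rdfs:label": {"@value": "Datensatz", "@language": "de"}
    },
    {
      "@id": "dct:title",
      "@type": "rdf:Property",
      "rdfs:domain": {"@id": "dcat:Dataset"}
    }
  ]
}
""")

flat = flatten_jsonld_graph(sample)
```

With the sample above, `flat["dct:title"]["rdfs:domain"]` comes out as the plain string `"dcat:Dataset"`. Note that this sketch drops language tags, so it would need extending if multilingual labels matter.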

rnckp commented 2 months ago

Great! Thanks for your effort. I'll have a look. 👍🚀

Just this morning I started to look into your suggestion and created a fork to collaborate on this. Haven't pushed it yet. Will first look into your code.

rnckp commented 2 months ago

Regarding sources (just for completeness' sake): I generally refer to the DCAT standard documentation, such as this one for DE: https://www.dcat-ap.de/def/dcatde/2.0/spec/ (mainly «Klasse Datensatz» and «Klasse Distribution»)

For downloading the metadata from Berlin's OGD portal, I found this CKAN documentation: https://berlinonline.github.io/open-data-handbuch/#schnittstellen
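
As a rough sketch of what such a download could look like, the snippet below pages through CKAN's `package_search` action using only the standard library. The base URL is a placeholder and not the Berlin endpoint (take the real one from the handbook linked above). The function names and the page size are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder, NOT the real Berlin endpoint; replace with the portal's CKAN base URL.
BASE_URL = "https://ckan.example.org"


def package_search_url(base_url, start, rows):
    """Build the URL for one page of CKAN's package_search action."""
    query = urlencode({"start": start, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{query}"


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_all_datasets(base_url, rows=100, fetch=fetch_json):
    """Collect the metadata of all datasets by paging through package_search.

    `fetch` is injectable so the paging logic can be tried without network access.
    """
    datasets = []
    start = 0
    while True:
        payload = fetch(package_search_url(base_url, start, rows))
        result = payload["result"]
        batch = result["results"]
        datasets.extend(batch)
        start += len(batch)
        # Stop when the portal returns nothing more or everything has been read.
        if not batch or start >= result["count"]:
            return datasets
```

Calling `fetch_all_datasets(BASE_URL)` with a real base URL would return a list of dataset dictionaries, which could then be loaded into a DataFrame for analysis. Some portals cap `rows` per request, so a smaller page size may be needed.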

rnckp commented 2 months ago

If you prefer to continue this conversation via email or chat, feel free to ping me on LinkedIn: https://www.linkedin.com/in/patrickarnecke/

It's also perfectly fine to continue here pseudonymously.

rnckp commented 2 months ago

Since we were able to resolve a couple of questions bilaterally, I'm closing this issue for now.