Closed tifa365 closed 2 months ago
Thanks for your positive feedback!
I imagine that adapting the analysis to the German DCAT-AP standard wouldn't be difficult. Many properties are the same or very similar.
Do you have a specific data catalog (e.g. Berlin, Munich etc.) in mind that you would like to try? Can you imagine collaborating on this?
I can definitely imagine collaborating and adapting this for Berlin. I'm just not very familiar with the standard yet and would need some time to get used to the format. Is there any material you could recommend for getting up to speed?
I've forked the repo and started adapting some of the 01_mdv_eda.ipynb and utils.py code (DCAT_CLASS_DATASET, DCAT_CLASS_DISTRIBUTION) to the DCAT-AP.de standard for Berlin. It was a bit complicated because I couldn't find a version of the standard in clean JSON, only JSON-LD. Also, unsurprisingly, there are some differences in the standard one has to figure out. I think this is off to a good start, but a lot of double-checking will be needed later.
https://github.com/tifa365/dcat_ap_de_ai-analyzer/blob/main/01_mdv_eda.ipynb https://github.com/tifa365/dcat_ap_de_ai-analyzer/blob/main/utils.py
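Since the spec is only available as JSON-LD, one option is to extract the relevant class properties directly from the `@graph` with the standard library. This is just a sketch: `properties_for_class` is a hypothetical helper, and the exact shape of the spec's `@graph` nodes (whether `rdfs:domain` is a single node or a list) is an assumption that would need checking against the actual file.

```python
import json

def properties_for_class(jsonld: dict, class_id: str) -> list:
    """Collect the property IDs whose rdfs:domain is the given class
    (e.g. 'dcat:Dataset') from a JSON-LD vocabulary's @graph."""
    props = []
    for node in jsonld.get("@graph", []):
        domain = node.get("rdfs:domain")
        # rdfs:domain may be a single node or a list of nodes
        domains = domain if isinstance(domain, list) else [domain]
        if any(isinstance(d, dict) and d.get("@id") == class_id for d in domains):
            props.append(node.get("@id"))
    return props

# Toy example mimicking the assumed shape of the spec's @graph
spec = {
    "@graph": [
        {"@id": "dct:title", "rdfs:domain": {"@id": "dcat:Dataset"}},
        {"@id": "dcatde:contributorID", "rdfs:domain": {"@id": "dcat:Dataset"}},
        {"@id": "dcat:downloadURL", "rdfs:domain": {"@id": "dcat:Distribution"}},
    ]
}
print(properties_for_class(spec, "dcat:Dataset"))
```

In practice you would `json.load()` the downloaded spec file instead of the toy dict; for anything beyond flat property lists, a proper RDF library like rdflib might be the safer route.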
Great! Thanks for your effort. I'll have a look. 👍🚀
Just this morning I started to look into your suggestion and created a fork to collaborate on this. Haven't pushed it yet. Will first look into your code.
Regarding sources (just for completeness' sake): I generally refer to the DCAT standard information like this for DE: https://www.dcat-ap.de/def/dcatde/2.0/spec/ (mainly «Klasse Datensatz» ("Dataset class") and «Klasse Distribution» ("Distribution class"))
To download the metadata from the OGD portal of Berlin, I found this CKAN documentation: https://berlinonline.github.io/open-data-handbuch/#schnittstellen
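For pulling the metadata, the standard CKAN Action API (`package_search`) linked from that handbook should be enough. A minimal sketch, assuming Berlin's CKAN instance lives at `datenregister.berlin.de` (the base URL is an assumption to verify against the handbook):

```python
from urllib.parse import urlencode

# Base URL of the CKAN Action API; the host is an assumption.
BASE_URL = "https://datenregister.berlin.de/api/3/action"

def search_url(query: str = "*:*", rows: int = 100, start: int = 0) -> str:
    """Build a CKAN package_search URL for paging through datasets."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{BASE_URL}/package_search?{params}"

def dataset_titles(response: dict) -> list:
    """Extract dataset titles from a CKAN package_search JSON response."""
    return [pkg["title"] for pkg in response["result"]["results"]]

print(search_url(rows=10))
```

Fetching is then just `requests.get(search_url()).json()` in a loop, incrementing `start` until `result.count` is exhausted; keeping the URL building and response parsing as pure functions makes them easy to test without network access.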
If you prefer to continue this conversation via email or chat feel free to ping me on LinkedIn: https://www.linkedin.com/in/patrickarnecke/
It's also perfectly fine to continue here pseudonymously.
Since we were able to resolve a couple of questions bilaterally, I'm closing this issue for now.
First of all, a big thank you for the three new AI projects and for open-sourcing the complete code. I also think it was a very smart idea to use the Streamlit library for prototyping before diving too deep into more complicated frontends that are difficult to iterate on.
I am really no expert on this topic, but how difficult would it be to adapt the code to check against the DCAT-AP.de or OGD metadata standards?