okfn / opendataeditor

No-code application to explore and publish all kinds of data: datasets, tables, charts, maps, stories, and more. Forever free and open source project powered by open standards and generative AI.
http://opendataeditor.okfn.org
MIT License

Publish to CKAN Feature [Research/Discussion] #438

Open pdelboca opened 1 week ago

pdelboca commented 1 week ago

Publishing to CKAN: Main Challenge

Publishing to CKAN can be tricky because it depends directly on the schema that each CKAN instance implements for its datasets and resources. As an example, here is the schema that opendata.swiss implements: https://ckan.opendata.swiss/api/3/action/scheming_dataset_schema_show?type=dataset.

In order to publish, we need to know what CKAN expects. If CKAN does not provide that information, we cannot publish. So requirement number 1: we should be able to access the schema of the particular CKAN instance, or assume it is a vanilla implementation.
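A minimal sketch of that requirement, assuming the instance exposes the same `scheming_dataset_schema_show` endpoint as opendata.swiss. The function names and the `VANILLA_DATASET_FIELDS` list are hypothetical and only illustrate the fallback; they are not part of ODE or CKAN.

```python
import json
from urllib.request import urlopen

# Assumption: a small, illustrative subset of the core fields of a vanilla
# CKAN dataset. A real fallback would need the complete list.
VANILLA_DATASET_FIELDS = ["name", "title", "notes", "owner_org", "license_id"]


def http_get_json(url, timeout=10):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def dataset_field_names(base_url, fetch=http_get_json):
    """Return the dataset field names CKAN expects.

    Falls back to the vanilla list when the schema endpoint is missing,
    unreachable, or reports a failure.
    """
    url = (
        base_url.rstrip("/")
        + "/api/3/action/scheming_dataset_schema_show?type=dataset"
    )
    try:
        payload = fetch(url)
    except (OSError, ValueError):
        # OSError covers network and HTTP errors, ValueError covers bad JSON.
        return list(VANILLA_DATASET_FIELDS)
    if not payload.get("success"):
        return list(VANILLA_DATASET_FIELDS)
    return [field["field_name"] for field in payload["result"]["dataset_fields"]]
```

The `fetch` argument is only there so the lookup can be exercised without a live CKAN instance.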

Since the goal of Open Data Editor is to be a tool for non-technical users, to properly implement this feature we need:

We need better definitions

We need a proper definition of what it means to "Publish to CKAN":

We will always have the schema issue, but we could go for simpler scopes (like just replacing a file in an already created CKAN dataset).
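To make the simpler scope concrete, here is a rough sketch of what replacing a file could look like against CKAN's action API, using the `resource_patch` action with an `upload` file field. The helper name and its arguments are hypothetical, and it only prepares the request so that nothing is sent by accident.

```python
def resource_patch_request(base_url, api_token, resource_id):
    """Build the URL, headers and form data for a resource_patch call.

    The file itself is attached by the caller as a multipart field
    named "upload" (see the comment below).
    """
    return {
        "url": base_url.rstrip("/") + "/api/3/action/resource_patch",
        "headers": {"Authorization": api_token},
        "data": {"id": resource_id},
    }


call = resource_patch_request("https://ckan.example.org", "my-token", "abc-123")

# Sending it, assuming the third-party `requests` package is installed:
#   with open("data.csv", "rb") as fh:
#       requests.post(**call, files={"upload": fh})
```

Because only the resource `id` and the file are sent, this path does not touch any dataset-level metadata, which is what keeps the custom schema out of the picture.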

Current Implementation gaps

The current implementation is based on https://github.com/frictionlessdata/frictionless-ckan-mapper, which provides a set of hard-coded fields to map between a vanilla CKAN instance and Frictionless. The current Frictionless Portals Documentation does not mention how to handle custom schemas, so I'm assuming that it is not implemented (maybe @roll can provide some context here?).
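To illustrate the gap, here is a toy version of a hard-coded mapping. The table below is an assumption for illustration only, not the actual table from frictionless-ckan-mapper, and `to_ckan_dataset` is a hypothetical helper.

```python
# Illustrative hard-coded mapping: Frictionless key -> vanilla CKAN key.
FRICTIONLESS_TO_CKAN = {
    "name": "name",
    "title": "title",
    "description": "notes",
    "homepage": "url",
}


def to_ckan_dataset(package):
    """Map known keys and report the ones the fixed table cannot handle."""
    mapped, unmapped = {}, []
    for key, value in package.items():
        if key in FRICTIONLESS_TO_CKAN:
            mapped[FRICTIONLESS_TO_CKAN[key]] = value
        else:
            unmapped.append(key)
    return mapped, unmapped


mapped, unmapped = to_ckan_dataset(
    {"name": "air-quality", "description": "Hourly readings", "publisher": "City"}
)
```

Here `publisher` stands in for a field required by a custom schema: a fixed table has nowhere to put it, so it ends up in `unmapped`.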

Even if it is implemented at the core of Frictionless, we will still need to work on the UI that will power the feature (or work on a UI that works for technical users only).

Exposing the schema

The most widely used extension in CKAN to customize the schema is ckanext-scheming. However, not all instances expose the endpoint to show the schema like Open Data Swiss does. Without knowing what CKAN expects, it is not possible to define a UI form. We might be able to work only with fields that we know for sure CKAN expects, especially if our goal is a feature that just updates a CKAN resource.

Dynamic Form

Building a dynamic form is completely feasible, but it is not an easy task. The good thing is that the fields that ckanext-scheming exposes are limited in number, so the number of implementations we need is limited too. There are some tools like https://uniforms.tools/docs/what-are-uniforms/ that create forms from schemas (even using MUI!), but I'm not sure how flexible they are for creating forms on-the-fly based on what ckanext-scheming returns.

It is worth pointing out that ckanext-scheming does not return the type of the data, but rather which snippet should be used to render the form. So it does not tell us how the value is stored or handled.
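As a sketch of what a dynamic form would have to do with that information: translate each field's `form_snippet` into a widget kind. The snippet names follow ckanext-scheming's form snippets, but the widget names and the `form_spec` helper are hypothetical, and the table is deliberately incomplete.

```python
# Assumption: widget kinds are placeholders for whatever the UI library offers.
SNIPPET_TO_WIDGET = {
    "text.html": "text",
    "large_text.html": "textarea",
    "markdown.html": "textarea",
    "select.html": "select",
    "date.html": "date",
    "upload.html": "file",
}


def form_spec(fields):
    """Turn scheming field definitions into a list of widget descriptions."""
    spec = []
    for field in fields:
        snippet = field.get("form_snippet", "text.html")
        spec.append(
            {
                "name": field["field_name"],
                "label": field.get("label", field["field_name"]),
                # Unknown snippets fall back to a plain text input.
                "widget": SNIPPET_TO_WIDGET.get(snippet, "text"),
                "required": bool(field.get("required", False)),
            }
        )
    return spec


spec = form_spec(
    [
        {"field_name": "title", "label": "Title", "required": True},
        {"field_name": "notes", "form_snippet": "markdown.html"},
        {"field_name": "geo", "form_snippet": "custom_map.html"},
    ]
)
```

The last field shows the limitation: an instance-specific snippet such as the made-up `custom_map.html` carries no type information, so the form can only guess.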

ckanext-scheming provides a list of validators, but I assume it will be easier not to re-implement that validation on the front end.

romicolman commented 1 week ago

Hey @pdelboca. If I understood correctly, publication in CKAN through the ODE seems more complicated than expected because the process is connected to the specific CKAN instance. Also, the best solution would be to work on the publication feature in stages, right? Can we start by publishing a file as a new resource when the dataset already exists?