zazuko / barnard59

An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.

Functions for creating nicer URIs #29

Open ktk opened 4 years ago

ktk commented 4 years ago

Sometimes source data is messy and we don't get proper keys for all the entries we want to or have to process. This is especially true for data coming from Excel files, which are not primarily designed for machine-to-machine interaction.

In my example I have a list of companies and I need to generate persistent URIs for them. Out of the box, the only thing I can do is hope that URI encoding generates an acceptable string, which is often not the case. In SPARQL I work around this problem by using hash functions so the URI is a bit less ugly. The drawback of SPARQL is that the available hash functions generate really long strings.
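To make the two options concrete, here is a minimal Node.js sketch comparing a URI-encoded name with a hashed one. The base URI and the company name are made up for illustration, and SHA-256 is only one of the hash functions SPARQL 1.1 offers; the point is the shape and length of the result, not a recommendation.

```javascript
const { createHash } = require('crypto')

// Hypothetical base URI and company name, for illustration only.
const base = 'https://example.org/company/'
const name = 'Müller & Söhne AG'

// Option 1: plain URI encoding keeps the name but fills it with percent escapes.
const encoded = base + encodeURIComponent(name)

// Option 2: hashing hides the messy characters, but a SHA-256 hex digest
// is always 64 characters long.
const hashed = base + createHash('sha256').update(name, 'utf8').digest('hex')

console.log(encoded) // https://example.org/company/M%C3%BCller%20%26%20S%C3%B6hne%20AG
console.log(hashed.length - base.length) // 64
```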

In JS I can use functions like crc32 to create shorter, more "YouTube"-like URIs. But when the subject URI should contain such a shortened string, it can't be done directly at the moment, as there is no support for functions in common mapping specifications like CSVW.

At least for cases like creating cubes, it would make sense to be able to apply functions to variables before using them.
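As a rough sketch of what such a function could look like, the snippet below computes a standard CRC-32 checksum without external dependencies and renders it in base 36. The names `shortKey` and `companyUri`, the base URI, and the trim/lowercase normalization are assumptions made for this example; nothing here is an existing barnard59 or CSVW feature.

```javascript
// Build the CRC-32 lookup table (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256)
for (let n = 0; n < 256; n++) {
  let c = n
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1
  }
  CRC_TABLE[n] = c >>> 0
}

// Standard CRC-32 over the UTF-8 bytes of a string, as an unsigned 32-bit integer.
function crc32 (text) {
  const bytes = new TextEncoder().encode(text)
  let crc = 0xffffffff
  for (const byte of bytes) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8)
  }
  return (crc ^ 0xffffffff) >>> 0
}

// Hypothetical helper: base-36 rendering keeps the key at 7 characters or fewer.
function shortKey (text) {
  return crc32(text).toString(36)
}

// Hypothetical helper: normalize the messy source value, then mint the URI.
function companyUri (name) {
  return 'https://example.org/company/' + shortKey(name.trim().toLowerCase())
}

console.log(crc32('123456789').toString(16)) // cbf43926 (the standard CRC-32 check value)
console.log(companyUri('  Müller & Söhne AG '))
```

A 32-bit checksum is short but not collision-free, so for larger lists it would be worth checking the generated keys for duplicates before treating them as persistent identifiers.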

ktk commented 4 years ago

This might not be necessary when we use modules like url-slug, which worked well in some pipelines already.
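For readers who have not used a slug library, this is roughly the kind of transformation meant here. The `slugify` function below is a hand-rolled approximation written for this example; it is not the url-slug module or its API, and the real module may differ in its handling of edge cases.

```javascript
// Hand-rolled slug sketch (NOT the url-slug API): strip accents, lowercase,
// and join the remaining alphanumeric runs with dashes.
function slugify (text) {
  return text
    .normalize('NFKD') // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse everything else into single dashes
    .replace(/^-+|-+$/g, '') // trim leading and trailing dashes
}

console.log(slugify('Müller & Söhne AG')) // muller-sohne-ag
```

Unlike a hash, a slug stays readable, but two differently written source values can still end up with the same slug.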