kg-construct / rml-lv

Specification repository for logical views in RML.
https://kg-construct.github.io/rml-lv/dev.html

Adding functional dependencies to structural annotations #29

Open bcogrel opened 3 months ago

bcogrel commented 3 months ago

Functional dependencies are a generalization of unique constraints.

They are very much needed for virtualization when dealing with denormalized data, which is almost the de facto norm when data comes from files (e.g. JSON files, CSV extracts, Excel).

To give an example, consider a table describing persons and their addresses. It has a unique constraint over person_id. With functional dependencies, we can further declare that city_id determines city_name and region_id, that region_id determines region_name and country_id, and so on. These dependencies are not unique constraints, because the same city_id or region_id value is repeated in many rows.
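To make the distinction concrete, here is a minimal sketch that checks both kinds of constraint over a few sample rows. The rows and the helper names `holds` and `is_unique` are made up for illustration; they are not part of RML or Ontop.

```python
from collections import Counter

# Hypothetical denormalized rows: one row per person, address data repeated.
ROWS = [
    {"person_id": 1, "city_id": 10, "city_name": "Bolzano", "region_id": 100},
    {"person_id": 2, "city_id": 10, "city_name": "Bolzano", "region_id": 100},
    {"person_id": 3, "city_id": 11, "city_name": "Merano", "region_id": 100},
]


def holds(rows, determinants, dependents):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinants)
        value = tuple(row[c] for c in dependents)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_unique(rows, columns):
    """Return True if no combination of values in `columns` occurs twice."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return all(n == 1 for n in counts.values())


person_id_is_unique = is_unique(ROWS, ["person_id"])
city_id_is_unique = is_unique(ROWS, ["city_id"])
city_fd_holds = holds(ROWS, ["city_id"], ["city_name", "region_id"])
```

On this data, person_id is unique, city_id is not (it appears in two rows), yet the dependency from city_id to city_name and region_id still holds.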

Functional dependencies are also typically transitive (e.g. city_id ultimately determines country_id). It is important to let the processor compute the transitive closure, as specifying it manually is cumbersome and error-prone.
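As a rough sketch of what computing the closure involves, the standard attribute-closure fixpoint is enough. The `closure` function and the `FDS` list below are illustrative assumptions, not how any particular processor implements it.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under `dependencies`.

    `dependencies` is an iterable of (determinants, dependents) pairs.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for determinants, dependents in dependencies:
            # Fire a dependency once all of its determinants are known.
            if set(determinants) <= result and not set(dependents) <= result:
                result |= set(dependents)
                changed = True
    return result


# Only the direct dependencies from the example are declared (hypothetical list).
FDS = [
    (("city_id",), ("city_name", "region_id")),
    (("region_id",), ("region_name", "country_id")),
]

city_closure = closure({"city_id"}, FDS)
```

Here `city_closure` contains country_id and region_name even though only the two direct dependencies were declared, which is exactly what one would otherwise have to write out by hand.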

This feature is supported in Ontop lenses: https://ontop-vkg.org/guide/advanced/lenses.html#otherfunctionaldependency .

At Ontopic, we use them a lot in projects. In particular, they are key for dealing with JSON-like data structures in large datasets available in platforms like BigQuery or SparkSQL. Without them, virtualization wouldn't have been feasible over these large datasets.

Functional dependencies enable some self-inner-join and self-left-join optimizations.
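To illustrate why such a rewrite is sound, here is a small sketch using Python's built-in sqlite3 module. The table name, the data, and both queries are made up for the example, and it only compares results under set semantics (DISTINCT); it does not show how Ontop or any other engine actually performs the optimization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_address ("
    "person_id INTEGER PRIMARY KEY, city_id INTEGER, "
    "city_name TEXT, region_id INTEGER)"
)
conn.executemany(
    "INSERT INTO person_address VALUES (?, ?, ?, ?)",
    [(1, 10, "Bolzano", 100), (2, 10, "Bolzano", 100), (3, 11, "Merano", 100)],
)

# The kind of self-inner-join that can appear when city_name and region_id
# come from two mappings over the same source, joined on city_id.
SELF_JOIN = """
SELECT DISTINCT a.city_id, a.city_name, b.region_id
FROM person_address a
JOIN person_address b ON a.city_id = b.city_id
"""

# Because city_id determines region_id, b.region_id always equals
# a.region_id, so the join can be dropped.
SIMPLIFIED = """
SELECT DISTINCT city_id, city_name, region_id
FROM person_address
"""

self_join_rows = set(conn.execute(SELF_JOIN).fetchall())
simplified_rows = set(conn.execute(SIMPLIFIED).fetchall())
```

Both queries return the same rows on this data. Without the declared dependency, a processor could only eliminate the join if city_id were unique, which it is not here.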