zazuko / cube-creator

A tool to create RDF cubes from CSV files
GNU Affero General Public License v3.0

Determine Unique Identifiers More Easily #1387

Open tboeni opened 1 year ago

tboeni commented 1 year ago

This feature request is really more a question of whether something is possible: recently I have been dealing with files that have a rather large number of different columns and rows. In my opinion, this makes choosing the identifier considerably more difficult. Cube Creator does give a warning if the observation identifiers are not unique, and this made me wonder.

Is there a way to give a reference to the row/line numbers where the identifiers are not unique? This would make fixing the issue a lot easier. However, I am also not aware of how Cube Creator determines that the identifiers are not unique, so I might be completely on the wrong track here.
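As a rough illustration of what the requested feature could report, the following Python sketch scans CSV text and lists the line numbers of rows that share the same identifier. The function `find_duplicate_identifiers` and the sample columns `year`, `canton` and `value` are hypothetical, and the sketch assumes the identifier is simply the combination of the values in the chosen columns, which may not match exactly how Cube Creator composes its identifier templates.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_identifiers(csv_text, id_columns):
    """Return {identifier tuple: [line numbers]} for identifiers used by more than one row.

    Hypothetical helper: the identifier is assumed to be the combination of
    the values in `id_columns`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines_by_id = defaultdict(list)
    for row in reader:
        key = tuple(row[col] for col in id_columns)
        # line_num is the physical line on which the current record ends;
        # line 1 is the header, so the first data row is line 2
        lines_by_id[key].append(reader.line_num)
    # Keep only identifiers that occur on more than one row
    return {key: lines for key, lines in lines_by_id.items() if len(lines) > 1}


sample = """year,canton,value
2020,ZH,10
2020,BE,12
2020,ZH,11
"""
print(find_duplicate_identifiers(sample, ["year", "canton"]))
# {('2020', 'ZH'): [2, 4]}
```

Reporting line numbers rather than just a yes/no answer is what would let a user jump straight to the conflicting rows in a large file.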

tpluscode commented 1 year ago

That is an interesting idea.

[...] how Cube Creator is able to determine the non-uniqueness of the identifiers

By definition, every observation must have exactly one value for each of its dimensions. When an identifier is not unique, several rows produce the same observation, so they end up merged together and have multiple values for some properties. Cube Creator runs a query over the transformed data to find such observations.
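To make the mechanism concrete, here is a small Python analogue of such a check. It is not Cube Creator's actual query, which is not shown in this thread; it is a sketch that assumes the observation identifier is formed by joining the values of the identifier columns (a hypothetical simplification) and mimics a SPARQL query that groups by observation and dimension and keeps the groups with `COUNT(DISTINCT ?value) > 1`. The function `find_merged_observations` and the sample rows are hypothetical.

```python
from collections import defaultdict


def find_merged_observations(rows, id_columns, dimensions):
    """List (identifier, dimension, values) where merged rows disagree.

    Rows with the same identifier are treated as one observation, mimicking
    how identical observation IRIs merge in RDF. Hypothetical helper, not
    Cube Creator's actual query.
    """
    values = defaultdict(set)  # (identifier, dimension) -> distinct values
    for row in rows:
        identifier = "/".join(row[col] for col in id_columns)
        for dim in dimensions:
            values[(identifier, dim)].add(row[dim])
    # A valid observation has exactly one value per dimension
    return sorted(
        (identifier, dim, sorted(vals))
        for (identifier, dim), vals in values.items()
        if len(vals) > 1
    )


rows = [
    {"year": "2020", "canton": "ZH", "value": "10"},
    {"year": "2020", "canton": "BE", "value": "12"},
    {"year": "2020", "canton": "ZH", "value": "11"},
]
# An identifier built from "year" alone is too coarse: all three rows collapse
for problem in find_merged_observations(rows, ["year"], ["canton", "value"]):
    print(problem)
# ('2020', 'canton', ['BE', 'ZH'])
# ('2020', 'value', ['10', '11', '12'])
```

Because an RDF graph is a set of triples, two rows that are identical in every column collapse into the same triples and would not be flagged by a distinct-value check like this one; only rows that disagree on some value show up.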

Unfortunately, at the moment there is no SPARQL UI for Cube Creator's database; otherwise, it would have been easiest to link to that exact query.