Using generalize-tsvs does not handle equal-named, different-meaning columns correctly

matentzn commented 1 year ago

Environment:

linkml                        1.3.14
linkml-dataops                0.1.0
linkml-runtime                1.3.7
schema-automator              0.2.10
sssom-schema                  0.9.4

When using generalize-tsvs on multiple tables that share a column name with entirely different meanings, schema-automator seems to run with just one of them. In my case, this is not actually a case of "different meaning", but of a different value set:

1. There is a table DOMAINS.csv which lists 10 domains in the domain_id column.
2. There is a table CONCEPTS.csv which lists 5 domains in the domain_id column.

If schema-automator happens upon table 2 (CONCEPTS.csv) first, the enum in the model has only 5 domains.

Probably the enum should be augmented during the generalisation process? And what if there is an id or type column that means something entirely different from table to table?
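
To illustrate the "augment" idea, here is a minimal sketch in plain Python. It does not use or reflect schema-automator's actual code: the helper `collect_column_values` and the miniature tables are hypothetical stand-ins for DOMAINS.csv and CONCEPTS.csv. It only shows that taking the union of values per column name makes the result independent of the order in which tables are read.

```python
import csv
import io
from collections import defaultdict


def collect_column_values(tables):
    """Union the distinct non-empty values seen for each column name across all tables.

    `tables` maps a (hypothetical) table name to its CSV text.
    """
    values = defaultdict(set)
    for text in tables.values():
        for row in csv.DictReader(io.StringIO(text)):
            for column, value in row.items():
                if value:
                    values[column].add(value)
    return values


# Hypothetical miniature stand-ins for DOMAINS.csv and CONCEPTS.csv
tables = {
    "DOMAINS": "domain_id\nDrug\nCondition\nProcedure\n",
    "CONCEPTS": "concept_id,domain_id\n1,Drug\n2,Condition\n",
}

merged = collect_column_values(tables)
print(sorted(merged["domain_id"]))  # ['Condition', 'Drug', 'Procedure']
```

Note that a union only helps when the column really means the same thing in every table; it does not address the id/type case where the meanings differ.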

matentzn commented 1 year ago

I see:

csv_data_generalizer.py:

# TODO: deal with cases where the same slot is used in different classes

cmungall commented 1 year ago

Sorry, missed this before

Yes, there should be two distinct enums, and different slot usages generated for each table
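
A rough sketch of what that output could look like, again in plain Python rather than schema-automator's real implementation. The function `per_table_enums`, the `<table>_<column>_enum` naming scheme, and the sample tables are all assumptions made for illustration; the resulting dict mirrors the LinkML layout of `classes` with `slot_usage` and `enums` with `permissible_values`.

```python
import csv
import io


def per_table_enums(tables, column):
    """Build one enum per table for `column`, with a slot_usage entry pointing at it."""
    schema = {"slots": {column: {}}, "classes": {}, "enums": {}}
    for table_name, text in tables.items():
        rows = list(csv.DictReader(io.StringIO(text)))
        if not rows or column not in rows[0]:
            continue  # this table does not use the column
        values = sorted({row[column] for row in rows if row[column]})
        enum_name = f"{table_name}_{column}_enum"  # hypothetical naming scheme
        schema["enums"][enum_name] = {
            "permissible_values": {value: {} for value in values}
        }
        schema["classes"][table_name] = {
            "slots": [column],  # only the shared column is shown in this sketch
            "slot_usage": {column: {"range": enum_name}},
        }
    return schema


# Hypothetical miniature stand-ins for DOMAINS.csv and CONCEPTS.csv
tables = {
    "DOMAINS": "domain_id\nDrug\nCondition\nProcedure\n",
    "CONCEPTS": "concept_id,domain_id\n1,Drug\n2,Condition\n",
}

schema = per_table_enums(tables, "domain_id")
print(schema["classes"]["CONCEPTS"]["slot_usage"])
```

With this shape, the shared slot stays defined once, while each class narrows its range to the values actually observed in its own table.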

linkml / schema-automator

Using generalize-tsvs does not handle equal-named, different-meaning columns correctly #108