linkml / schema-automator

Automated assistance for the schema development lifecycle
https://linkml.io/schema-automator/
BSD 3-Clause "New" or "Revised" License

Using generalize-tsvs does not handle equal-named, different-meaning columns correctly #108

Open matentzn opened 1 year ago

matentzn commented 1 year ago

Environment:

linkml                        1.3.14
linkml-dataops                0.1.0
linkml-runtime                1.3.7
schema-automator              0.2.10
sssom-schema                  0.9.4

When using generalize-tsvs on multiple tables that share a column name with entirely different meanings, schema-automator seems to go with just one of them. In my case, this is not actually a case of "different meaning", but of a different ValueSet:

  1. There is a table DOMAINS.csv which lists 10 domains in the domain_id column.
  2. There is a table CONCEPTS.csv which lists 5 domains in the domain_id column.

If schema-automator happens to process table 2 first, the enum in the generated model has only 5 domains.

Probably the enum should be augmented during the generalisation process? And what if there is an id or type column that means something entirely different from table to table?
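To illustrate the "augment the enum" idea, here is a minimal sketch in plain Python. It is not schema-automator's actual code: the `TABLES` contents and the `merged_column_values` helper are hypothetical stand-ins for DOMAINS.csv and CONCEPTS.csv, with invented values. It unions the distinct values seen for a column name across all tables, so the result no longer depends on which table is read first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for DOMAINS.csv and CONCEPTS.csv (values invented for illustration).
TABLES = {
    "DOMAINS": "domain_id\nDrug\nCondition\nProcedure\n",
    "CONCEPTS": "concept_id,domain_id\n1,Drug\n2,Condition\n",
}


def merged_column_values(tables):
    """Union the distinct values seen for each column name across all tables."""
    values = defaultdict(set)
    for text in tables.values():
        for row in csv.DictReader(io.StringIO(text)):
            for column, value in row.items():
                values[column].add(value)
    return values


merged = merged_column_values(TABLES)
# The merged value set covers every table, regardless of processing order.
print(sorted(merged["domain_id"]))
```

This only helps when the shared column really has the same meaning everywhere. It would wrongly merge values for an id or type column that means different things in different tables.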

matentzn commented 1 year ago

I see:

csv_data_generalizer.py:

# TODO: deal with cases where the same slot is used in different classes

cmungall commented 1 year ago

Sorry, missed this before

Yes, there should be two distinct enums, and different slot usages generated for each table.
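A rough sketch of what that output could look like, using plain Python dicts that mirror LinkML's `classes`, `slot_usage`, `enums` and `permissible_values` keys. This is an assumption about the shape of a fix, not schema-automator's implementation: the `per_table_schema` helper, the `<table>_<column>_enum` naming scheme, and the sample table contents are all hypothetical.

```python
import csv
import io

# Hypothetical stand-ins for DOMAINS.csv and CONCEPTS.csv (values invented for illustration).
TABLES = {
    "DOMAINS": "domain_id\nDrug\nCondition\nProcedure\n",
    "CONCEPTS": "concept_id,domain_id\n1,Drug\n2,Condition\n",
}


def per_table_schema(tables, enum_columns):
    """Build one enum per (table, column) and point each class's slot_usage at it."""
    schema = {"classes": {}, "enums": {}, "slots": {}}
    for table, text in tables.items():
        reader = csv.DictReader(io.StringIO(text))
        rows = list(reader)
        columns = list(reader.fieldnames or [])
        cls = {"slots": columns, "slot_usage": {}}
        for column in columns:
            # The slot itself is shared; only its range is refined per class.
            schema["slots"].setdefault(column, {})
            if column in enum_columns:
                enum_name = f"{table}_{column}_enum"  # hypothetical naming scheme
                values = sorted({row[column] for row in rows})
                schema["enums"][enum_name] = {
                    "permissible_values": {value: {} for value in values}
                }
                cls["slot_usage"][column] = {"range": enum_name}
        schema["classes"][table] = cls
    return schema


schema = per_table_schema(TABLES, enum_columns={"domain_id"})
for name, enum in schema["enums"].items():
    print(name, list(enum["permissible_values"]))
```

Here `domain_id` stays a single shared slot, while each class narrows its range to its own enum, so the 5-value and 10-value cases from the report would not overwrite each other.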