Open gouttegd opened 3 hours ago
I believe the problem lies here:
inverted_df = df_to_invert.rename(
    columns=_invert_column_names(list_of_subject_object_columns, columns_invert_map)
)
inverted_df = inverted_df[df.columns]
As I understand it, the last line in that fragment is intended to reorder the columns in inverted_df so that they are in the same order as in the original df_to_invert despite their renaming. That is, the renaming turned, for example, subject_id into object_id and the other way around, but the columns are still in their original positions, so the reordering is necessary to ensure the renamed columns are at their expected positions (e.g., the new subject_id should be the first column).
But that reordering necessarily supposes that the inverted data frame will always contain the same columns as the original data frame. This is an unwarranted assumption. It won’t be the case if the set contains a subject_* column that does not have an object_* counterpart (which is perfectly valid in SSSOM, except for subject_id and object_id, which must both always be present).
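To illustrate the failure mode at the pandas level, here is a minimal sketch. It does not call sssom-py: the data frame contents and the swap_name helper are hypothetical stand-ins for the real mapping set and for _invert_column_names, and only the subject/object renaming is modelled.

```python
import pandas as pd

# Hypothetical minimal mapping set: subject_label has no object_label counterpart.
df = pd.DataFrame(
    {
        "subject_id": ["A:1"],
        "subject_label": ["alpha"],
        "predicate_id": ["skos:exactMatch"],
        "object_id": ["B:1"],
    }
)


def swap_name(name: str) -> str:
    """Simplified stand-in for the subject/object swap done during inversion."""
    if name.startswith("subject_"):
        return "object_" + name[len("subject_"):]
    if name.startswith("object_"):
        return "subject_" + name[len("object_"):]
    return name


inverted_df = df.rename(columns={c: swap_name(c) for c in df.columns})

# The inverted frame now has object_label but no subject_label.
print(list(inverted_df.columns))

try:
    # Same pattern as the problematic line: select the original column names.
    inverted_df = inverted_df[df.columns]
    error_message = None
except KeyError as exc:
    error_message = str(exc)
    print("KeyError:", error_message)
```

Selecting with the original column names fails as soon as one of them (here subject_label) no longer exists after the renaming.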
Suggested fix:
- inverted_df = inverted_df[df.columns]
+ inverted_df = sort_df_rows_columns(inverted_df, by_rows=False)
so that reordering is performed by the appropriate function.
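For comparison, here is a rough sketch of a reordering step that does not assume the inverted frame has the same columns as the original. The reorder_columns helper is hypothetical and written only for illustration; it is not sort_df_rows_columns and makes no claim about how that function orders columns.

```python
import pandas as pd


def reorder_columns(df: pd.DataFrame, preferred_order: list) -> pd.DataFrame:
    """Hypothetical helper: preferred columns first (if present), then the rest."""
    present = [c for c in preferred_order if c in df.columns]
    remaining = [c for c in df.columns if c not in present]
    return df[present + remaining]


# Column layout of an inverted set whose original had subject_label only
# (illustrative values, not taken from a real mapping set).
inverted_df = pd.DataFrame(
    {
        "object_id": ["A:1"],
        "object_label": ["alpha"],
        "predicate_id": ["skos:exactMatch"],
        "subject_id": ["B:1"],
    }
)
original_columns = ["subject_id", "subject_label", "predicate_id", "object_id"]

reordered = reorder_columns(inverted_df, original_columns)
print(list(reordered.columns))
```

Columns that are missing from the inverted frame are skipped instead of raising a KeyError, and columns that only exist after the inversion are kept at the end.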
Trying to run sssom invert on a mapping set that contains a subject_label column but no object_label yields an error because sssom-py somehow expects to find a subject_label column in the inverted set, even though the inverted set will (logically) only contain an object_label column.
Example: given the following minimalist set:
the following command:
yields the following error:
More generally, it seems that the error is triggered by any subject_* column that does not have its object_* counterpart in the set. For example, replacing subject_label by subject_source in the example above will yield exactly the same error trace, with a KeyError: "['subject_source'] not in index" message.
This issue affects the conversion to OWL as well (sssom convert -O owl), because that conversion involves at some point the inversion of the mapping set to convert.
Issue originally found in https://github.com/monarch-initiative/omim/issues/114. Reproduced with the latest code from the master branch of sssom-py.