mapping-commons / sssom-py

Python toolkit for SSSOM mapping format
https://mapping-commons.github.io/sssom-py/index.html#
MIT License
48 stars 10 forks source link

Suggestion: `MappingSetDataFrame`: `.to_csv()` and `read_csv()` #517

Open joeflack4 opened 3 months ago

joeflack4 commented 3 months ago

Overview

I was initially suggesting that we could have an alternative constructor for MappingSetDataFrame that could accept paths rather than objects, however, based on some more recent discussions (1, 2), it might be better to do design very similar to pandas:

Reading

msdf: MappingSetDataFrame = read_csv()

Or read_tsv(), or just read().

It could accept various different combinations of params: a. Path to a SSSOM TSV which includes a metadata comment. b. Path to a SSSOM TSV with no metadata comment, and path to a metadata yaml. c. Path to a SSSOM TSV with no metadata comment, and path to a metadata object. d. a dataframe, and path to a metadata yaml. e. a dataframe, and path to a metadata object.

Writing

msdf.to_csv()

Or to_tsv().

This would be an alternative or replacement to parse_sssom_table().

matentzn commented 3 months ago

I see the point, but its a bit of a lower priority. I would probably recommend a utility method

def load_mapping_set_from_df(str: mapping_set_path, str: metadata_path):
        ...

Rather than adding further IO logic into the MappingSetDataFrame class.

joeflack4 commented 3 months ago

That would be fine, but I'm not sure if I like it either. Another thing that a user needs to know to import.

The logic for MappingSetDataFrame if it were a normal class would be a few lines, but I just realized that I don't know how to handle this for data class setup(s). And I guess with LinkML, it has to be a data class.

matentzn commented 3 months ago

I would say we go with the pandas design, as it is familiar to other devs, i.e. pd.load_csv(..)

joeflack4 commented 3 months ago

Yes, I like that solution!

joeflack4 commented 1 month ago

I just changed the title and OP of this issue from being about allowing MappingSetDataFrame to accept paths, to instead being about pandas-like functions/methods.