Open joeflack4 opened 2 months ago
I would also like to have this option. Otherwise nearly every SSSOM TSV I create is more prefix map than anything else, by lines.
@caufieldjh you must have small mapping sets then - there are only about 6 built in prefixes.. I do agree though with this issue, and happy to incorporate a parameter like this.
I was observing the same effect @joeflack4 did - when instantiating a MappingSetDataFrame
without a Converter
, the included prefix map included all ~1.5K prefixes and the object takes some time to create.
If I do it more like this, instantiation is instantaneous:
metadata = get_default_metadata()
metadata['curie_map'] = {}
converter = Converter.from_prefix_map(metadata['curie_map'])
msdf = MappingSetDataFrame(converter=converter, df=df, metadata=metadata)
@caufieldjh this is a slightly different ask: you are saying that when you instantiate with just a converter, you would like by default to run "clean prefix map".
I think I can see that could be a good idea - but its not the same as the parameter being asked for here.
I still see this as related - the call to clean_prefix_map
doesn't have to be a default, as there are certainly cases in which it's useful to start with a prefix map including builtins, add/merge new mappings into the msdf, then do the final cleanup step to remove unused prefixes (working under the assumption that some of those prefixes are now present in the mappings, too).
I guess it is slightly different but related. I think that actually what @caufieldjh is referring to is actually just a bug, because there are 1,547 prefixes included in the output when that happens.
In any case, just linking this related stuff here:
Overview
Given that built-in prefix maps are not required unless they appear somewhere in the mapping set, it would be nice to have a way to give users the ability to create an
msdf
and an SSSOM TSV that leaves these out.