Open joeflack4 opened 3 months ago
`sssom` and `semapv`, which are nearly always needed.
MappingSetPrefixMap called `clean_prefix_map()`, which removes all prefixes from the curie map that are not used. Try to see if it gets rid of some of the built-in ones?
For (2), right--I forgot to do `msdf.clean_prefix_map()`.
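For readers unfamiliar with what pruning a prefix map means here, the following is a minimal, library-independent sketch. It is not sssom-py's implementation of `clean_prefix_map()`; the function name `prune_prefix_map`, the sample prefix map, and the sample CURIEs are all made up for illustration.

```python
def prune_prefix_map(prefix_map, curies):
    """Return only the entries of prefix_map whose prefix appears in curies.

    Hypothetical helper for illustration; not part of sssom-py or curies.
    """
    used = set()
    for curie in curies:
        # A CURIE has the form "prefix:local_id"; skip values without a colon.
        prefix, sep, _local = curie.partition(":")
        if sep:
            used.add(prefix)
    return {p: uri for p, uri in prefix_map.items() if p in used}


# Illustrative data only (not taken from the files attached to this issue).
prefix_map = {
    "ORDO": "http://www.orpha.net/ORDO/Orphanet_",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
curies = ["ORDO:100", "skos:exactMatch"]

pruned = prune_prefix_map(prefix_map, curies)
print(sorted(pruned))
```

In this sketch, `owl` and `rdf` are dropped because no CURIE in the sample uses them. Whether the real method also drops built-in prefixes is the question raised above.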
Do we really want the default behavior to be that, when no `converter` is passed, we include all of these 1,547 entries? Not to mention the related sub-issues of #513: (2) Incorrect `curie_map` (it leaves out prefixes that are in `metadata`), and (3) UX: Should automatically instantiate `Converter`.
My preference would be that `clean_prefix_map()` should be automatic, and if we want to have a parameter that adds tons of namespaces, we can add that.
Do we really want the default behavior to be, that when no converter is passed that we include all of these 1,547 entries?
i. I think you are right. Can you update the OC to request that, if the massive OBO context was used to infer prefixes, we should automatically call `clean_prefix_map()`?
Incorrect `curie_map` (it leaves out prefixes that are in `metadata`),
ii. If this is true, it is a bug; add it as an action item to the OC.
Should automatically instantiate `Converter`.
iii. This must already be the case.
(i) Done! (ii) Done!
(iii) It probably is, but let me rephrase: Should correctly instantiate `Converter`.
This is related to #513 so let me add what I mean there (this comment).
Overview
When I write SSSOM to TSV, I'd like it to only include entries in `curie_map` where the prefixes are actually used in 1 or more places in the mapping set. However, extra entries are appearing.
Case 1: When passing a `Converter` and `metadata`
I expected 4 entries, but got 10.
I also mentioned this in #513. I don't necessarily mind these extra entries in there, as they are some popular and relevant namespaces. The extra ones I got were: `owl`, `sssom`, `oboInOwl`, `rdf`, `rdfs`, and `semapv`. I'd suggest that we could possibly add some parameterization for this: stick with a default of either including these important namespaces or not, and add a parameter to allow for the opposite. Also, IDK if this is really a `sssom-py` issue or a `curies` issue.
Case 2: When passing `metadata`, but no `Converter`
I expected 4 entries, but got 1,547.
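One way to picture the parameterization suggested under Case 1 is sketched below. This is a hypothetical design, not an existing sssom-py or curies API: the function `build_curie_map`, its `keep_builtin` flag, and the `BUILTIN_PREFIXES` constant are assumptions made for illustration. The "used" and "unused" prefixes and all URIs are placeholders, chosen only so that the counts mirror the 4 versus 10 reported in Case 1.

```python
# The six extra prefixes reported under Case 1.
BUILTIN_PREFIXES = frozenset({"owl", "sssom", "oboInOwl", "rdf", "rdfs", "semapv"})


def build_curie_map(prefix_map, used_prefixes, keep_builtin=True):
    """Keep the used prefixes, plus the built-in ones when keep_builtin is True.

    Hypothetical helper for illustration; not part of sssom-py or curies.
    """
    keep = set(used_prefixes)
    if keep_builtin:
        keep |= BUILTIN_PREFIXES
    return {prefix: uri for prefix, uri in prefix_map.items() if prefix in keep}


# Placeholder data: four prefixes assumed to be used by the mappings,
# two that are not, and the six built-in ones. Only the keys matter here.
used = {"ORDO", "ICD11", "skos", "MONDO"}
unused = {"GO", "CHEBI"}
all_prefixes = sorted(used | unused | BUILTIN_PREFIXES)
big_map = {prefix: f"https://example.org/{prefix}/" for prefix in all_prefixes}

with_builtin = build_curie_map(big_map, used, keep_builtin=True)
without_builtin = build_curie_map(big_map, used, keep_builtin=False)
print(len(with_builtin), len(without_builtin))
```

Which default is right is the open question in this issue; the sketch only shows that either default leaves the other behavior one argument away, and that unused prefixes such as the placeholder `GO` are dropped in both modes.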
Possible solutions
Nico wrote:
Additional details
FYI:
`metadata`: icd11.sssom-metadata.yml.zip
Results, based on various means:
`Converter` and passing that to `MappingSetDataFrame`. ordo-icd11.sssom - with converter.tsv.zip
`MappingSetDataFrame` passing in my `metadata` but no `Converter`. ordo-icd11.sssom - no converter.tsv.zip