kevinrue / unisets

Collection of classes to store gene sets.
http://kevinrue.github.io/unisets
Other
3 stars 1 forks source link

Add import.gmt #3

Closed robertamezquita closed 5 years ago

robertamezquita commented 5 years ago

See: https://github.com/Kayla-Morrell/GeneSet/blob/tibble_implement/R/import-class.R#L32

robertamezquita commented 5 years ago

Suggestion by @kevinrue : grab a public small GMT file and drop it into inst/extdata, write unit tests and examples based on it. access it via gmtFile <- system.file(package = "unisets", "inst", "extdata", "example.gmt")

robertamezquita commented 5 years ago

Comment by @kevinrue on 06e1128f3030cc073f8d25158f074de5f27050c3 :

robertamezquita commented 5 years ago

re: bullet 1 @kevinrue this should probably be in the BaseSets constructor?

kevinrue commented 5 years ago

Actually, as something that crossed my mind over the weekend, I realised that for things like the Gene Ontology, there is the possibility of describing the same gene:set mapping with different "evidence codes" (http://geneontology.org/page/guide-go-evidence-codes). Currently, packages like org.Hs.eg.db store mappings in database format, i.e. as multiple relationships between the same gene and the same gene set, each with a distinct evidence code. It's obviously open for discussion, but following that logic in "long table format", I'd argue that the BaseSets class should not enforce a "unique" validity check on the gene:set relations, and that we leave it up to the user to clean the BaseSets object after import (easier than cleaning the GMT file before import). Following the same logic, I would say that the answer to #5 should be "yes, allow duplicate elements".

robertamezquita commented 5 years ago

In that case, I'll leave the unique aspect alone for now, maybe issue a message, and leave it to downstream methods to enforce uniqueness if need be, good points all around. Although with regards to your example, this requires that there be multiple "elements" columns - how would this be accommodated if we use the class hierarchy of BaseSets on down? No answer needed, just something to think about.

kevinrue commented 5 years ago

Along with #6 I think this just needs to be described as a new section in the vignette and we're good to merge as a PR.