Closed robertamezquita closed 5 years ago
Suggestion by @kevinrue : grab a public small GMT file and drop it into inst/extdata
, write unit tests and examples based on it. access it via gmtFile <- system.file(package = "unisets", "inst", "extdata", "example.gmt")
Comment by @kevinrue on 06e1128f3030cc073f8d25158f074de5f27050c3 :
re: bullet 1 @kevinrue this should probably be in the BaseSets
constructor?
Actually, as something that crossed my mind over the weekend, I realised that for things like the Gene Ontology, there is the possibility of describing the same gene:set mapping with different "evidence codes" (http://geneontology.org/page/guide-go-evidence-codes).
Currently, packages like org.Hs.eg.db store mappings in database format, i.e. as multiple relationships between the same gene and the same gene set, each with a distinct evidence code.
It's obviously open for discussion, but following that logic in "long table format", I'd argue that the BaseSets
class should not enforce a "unique" validity check on the gene:set relations, and that we leave it up to the user to clean the BaseSets
object after import (easier than cleaning the GMT file before import).
Following the same logic, I would say that the answer to #5 should be "yes, allow duplicate elements".
In that case, I'll leave the unique aspect alone for now, maybe issue a message, and leave it to downstream methods to enforce uniqueness if need be, good points all around. Although with regards to your example, this requires that there be multiple "elements" columns - how would this be accommodated if we use the class hierarchy of BaseSets on down? No answer needed, just something to think about.
Along with #6 I think this just needs to be described as a new section in the vignette and we're good to merge as a PR.
See: https://github.com/Kayla-Morrell/GeneSet/blob/tibble_implement/R/import-class.R#L32