llrs closed 4 years ago
Instead of OBO (which describes only the ontologies (or sets) themselves), BaseSet should read .gaf or .owl files into a TidySet, and read OBO into a data.frame if it reads it at all.
It can now parse obo and gaf files, but not completely.
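A minimal sketch of the GAF side, assuming the 17 tab-separated columns of the GAF 2.x specification and comment lines starting with "!". The names read_gaf() and gaf_columns are hypothetical; this is not the BaseSet implementation and uses only base R:

```r
# Column names follow the GAF 2.x specification (17 tab-separated fields).
gaf_columns <- c(
  "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
  "DB_Reference", "Evidence_Code", "With_From", "Aspect",
  "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type", "Taxon",
  "Date", "Assigned_By", "Annotation_Extension", "Gene_Product_Form_ID"
)

# read_gaf() is a hypothetical helper, not part of BaseSet.
read_gaf <- function(path) {
  lines <- readLines(path)
  # Header and comment lines start with "!"; drop them and any empty lines.
  records <- lines[!startsWith(lines, "!") & nzchar(lines)]
  utils::read.delim(
    text = records,
    header = FALSE,
    sep = "\t",
    quote = "",          # annotation text may contain quote characters
    comment.char = "",   # comments were already removed above
    col.names = gaf_columns,
    colClasses = "character",
    stringsAsFactors = FALSE
  )
}
```

The result is a plain data.frame with one row per annotation. Which columns become elements and which become sets (for example DB_Object_ID and GO_ID) is a separate decision.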
To parse XML it uses the XML package... I'm not sure about introducing a new dependency; I already get a lot of them just from using dplyr.
I can add three more dependencies... (tested with miniCRAN::pkgDep("XML", availPkgs = installed.packages(), includeBasePkgs = FALSE, suggests = FALSE))
tools::package.dependencies also accepts the output of installed.packages. See https://github.com/llrs/cases_GSEABase/issues/3
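The dependency check above could be sketched with base tooling only. This assumes tools::package_dependencies (the underscore variant available in current R) and the matrix returned by installed.packages(); extra_dependencies() is a hypothetical helper name, and the result depends on what is installed locally:

```r
# extra_dependencies() is a hypothetical helper: it lists the packages that
# `new_pkg` would pull in beyond base R and what `existing_pkg` already needs.
extra_dependencies <- function(new_pkg, existing_pkg,
                               db = installed.packages()) {
  deps_of <- function(pkg) {
    tools::package_dependencies(
      pkg,
      db = db,
      which = c("Depends", "Imports", "LinkingTo"),
      recursive = TRUE
    )[[pkg]]
  }
  # Base packages ship with R, so they are not counted as new dependencies.
  base_pkgs <- rownames(db)[db[, "Priority"] %in% "base"]
  setdiff(deps_of(new_pkg), c(deps_of(existing_pkg), existing_pkg, base_pkgs))
}

# Example call (output depends on the local library):
# extra_dependencies("XML", "dplyr")
```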
I don't understand the XML part, so I won't implement it (I am not sure it is commonly used).
GPAD format is related to GAF
http://geneontology.org/docs/gene-product-association-data-gpad-format/
And the XML from MSigDB http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/MSigDB_XML_description
It should also read the .owl format, as it seems to be used by EBI: https://www.ebi.ac.uk/efo/ (see the download section, although I haven't found how the files are structured)
Improved the parsing of obo files lately.
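For reference, a minimal sketch of reading the [Term] stanzas of an OBO file into a long data.frame, assuming the usual "key: value" tag lines. read_obo_terms() is a hypothetical name and this is not the parser used in BaseSet; trailing "! comment" parts of values are kept as they are:

```r
# read_obo_terms() is a hypothetical helper, not the BaseSet parser.
read_obo_terms <- function(path) {
  lines <- trimws(readLines(path))
  # Stanza headers look like "[Term]" or "[Typedef]"; number them in file order.
  is_header <- grepl("^\\[.+\\]$", lines)
  stanza <- cumsum(is_header)
  type <- sub("^\\[(.+)\\]$", "\\1", lines[is_header])
  # Tag lines look like "key: value"; file header tags (stanza 0) are skipped.
  is_tag <- grepl("^[^!:]+:", lines) & stanza > 0 & !is_header
  out <- data.frame(
    stanza = stanza[is_tag],
    type = type[stanza[is_tag]],
    key = sub(":.*$", "", lines[is_tag]),
    value = trimws(sub("^[^:]+:", "", lines[is_tag])),
    stringsAsFactors = FALSE
  )
  # Keep only term stanzas: one row per tag, grouped by stanza number.
  out[out$type == "Term", c("stanza", "key", "value")]
}
```

A long format keeps repeated tags such as is_a without deciding up front which ones become columns.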
Also on a related note xml2 might work well.
Won't introduce a new dependency (xml2) to parse XML files. OBO files are already imported, so let's see if there is a need for more file formats.
To parse obo files: GSEABase:::.fromOBO
To parse .xml files: GSEABase::getBroadSets