ropensci / BaseSet

Provides classes for working with sets
https://docs.ropensci.org/BaseSet

Parse OBO and XML files #23

Closed llrs closed 4 years ago

llrs commented 5 years ago

To parse OBO files: GSEABase:::.fromOBO. To parse .xml files: GSEABase::getBroadSets.

llrs commented 5 years ago

Instead of OBO files (which only describe the ontologies, or sets, themselves), BaseSet should read the .gaf or .owl formats into a TidySet, and read OBO into a data.frame if it reads it at all.
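
As a rough illustration of reading OBO into a data.frame, here is a minimal base R sketch. The function name parse_obo_terms and the long (stanza, tag, value) layout are assumptions made for this example, not BaseSet's actual implementation, and the parser ignores most of the OBO syntax (escapes, trailing modifiers, multi-line values).

```r
# Hypothetical helper: turn the [Term] stanzas of an OBO file into a long
# data.frame with one row per "tag: value" line.
parse_obo_terms <- function(lines) {
  lines <- trimws(lines)
  # Stanza headers look like "[Term]" or "[Typedef]".
  is_header <- grepl("^\\[.+\\]$", lines)
  stanza_id <- cumsum(is_header)
  # Keep tag lines that belong to a stanza; drop blanks and "!" comment lines.
  keep <- stanza_id > 0 & !is_header & nzchar(lines) & !startsWith(lines, "!")
  long <- data.frame(
    stanza = stanza_id[keep],
    type = lines[is_header][stanza_id[keep]],
    # Split at the first colon only, since values such as GO:0000001 contain colons.
    tag = sub(":.*$", "", lines[keep]),
    value = trimws(sub("^[^:]*:", "", lines[keep])),
    stringsAsFactors = FALSE
  )
  # For is_a lines, drop the trailing "! human readable name" comment.
  is_isa <- long$tag == "is_a"
  long$value[is_isa] <- sub("[[:space:]]+!.*$", "", long$value[is_isa])
  out <- long[long$type == "[Term]", c("stanza", "tag", "value")]
  rownames(out) <- NULL
  out
}

# Small illustrative input; for a real file use readLines("path/to/file.obo").
obo <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: GO:0000001",
  "name: mitochondrion inheritance",
  "is_a: GO:0048308 ! organelle inheritance",
  "",
  "[Typedef]",
  "id: part_of",
  "name: part of"
)
terms <- parse_obo_terms(obo)
```

The long layout keeps repeated tags (several is_a lines per term, for instance) without having to decide up front how to collapse them into columns.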

llrs commented 5 years ago

It can now parse OBO and GAF files, although not completely.
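
For GAF, a minimal sketch of the reading step could look like the following. It assumes the 17 tab-separated columns of GAF 2.x and comment lines starting with "!"; the function name read_gaf, the column labels, and the two sample records are invented for this example. Renaming the result to elements and sets columns is an assumption about what a TidySet constructor would expect.

```r
# Column labels for the 17 GAF 2.x fields (labels chosen for this sketch).
gaf_columns <- c(
  "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
  "DB_Reference", "Evidence_Code", "With_From", "Aspect", "DB_Object_Name",
  "DB_Object_Synonym", "DB_Object_Type", "Taxon", "Date", "Assigned_By",
  "Annotation_Extension", "Gene_Product_Form_ID"
)

# Hypothetical helper: read GAF lines into a character data.frame.
read_gaf <- function(lines) {
  # Header and comment lines start with "!".
  lines <- lines[nzchar(lines) & !startsWith(lines, "!")]
  read.delim(
    text = lines, header = FALSE, sep = "\t", quote = "",
    comment.char = "", col.names = gaf_columns,
    colClasses = "character", stringsAsFactors = FALSE
  )
}

# Two invented records, for illustration only.
gaf_lines <- c(
  "!gaf-version: 2.1",
  paste(c("UniProtKB", "P00001", "GENE1", "", "GO:0000001", "PMID:0000000",
          "IEA", "", "P", "", "", "protein", "taxon:9606", "20200101",
          "UniProt", "", ""), collapse = "\t"),
  paste(c("UniProtKB", "P00002", "GENE2", "", "GO:0000001", "PMID:0000000",
          "IEA", "", "P", "", "", "protein", "taxon:9606", "20200101",
          "UniProt", "", ""), collapse = "\t")
)
gaf <- read_gaf(gaf_lines)

# One row per (gene product, GO term) relation.
relations <- unique(data.frame(
  elements = gaf$DB_Object_Symbol,
  sets = gaf$GO_ID,
  stringsAsFactors = FALSE
))
```

Filtering the comment lines by hand, rather than relying on comment.char = "!", avoids truncating a record that happens to contain "!" inside a field.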

llrs commented 5 years ago

To parse XML, it uses the XML package... I'm not sure about introducing a new dependency; I already get a lot of them just from using dplyr.

llrs commented 5 years ago

I can add three more dependencies... (tested with miniCRAN::pkgDep("XML", availPkgs = installed.packages(), includeBasePkgs = FALSE, suggests = FALSE))

llrs commented 5 years ago

tools::package.dependencies also accepts the output of installed.packages. See https://github.com/llrs/cases_GSEABase/issues/3
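
A minimal sketch of that kind of check, using only the tools package. Note that it calls tools::package_dependencies (with an underscore), which is the function available in current R versions; the helper name hard_deps is an assumption for this example. It lists the recursive Depends, Imports and LinkingTo of a package against the locally installed packages, leaving out base packages.

```r
# Hypothetical helper: non-base packages that `pkg` pulls in, computed from
# the locally installed packages instead of a CRAN mirror.
hard_deps <- function(pkg, db = installed.packages()) {
  deps <- tools::package_dependencies(
    pkg,
    db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )[[pkg]]
  base_pkgs <- rownames(installed.packages(priority = "base"))
  # as.character() turns a NULL result (package not in db) into character(0).
  setdiff(as.character(deps), base_pkgs)
}

# "stats" only depends on other base packages, so nothing extra is listed.
deps_stats <- hard_deps("stats")
```

Calling hard_deps("XML") on a machine where XML is installed would give the list that the miniCRAN call above reports.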

llrs commented 5 years ago

I don't understand the XML part, so I won't implement it (I'm not sure it is commonly used).

llrs commented 5 years ago

GPAD format is related to GAF

http://geneontology.org/docs/gene-product-association-data-gpad-format/

llrs commented 5 years ago

And the XML format from MSigDB: http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/MSigDB_XML_description

llrs commented 5 years ago

Also read the .owl format, as it seems to be used by EBI: https://www.ebi.ac.uk/efo/ (see the download section, although I haven't found how the files are structured).

llrs commented 4 years ago

Improved the parsing of obo files lately.

On a related note, xml2 might work well.

llrs commented 4 years ago

I won't introduce a new dependency (xml2) to parse XML files. OBO files can already be imported, so let's see whether there is a need for more file formats.