cthoyt closed this issue 2 years ago.
Yes, these files have been removed. pypath also relied on them, and now I am planning to move to the Rda format. Of course, CSV would be somewhat more convenient.
Hi @cthoyt,
the file you are looking for is still available in the deprecated
branch: https://github.com/saezlab/dorothea/tree/deprecated/data/TFregulons/consensus/table
Please note that this file differs considerably from the DoRothEA regulons we provide in the R package. The "package regulons" are a subset of this file (plus some additional minor changes).
The most recent file comparable to the one you requested can be found here: https://github.com/saezlab/dorothea/blob/master/data/entire_database.rda
Do you think it would be possible to also provide a CSV version of entire_database.rda? I was looking into it and it seems to be a simple table.
You are right, in the end it's just a table, but as far as I know there cannot be .csv files in the data folder of Bioconductor packages. The only two places I can think of to deposit the CSV file are Zenodo or the inst/extdata folder.
Do you need to parse this file only once, or do you plan to refer to the CSV file regularly?
I was unaware of that restriction... If I were into conspiracy theories, I'd say this was to lock people into continued use of R.
I will regularly refer to this file at its source, especially because I want to benefit from any updates you make! If I were to just download a file and start working on it locally, I wouldn't be doing reproducible science.
Hosting on either GitHub or Zenodo is fine. If you go the GitHub route, you can also have the entire repository archived automatically on Zenodo.
I've just noticed this issue is still open. This Python module reads RDA files without any problem: https://github.com/ofajardo/pyreadr
We also use it in pypath: https://github.com/saezlab/pypath/blob/c665bd93b4cc4067e796b055a08dd0e673eaa0ea/src/pypath/inputs/dorothea.py#L309
That's great. I had specific problems using pyreadr before, but what if someone working in a different language wants to use this? I still think distributing only R data creates an unnecessary lock-in to R (or to languages that can wrap it), whereas a TSV is usable from every language and workflow.
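To illustrate the portability point, a TSV can be parsed with nothing beyond Python's standard library. This is only a sketch: the helper `parse_regulons` and the column names `tf`, `target` and `mor` are hypothetical placeholders, not the actual DoRothEA schema.

```python
from __future__ import annotations

import csv
import io


def parse_regulons(tsv_text: str) -> list[dict[str, str]]:
    """Parse tab-separated rows into dicts keyed by the header line."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


# Hypothetical two-row table; real column names may differ
example = "tf\ttarget\tmor\nTF_A\tGENE_X\t1\nTF_B\tGENE_Y\t-1\n"
rows = parse_regulons(example)
```

No third-party package is involved, which is the property that makes a plain TSV easy to consume from other languages as well.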
You are right about other languages, @cthoyt. So can the CSV go into inst/extdata, as you suggested, @christianholland?
An older version of this repository (it appears the git history has been purged) hosted tabular versions of the DoRothEA database from 20180915. More specifically, I was relying on data persisting at the following URL:
https://github.com/saezlab/DoRothEA/blob/master/data/TFregulons/consensus/table/database_normal_20180915.csv.zip?raw=true
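Consuming that file amounted to downloading a zip archive and reading the CSV inside it. A minimal standard-library sketch is below; it assumes the archive holds a single UTF-8 encoded `.csv` member, the helper names are hypothetical, and since the file has been removed the URL above would no longer return it.

```python
from __future__ import annotations

import csv
import io
import urllib.request
import zipfile


def read_zipped_csv(archive_bytes: bytes) -> list[list[str]]:
    """Return the rows of the first .csv member inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Assumption: the archive contains exactly one CSV table
        members = [n for n in archive.namelist() if n.endswith(".csv")]
        if not members:
            raise ValueError("no .csv member found in archive")
        text = archive.read(members[0]).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


def fetch_zipped_csv(url: str) -> list[list[str]]:
    """Download a zipped CSV from a URL and parse it."""
    with urllib.request.urlopen(url) as response:
        return read_zipped_csv(response.read())
```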
My use case was to convert this data to BEL for reuse in larger biological networks (code at https://github.com/bio2bel/bio2bel/blob/master/src/bio2bel/sources/tfregulons.py) as part of the Bio2BEL project, in which @deeenes and @Nic-Nic have participated.
Would you be willing to resume distributing the database as a CSV to enable users who aren't using R to access the data? Or maybe there's a link somewhere to a Zenodo archive that I missed, since distributing data through GitHub isn't optimal?