How to get data - Githubissues

omnideconv / deconvBench

Comparison of 2nd generation deconvolution methods implemented in omnideconv

How to get data #36

Open grst opened 6 months ago

grst commented 6 months ago

In the "Getting started" section you describe two parameters pointing to data directories:

    data_dir_bulk = "/user/benchmarking/datasets/bulks"
    data_dir_sc = "/user/benchmarking/datasets/single_cell"

Could you include a description of where to get these data?
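For anyone setting this up in the meantime, here is a minimal sketch of a check that both directories exist before starting a run. It assumes the two parameters are plain filesystem paths; the helper name `missing_data_dirs` is hypothetical and not part of deconvBench.

```python
from pathlib import Path

# Paths copied from the "Getting started" section; adjust to your own setup.
data_dir_bulk = "/user/benchmarking/datasets/bulks"
data_dir_sc = "/user/benchmarking/datasets/single_cell"


def missing_data_dirs(*dirs):
    """Return those of the given paths that are not existing directories."""
    return [d for d in dirs if not Path(d).is_dir()]


if __name__ == "__main__":
    # Report every configured data directory that cannot be found.
    for d in missing_data_dirs(data_dir_bulk, data_dir_sc):
        print(f"Missing data directory: {d}")
```

This only confirms that the directories are present; it says nothing about whether the expected files are inside them.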

grst commented 6 months ago

Why not just include the count matrices on figshare as well? Retrieving and preprocessing them from the original data sources is a huge hurdle for reproducing or extending your analysis.

alex-d13 commented 6 months ago

I will check this with the others, but I am not sure whether we are allowed to re-upload all of the datasets ourselves. I think we agreed to upload only our annotations.

grst commented 6 months ago

Unless you are working with protected-access datasets such as those in dbGaP, I don't see any issues.

alex-d13 commented 6 months ago

Would we need some kind of license for these, for example the lung cancer dataset?

grst commented 6 months ago

To be honest, I don't know what license applies to a dataset published on e.g. GEO, ArrayExpress, etc. But you can obviously use it to create derivative works in publications, and I don't see why a preprocessed dataset wouldn't count as such a derivative work. At least I have done it in the past and nobody complained.

Of course, for protected-access datasets (such as those in dbGaP) this is not possible.