ocbe-uio / DIscBIO

A user-friendly R pipeline for biomarker discovery in single-cell transcriptomics

Demo data is too large for examples #2

Closed wleoncio closed 4 years ago

wleoncio commented 4 years ago

With 59 838 observations and 94 variables, the valuesG1ms dataset that ships with the package is too large for some function examples.

@SystemsBiologist, is it possible to add a second data example, with a subset of valuesG1ms? Which is the best way to subset the data and still have the dataset make sense? One idea is to just keep 33 columns (G1–G1.10, S1–S1.10, G2–G2.10) and, say, the first 1 000 rows of the dataset. Is this reasonable?
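
A minimal sketch of the subsetting idea above, run on a toy stand-in rather than the real valuesG1ms. The object names `toy` and `toy_small`, the simulated counts, and the assumption that every phase's columns follow the `G1`, `G1.1`, ... naming pattern are all illustrative, not taken from the package.

```r
# Toy stand-in for valuesG1ms: simulated counts, NOT the real data
set.seed(1)
phases <- c("G1", "S1", "G2")

# Assumed column naming: G1, G1.1, ..., G1.30, and likewise for S1 and G2
col_names <- unlist(lapply(phases, function(p) c(p, paste0(p, ".", 1:30))))
toy <- as.data.frame(
  matrix(
    rpois(2000 * length(col_names), lambda = 5),
    nrow = 2000,
    dimnames = list(paste0("gene", 1:2000), col_names)
  )
)

# Keep 11 columns per phase (G1-G1.10, S1-S1.10, G2-G2.10): 33 in total
keep_cols <- unlist(lapply(phases, function(p) c(p, paste0(p, ".", 1:10))))

# Keep the first 1 000 rows and the selected columns
toy_small <- toy[seq_len(1000), keep_cols]
dim(toy_small)  # 1000 rows, 33 columns
```

Selecting columns by name rather than by position keeps the subset correct even if the column order in the full dataset changes.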

SystemsBiologist commented 4 years ago

I will add it soon

SystemsBiologist commented 4 years ago

The demo dataset was added. It contains 30 cells randomly selected and 1000 genes in addition to 92 ERCCs. I am not sure what results we could get out of this dataset though!
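
For reference, a reduced dataset of this shape could be built roughly as follows. This is only a sketch on simulated data: the `ERCC-` row-name prefix, the random choice of genes, and the names `toy` and `demo` are assumptions, and it is not necessarily how the demo dataset in the package was produced.

```r
# Simulated stand-in: 3 000 genes plus 92 ERCC spike-ins, 94 cells (NOT real data)
set.seed(42)
n_genes <- 3000
n_ercc <- 92
n_cells <- 94
row_names <- c(
  paste0("gene", seq_len(n_genes)),
  sprintf("ERCC-%05d", seq_len(n_ercc))  # assumed spike-in naming
)
toy <- matrix(
  rpois(length(row_names) * n_cells, lambda = 5),
  nrow = length(row_names),
  dimnames = list(row_names, paste0("cell", seq_len(n_cells)))
)

# Separate spike-ins from genes by row name
is_ercc <- grepl("^ERCC-", rownames(toy))

# Randomly pick 30 cells and 1 000 genes; keep every ERCC row
cells <- sample(colnames(toy), 30)
genes <- sample(rownames(toy)[!is_ercc], 1000)
demo <- toy[c(genes, rownames(toy)[is_ercc]), cells]
dim(demo)  # 1092 rows (1000 genes + 92 ERCCs), 30 columns
```

Fixing the seed makes the random selection reproducible, so the same reduced dataset can be regenerated later.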

wleoncio commented 4 years ago

Fantastic news! The reduced dataset solves the data folder size issue, but if you are concerned about the results, perhaps we can keep both datasets in the package. That will yield a note for the repository editors, but they may allow both datasets to be included if we justify it well enough. Then, if they ask us to remove the full dataset, we put it back in .Rbuildignore. Your call.

In any case, I'll see how the reduced dataset behaves in the unit tests and the examples.

wleoncio commented 4 years ago

The new dataset looks proper. Closing issue.