waldronlab / SingleCellMultiModal

Single Cell multimodal data scripts for downloading datasets
https://bioconductor.org/packages/SingleCellMultiModal

SingleCellMultiModal could store the experiments technologies #37

Closed drighelli closed 3 years ago

drighelli commented 3 years ago

The MultiAssayExperiment object returned by the SingleCellMultiModal function could store the experiment-related technology somewhere.

This could help the user to trace the technology-dataset relation.

Thanks.

LiNk-NY commented 3 years ago

I think you mean the function call, right? I've added this in 928b62a. Feel free to reopen if not.

drighelli commented 3 years ago

oh ok thanks! that was a big mistake!

drighelli commented 3 years ago

Hi Marcel @LiNk-NY ,

sorry but I'm not able to find the place where you store the relation between the technology and the dataset.

Maybe I can explain it better with an example:

By doing this call

mae <- SingleCellMultiModal(
    c("mouse_gastrulation", "pbmc_10x", "cord_blood", "peripheral_blood",
      "mouse_embryo_8_cell", "macrophage_differentiation", "mouse_visual_cortex"),
    versions = c("2.0.0", "1.0.0", "1.0.0", "1.0.0", "1.0.0", "1.0.0", "2.0.0"),
    dry.run = FALSE
)

I get this ExperimentList

> mae@ExperimentList
ExperimentList class object of length 22:
 [1] mouse_gastrulation_acc_cgi: matrix with 14824 rows and 1101 columns
 [2] mouse_gastrulation_acc_DHS: matrix with 20082 rows and 1094 columns
 [3] mouse_gastrulation_acc_genebody: matrix with 17924 rows and 1105 columns
 [4] mouse_gastrulation_acc_promoter: matrix with 18037 rows and 1103 columns
 [5] mouse_gastrulation_met_cgi: matrix with 14080 rows and 986 columns
 [6] mouse_gastrulation_met_DHS: matrix with 6673 rows and 986 columns
 [7] mouse_gastrulation_met_genebody: matrix with 17559 rows and 986 columns
 [8] mouse_gastrulation_met_promoter: matrix with 17179 rows and 986 columns
 [9] mouse_gastrulation_rna: matrix with 18345 rows and 2480 columns
 [10] pbmc_10x_atac: SingleCellExperiment with 108344 rows and 10032 columns
 [11] pbmc_10x_rna: SingleCellExperiment with 36549 rows and 10032 columns
 [12] cord_blood_scADT: matrix with 13 rows and 8617 columns
 [13] cord_blood_scRNAseq: matrix with 36280 rows and 8617 columns
 [14] peripheral_blood_scADT: dgCMatrix with 52 rows and 13000 columns
 [15] peripheral_blood_scHTO: dgCMatrix with 7 rows and 13000 columns
 [16] peripheral_blood_scRNA: dgCMatrix with 33538 rows and 10248 columns
 [17] mouse_embryo_8_cell_genomic: RaggedExperiment with 2366 rows and 112 columns
 [18] mouse_embryo_8_cell_transcriptomic: SingleCellExperiment with 24029 rows and 112 columns
 [19] macrophage_differentiation_protein: SingleCellExperiment with 3042 rows and 1490 columns
 [20] macrophage_differentiation_rna: SingleCellExperiment with 32738 rows and 20274 columns
 [21] mouse_visual_cortex_seqFISH: SpatialExperiment with 113 rows and 1597 columns
 [22] mouse_visual_cortex_scRNAseq: SingleCellExperiment with 113 rows and 1723 columns

But I don't know which technology each returned experiment is associated with.

For example if I'm a general user I don't know that macrophage_differentiation_protein is the SCoPE2 technology.

I hope this clarifies what I meant.

Thanks

LiNk-NY commented 3 years ago

Hi Dario, @drighelli

There are two places where we could put this information. One is in the metadata of the ExperimentList and the other is in the metadata of the MultiAssayExperiment. The former is a little bit harder to discover so I've included it in the latter. I have an internal structure called a call_map that maps the functions to the DataTypes; this is what I use:

> metadata(mae)$call_map
DataFrame with 7 rows and 6 columns
          FUN               DataType   dry.run   verbose     version           modes
  <character>            <character> <logical> <logical> <character> <CharacterList>
1       scNMT     mouse_gastrulation     FALSE      TRUE       2.0.0               *
2  scMultiome               pbmc_10x     FALSE      TRUE       1.0.0               *
3     CITEseq             cord_blood     FALSE      TRUE       1.0.0               *
4     CITEseq       peripheral_blood     FALSE      TRUE       1.0.0               *
5       GTseq    mouse_embryo_8_cell     FALSE      TRUE       1.0.0               *
6      SCoPE2 macrophage_different..     FALSE      TRUE       1.0.0               *
7     seqFISH    mouse_visual_cortex     FALSE      TRUE       2.0.0               *

We could have a more formal labeling on the names of the ExperimentList but this would require a bit more thought and infrastructure work.
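Until such labeling exists, the technology for a given experiment can probably be recovered by matching the experiment name against the DataType column of the call_map. The following is only a minimal sketch in base R, not something the package provides: `technology_of` is a hypothetical helper, the `call_map` data frame is a hand-copied stand-in for `metadata(mae)$call_map` (with the truncated `macrophage_different..` entry spelled out as `macrophage_differentiation`, as in the original call), and `experiment_names` is a stand-in for `names(mae)`. It assumes every experiment name is its DataType followed by an underscore and a suffix, which holds for the listing above but is not guaranteed.

```r
# Stand-in for metadata(mae)$call_map, copied from the output shown above
# (only the two columns needed here). With a real object one would
# presumably start from as.data.frame(metadata(mae)$call_map) instead.
call_map <- data.frame(
  FUN = c("scNMT", "scMultiome", "CITEseq", "CITEseq",
          "GTseq", "SCoPE2", "seqFISH"),
  DataType = c("mouse_gastrulation", "pbmc_10x", "cord_blood",
               "peripheral_blood", "mouse_embryo_8_cell",
               "macrophage_differentiation", "mouse_visual_cortex"),
  stringsAsFactors = FALSE
)

# Stand-in for names(mae): a few experiment names from the listing above.
experiment_names <- c(
  "mouse_gastrulation_rna",
  "pbmc_10x_atac",
  "cord_blood_scADT",
  "peripheral_blood_scHTO",
  "mouse_embryo_8_cell_genomic",
  "macrophage_differentiation_protein",
  "mouse_visual_cortex_seqFISH"
)

# Hypothetical helper (not part of SingleCellMultiModal): return the FUN
# whose DataType, followed by "_", is a prefix of the experiment name.
# The longest matching DataType wins; NA is returned when nothing matches.
technology_of <- function(experiment, call_map) {
  hits <- startsWith(experiment, paste0(call_map$DataType, "_"))
  if (!any(hits)) {
    return(NA_character_)
  }
  matched <- call_map[hits, , drop = FALSE]
  matched$FUN[which.max(nchar(matched$DataType))]
}

# Named character vector: experiment name -> technology (function name).
technology <- vapply(experiment_names, technology_of, character(1),
                     call_map = call_map)
print(technology)
```

With the stand-in data, `technology[["macrophage_differentiation_protein"]]` evaluates to `"SCoPE2"`, which is the relation asked about earlier in the thread. Matching on the prefix plus an underscore keeps a DataType such as `cord_blood` from matching an unrelated name that merely starts with the same letters.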

LiNk-NY commented 3 years ago

It was resolved here: 92cd2dab074204823e0eef828928dc7377e104e9