waldronlab / SingleCellMultiModal

Single Cell multimodal data scripts for downloading datasets
https://bioconductor.org/packages/SingleCellMultiModal

quality control of cells #53

Closed drighelli closed 1 year ago

drighelli commented 1 year ago

We are going to include quality control of cells inside each dataset. I'm tagging all of you to get some feedback before I run the quality checks myself. Thanks, everyone!

cvanderaa commented 1 year ago

For the SCoPE2 dataset, low-quality cells were already removed by the authors. For the proteomics data, low-quality cells were defined based on the median coefficient of variation of the proteins in each cell (using negative control samples to set the threshold). For the scRNA-Seq data, low-quality cells were defined as cells with fewer than $10^4$ UMIs.

So, in my opinion, cell QC is not required for the SCoPE2 dataset.
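For reference, the scRNA-Seq rule described above (drop cells with fewer than $10^4$ UMIs) could be sketched roughly as follows. This is only an illustration, not the authors' code: the function name `filter_cells_by_umis` and the dictionary layout (cell id mapped to per-gene UMI counts) are assumptions made for the example.

```python
def filter_cells_by_umis(cells, min_umis=10**4):
    """Keep cells whose total UMI count is at least `min_umis`.

    `cells` is a hypothetical layout: dict of cell id -> list of per-gene UMI counts.
    """
    return {
        cell_id: counts
        for cell_id, counts in cells.items()
        if sum(counts) >= min_umis  # "fewer than 10^4 UMIs" means low quality
    }


cells = {
    "cell_a": [6000, 4000],     # 10,000 UMIs: kept
    "cell_b": [9999],           # 9,999 UMIs: dropped
    "cell_c": [5000, 5000, 1],  # 10,001 UMIs: kept
}
kept = filter_cells_by_umis(cells)
```

The proteomics rule is not sketched here because the threshold derived from the negative controls is not given in this thread.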

drighelli commented 1 year ago

Thanks @cvanderaa !!

lgeistlinger commented 1 year ago

Hi @drighelli - sorry for the delay! I've checked the G&T papers and it seems QC filtering on the cells is already applied at both the genome sequencing and the transcriptome sequencing steps.

  1. Genome data analysis—quality control of the single-cell DNA copy-number profiles:

"For quality control, we calculate the median absolute pairwise difference (MAPD) of the genome-wide logR values per single cell. The higher the MAPD value, the higher the overall noise in the DNA copy-number data. We discarded single cells having a MAPD score higher than 0.6 or 2 when the cell’s DNA was amplified with PicoPLEX or MDA, respectively"
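A rough sketch of the MAPD filter described in the quote above. Two things here are assumptions, not taken from the paper: that "pairwise" means differences between adjacent genomic bins (the usual MAPD convention), and the function names `mapd` and `passes_mapd_qc`. The thresholds 0.6 (PicoPLEX) and 2 (MDA) are the ones quoted.

```python
from statistics import median

# Thresholds quoted above; cells with a MAPD *higher* than these were discarded.
MAPD_THRESHOLDS = {"PicoPLEX": 0.6, "MDA": 2.0}


def mapd(logr):
    """Median absolute difference between adjacent logR values (assumed definition)."""
    diffs = [abs(b - a) for a, b in zip(logr, logr[1:])]
    return median(diffs)


def passes_mapd_qc(logr, amplification):
    """Return True if the cell is kept for the given amplification method."""
    return mapd(logr) <= MAPD_THRESHOLDS[amplification]


quiet_cell = [0.0, 0.1, -0.1, 0.3]      # adjacent differences: 0.1, 0.2, 0.4
noisy_cell = [0.0, 1.0, 0.0, 1.0, 0.0]  # adjacent differences: all 1.0
```

With these toy profiles, `noisy_cell` would fail the PicoPLEX threshold but pass the more permissive MDA one.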

  2. Transcriptome data analysis:

"On the basis of the number of uniquely mapped reads and the above distributions, we filter the single-cell transcriptome data based on the number of mapped reads and the number of genes expressed above a set threshold. We applied a threshold of at least 3,500 genes with a TPM value ≥ 1; i.e., single cells demonstrating fewer than 3,500 genes expressed at a TPM value ≥ 1 were excluded from further analyses"
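The gene-count part of the transcriptome filter quoted above (at least 3,500 genes with TPM ≥ 1) could look roughly like this. It is a minimal sketch: the function names and the flat list of per-gene TPM values are assumptions, and the mapped-read criterion is left out because the quote gives no threshold for it.

```python
def n_expressed_genes(tpm_values, min_tpm=1.0):
    """Count genes with a TPM value at or above `min_tpm`."""
    return sum(1 for value in tpm_values if value >= min_tpm)


def keep_cell(tpm_values, min_genes=3500, min_tpm=1.0):
    """Keep a cell only if at least `min_genes` genes reach `min_tpm`."""
    return n_expressed_genes(tpm_values, min_tpm) >= min_genes


# Toy cells: 3,500 vs 3,499 genes at TPM = 1, padded with unexpressed genes.
good_cell = [1.0] * 3500 + [0.0] * 100
poor_cell = [1.0] * 3499 + [0.5] * 101
```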

drighelli commented 1 year ago

thanks @lgeistlinger !

drighelli commented 1 year ago

For the seqFISH dataset, we infer that quality control was performed by the authors of the hackathon paper during preprocessing. They state that they preprocessed the data to align the cell types across the seqFISH and scRNA-seq datasets.

For the CITEseq/ECCITEseq datasets, I followed Section 12 of the Advanced OSCA book, which suggests cell quality control based on ADTs and mitochondrial genes.
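As an illustration of the mitochondrial part only, here is a minimal sketch of a median-plus-MADs outlier rule on the per-cell mitochondrial proportion. Everything in it is an assumption for the example and not necessarily what was run for these datasets: the cutoff of 3 MADs, the 1.4826 scaling constant (the default in R's `mad()`), and the function name `high_mito_outliers`.

```python
from statistics import median


def high_mito_outliers(mito_fraction, nmads=3.0, scale=1.4826):
    """Flag cells whose mitochondrial fraction exceeds median + nmads * scaled MAD."""
    center = median(mito_fraction)
    mad = scale * median(abs(x - center) for x in mito_fraction)
    threshold = center + nmads * mad
    return [x > threshold for x in mito_fraction]


# Toy data: five ordinary cells and one with 50% mitochondrial reads.
mito = [0.02, 0.03, 0.025, 0.03, 0.02, 0.5]
flags = high_mito_outliers(mito)
```

Only the last cell is flagged here; an ADT-based check would need its own rule and is not sketched.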

drighelli commented 1 year ago

For the Multiome data, the preprocessing was done with Signac.