Closed HealHer closed 3 years ago
Hi,
I'm not sure I follow all the explanations, but there seems to be a mistake in your YAML file: I1_001.fastq.gz contains the BC, not the UMI, in 10x data! Also, if each of the fastq files corresponds to one 10x sample, you do not need to use this file at all.
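For reference, here is a minimal sketch of where the cell barcode and UMI sit, assuming the standard 10x Chromium v2 layout (read 1 carries a 16 bp cell barcode followed by a 10 bp UMI, read 2 carries the cDNA, and I1 carries only the sample index). The function name and the example read are made up for illustration; this is not zUMIs code.

```python
# Assumed 10x Chromium v2 layout for read 1 (1-based, inclusive ranges):
# cell barcode = bases 1-16, UMI = bases 17-26.
BC_RANGE = (1, 16)
UMI_RANGE = (17, 26)


def split_read1(seq, bc_range=BC_RANGE, umi_range=UMI_RANGE):
    """Split a read-1 sequence into (cell_barcode, umi)."""
    if len(seq) < umi_range[1]:
        raise ValueError(
            f"read 1 is too short: {len(seq)} bases, need {umi_range[1]}"
        )
    # Convert the 1-based inclusive ranges into Python slices.
    barcode = seq[bc_range[0] - 1:bc_range[1]]
    umi = seq[umi_range[0] - 1:umi_range[1]]
    return barcode, umi


# Hypothetical 26-base read 1, for illustration only.
read1 = "AAACCTGAGAAACCAT" + "TTGCATCGGA"
barcode, umi = split_read1(read1)
print(barcode, umi)
```

Under this assumption both ranges are defined on the R1 file, and the I1 file needs no entry when each fastq set is a single sample.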
Hopefully this solves your issue. Best, C
Hi,
Thank you very much for your quick answer.
Indeed, it is a single 10x sample, so I removed the definition of file 3 in the YAML file, but this did not correct our problem.
QC shows that the number and quality of reads and cells are nearly identical. However, the UMAP representation, linear regression and differential analysis show big differences in the expressed genes.
My question is about the joint use of the 10x tool (cellranger) and zUMIs. The goal is not to benchmark but rather to understand whether the two tools should detect similar genes on the same sample.
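One way to make that comparison concrete is to measure the overlap of the gene sets each tool reports, as in the sketch below. It assumes the two outputs may label genes differently (for example versioned versus unversioned Ensembl-style IDs), which is something to check rather than a known cause here. The function names and the example IDs are hypothetical.

```python
def strip_version(gene_id):
    """Drop a trailing numeric version suffix, e.g. 'ENSG0001.5' -> 'ENSG0001'."""
    base, dot, suffix = gene_id.rpartition(".")
    return base if dot and suffix.isdigit() else gene_id


def gene_overlap(genes_a, genes_b):
    """Compare two gene lists after normalising their identifiers."""
    a = {strip_version(g) for g in genes_a}
    b = {strip_version(g) for g in genes_b}
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return {
        "shared": shared,
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": jaccard,
    }


# Made-up identifiers, for illustration only.
zumis_genes = ["ENSG0001.5", "ENSG0002.1", "ENSG0003.2"]
cellranger_genes = ["ENSG0001", "ENSG0002", "ENSG0004"]
result = gene_overlap(zumis_genes, cellranger_genes)
print(sorted(result["shared"]), result["jaccard"])
```

A Jaccard index close to 1 would indicate that both tools detect essentially the same genes; a value near 0 on the same sample would point to an identifier or annotation mismatch before any biological difference.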
The same indexes were used in both cases, with the same genome annotation file (.gtf).
Do you have any insight on this?
Thank you very much for your time,
Alexis
Feel free to reopen the issue if you still need assistance.
Hello,
Thank you very much for zUMIs. The program is very stable and allows many things. I use zUMIs to align private data in order to get processing leads.
Running zUMIs on private datasets
We tried to run zUMIs on 2 datasets (p1 and p2), coming from 2 very similar human patients with the same initial condition. Both runs used the 10x Chromium v2 sequencing protocol. The program runs (the .yaml files are available at the end of the issue). We use custom STAR indexes and a human hg38 genome. The indexes are generated using:
Error sometimes occurring
There is sometimes an error during the Counting phase. Re-running after changing the which_Stage option to Counting solves the issue. The error is always the same:
Compute output
The .rds files of runs p1 and p2 are converted into AnnData (.h5df and then .h5ad) using the DropletUtils::write10xCounts() function.
When we analyze the results by linear regression, we obtain a horizontal line, which suggests that the genes aligned in run 1 are all different from those in run 2 (figure here)
We tried reproducing these runs using cellranger in the same environment, with custom indexes generated by the same method. After QC, many parameters are identical (such as the number of cells, the number of reads or their distribution); however, a linear regression places the two runs (p1 and p2) in the same proportions, following almost gene(p1) = gene(p2) (figure here)
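A minimal sketch of such a p1 versus p2 regression is given below, in plain Python and not taken from the pipeline used here. It assumes per-gene total counts are available as dictionaries keyed by gene ID, and it matches genes by ID rather than by row position, since a row-order mismatch between two matrices is one thing that could produce a flat line. All names and numbers are hypothetical.

```python
def regress_shared_genes(counts_p1, counts_p2):
    """Least-squares fit counts_p2 = slope * counts_p1 + intercept over shared genes."""
    # Match genes by identifier, never by position in the matrix.
    shared = sorted(set(counts_p1) & set(counts_p2))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    xs = [counts_p1[g] for g in shared]
    ys = [counts_p2[g] for g in shared]
    n = len(shared)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("p1 counts are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, shared


# Made-up per-gene totals; the key order differs between runs on purpose.
p1 = {"geneA": 10, "geneB": 20, "geneC": 30, "geneD": 5}
p2 = {"geneC": 60, "geneA": 20, "geneB": 40, "geneE": 7}
slope, intercept, shared = regress_shared_genes(p1, p2)
print(slope, intercept, shared)
```

A slope near 1 with a small intercept would correspond to gene(p1) = gene(p2); it is also worth checking how many genes end up in the shared list, because a very short list means the two runs barely share identifiers.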
In light of the .run.yaml files that allowed us to run the alignment and processing, would it be possible for you to see if I'm forgetting any important options that could cause this?
Let me know if you need more information,
Thank you very much for your time
Alexis
Resources
p1.run.yaml
p2.run.yaml