Closed plrlhb12 closed 3 years ago
Hi Lirong,
It looks like an issue with matplotlib.pyplot, which can't be imported for some unclear reason. However, plotting is not an essential step, and you can skip it by adding --noPlot, or simply ignore the error: the only remaining step is writing GT_donors.vireo.vcf.gz, which does not apply in your case as the donor genotype is given (right?).
For -N, if your donor VCF has exactly 10 donors, then you don't need to specify -N 10, as that is the default.
For the high number of unassigned cells, it is possibly because too few SNPs matched:
[vireo] 5179 out 224764 variants matched to donor VCF
I guess you are using the same genome build, right? But the donor VCF may not contain a sufficient number of SNPs. How many SNPs are there? Have you imputed?
Yuanhua
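For reference, 5179 of 224764 is only about 2.3% of the cell variants. A rough way to check the overlap between the two VCFs outside vireo is sketched below. It is only a sketch under assumptions: it compares chromosome and position alone (vireo's own matching may use more than that), the helper names are made up for illustration, and the `strip_chr` option is there because a `chr1` versus `1` naming mismatch is one possible, unconfirmed cause of a low match count.

```python
import gzip
from typing import Iterable, Set, Tuple


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def variant_keys(lines: Iterable[str], strip_chr: bool = True) -> Set[Tuple[str, str]]:
    """Collect (chromosome, position) keys from VCF lines, skipping headers."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]  # treat "chr1" and "1" as the same contig
        keys.add((chrom, pos))
    return keys


def matched_fraction(cell_lines, donor_lines, strip_chr: bool = True):
    """Return (matched, total cell variants, matched / total)."""
    cell = variant_keys(cell_lines, strip_chr)
    donor = variant_keys(donor_lines, strip_chr)
    matched = len(cell & donor)
    return matched, len(cell), (matched / len(cell) if cell else 0.0)
```

With real files, the two line iterables would come from `open_vcf(...)` on the cell VCF and the donor VCF. If the matched count jumps when `strip_chr=True` compared with `strip_chr=False`, contig naming is worth a closer look.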
Hi Yuanhua,
Thank you very much for help with troubleshooting!
You are right that the matched SNPs are too few. In my previous successful cases, there were about 10 times as many matched SNPs.
In this case, the donor VCF file was provided by others. It was generated from chip genotype arrays on the hg19 build and then lifted over to hg38 using CrossMap. It has 40800 variants including 208455 SNPs. I doubt whether the liftover is accurate.
I have separate scRNA-Seq data for every individual donor. Could I apply cellSNP to generate cellsnp.cells.vcf for each donor, merge them, and use the result as the donor genotype file? According to your description, it seems that I will have to apply both cellSNP and Vireo to generate the GT_donors.vireo.vcf.gz, right? However, I am having problems at the step of generating GT_donors.vireo.vcf.gz when running Vireo.
Or could I redo cellSNP using hg19's genome1K.phase3.SNP_AF5e2.chr1toX.hg19.vcf.gz and then apply Vireo using the donor genotypes from before the liftover?
Any suggestions are welcome!
Best,
Lirong
Hi Lirong,
If you have separate scRNA for each individual, then using these scRNA for genotyping would give you good overlapped SNPs. You can use cellSNP to perform genotyping in a bulk manner, for each individual separately or jointly (mode 2b): https://cellsnp-lite.readthedocs.io/en/latest/manual.html#mode-2-pileup-whole-chromosome-s-without-given-snps
Note that this is cellSNP re-implemented in C/C++, which is about 10 times faster.
Please add --cellTAG None --UMItag None --genotype, but don't provide the cell barcodes with -b, as you are treating each as a bulk sample. You can use prefiltering, e.g., --minMAF 0.1 --minCOUNT 100.
Yuanhua
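Putting those options together, a bulk-style genotyping call might be assembled as in the sketch below. The `--cellTAG`, `--UMItag`, `--genotype`, `--minMAF` and `--minCOUNT` options are the ones named above; `-s` (input BAM) and `-O` (output directory) are taken from the cellsnp-lite manual and should be checked against the installed version. The function name and the file names are hypothetical, and the sketch only builds the argument list rather than running anything.

```python
import shlex
from typing import List


def bulk_genotype_cmd(bam: str, out_dir: str,
                      min_maf: float = 0.1, min_count: int = 100) -> List[str]:
    """Build a cellsnp-lite argument list for bulk-style genotyping.

    No -b (cell barcode) option is passed, since the sample is treated
    as bulk rather than as individual cells.
    """
    return [
        "cellsnp-lite",
        "-s", bam,                       # input BAM (flag assumed from the manual)
        "-O", out_dir,                   # output directory (flag assumed from the manual)
        "--minMAF", str(min_maf),        # prefilter on minor allele frequency
        "--minCOUNT", str(min_count),    # prefilter on aggregated read count
        "--cellTAG", "None",             # no cell barcode tag
        "--UMItag", "None",              # no UMI tag
        "--genotype",                    # perform genotyping
    ]


# Hypothetical file names, printed as a shell-ready command line.
cmd = bulk_genotype_cmd("donor1.bam", "donor1_cellsnp")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once per donor, or pasted into a job script after printing.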
Hi Yuanhua,
I successfully decreased the unassigned rate to less than 8% by following your suggestions, although I am still having problems executing plot_GT. Thanks very much!
Lirong
Hi Yuanhua,
When I ran vireo on my new single-cell data, it failed at the step of exporting the fig_GT_distance PDF, with both version 0.4.2 and version 0.3.2. The preceding cellSNP step was done using the old version, v0.1.7.
What could be causing the error? Is it related to the dataset itself? There was no error when I ran vireo on my previous batch of single-cell data.
What could be the reason for the high unassigned rate? Although there is a problem with fig_GT_distance, I got all the other outputs. For this dataset, I set -N to 10 since I know there are 10 donors. However, more than 1/3 of the cells (5428 out of 14249) are unassigned.
Do I have to set -N? I remember it was just optional previously.
The error messages are listed below. Thank you for your kind help!
Lirong
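As a quick check on the numbers above, the sketch below recomputes the unassigned and doublet rates from the "final donor size" counts printed in the v0.4.2 log further down. The counts are copied from that log; the function name is just for illustration.

```python
from typing import Dict


def assignment_rates(final_sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each category's share of all cells."""
    total = sum(final_sizes.values())
    return {name: count / total for name, count in final_sizes.items()}


# Counts copied from the "[vireo] final donor size" line of the v0.4.2 run.
final_sizes = {
    "NIH_11A": 156, "NIH_12A": 1847, "NIH_13A": 1417, "NIH_14A": 701,
    "NIH_16A": 308, "NIH_17A": 817, "NIH_18A": 1062, "NIH_19A": 601,
    "NIH_20A": 283, "NIH_21A": 975, "doublet": 654, "unassigned": 5428,
}

rates = assignment_rates(final_sizes)
print(f"total cells: {sum(final_sizes.values())}")
print(f"unassigned: {rates['unassigned']:.1%}, doublet: {rates['doublet']:.1%}")
```

The counts add up to the 14249 cells reported by vireo, and the unassigned share comes out at roughly 38%.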
############# run using v0.3.2
vireo -c $CELL_FILE -d $DONOR_FILE -N 10 -o $VIREO_OUT_DIR
[+] Loading vireosnp 0.3.2 on cn2402
[+] Loading singularity 3.7.1 on cn2402
/opt/conda/envs/app/lib/python3.7/site-packages/vireoSNP/utils/io_utils.py:17: RuntimeWarning: invalid value encountered in greater_equal
if np.sum(mm_idx == mm_idx) == 0 or np.sum(mm_idx >= 0) == 0:
/opt/conda/envs/app/lib/python3.7/site-packages/vireoSNP/utils/io_utils.py:20: RuntimeWarning: invalid value encountered in greater_equal
if np.sum(mm_idx == mm_idx) == 0 or np.sum(mm_idx >= 0) == 0:
Traceback (most recent call last):
File "/opt/conda/envs/app/bin/vireo", line 8, in <module>
sys.exit(main())
File "/opt/conda/envs/app/lib/python3.7/site-packages/vireoSNP/vireo.py", line 201, in main
donor_GPb[idx, :, :], donor_vcf['samples'])
File "/opt/conda/envs/app/lib/python3.7/site-packages/vireoSNP/plot/base_plot.py", line 44, in plot_GT
import matplotlib.pyplot as plt
File "/opt/conda/envs/app/lib/python3.7/site-packages/matplotlib/__init__.py", line 207, in <module>
_check_versions()
File "/opt/conda/envs/app/lib/python3.7/site-packages/matplotlib/__init__.py", line 192, in _check_versions
from . import ft2font
ImportError: /opt/conda/envs/app/lib/python3.7/site-packages/matplotlib/ft2font.cpython-37m-x86_64-linux-gnu.so: failed to map segment from shared object
swarm_7279138_1.e (END)
########### run using v0.4.2
vireo -c $CELL_FILE -d $DONOR_FILE -N 10 -o $VIREO_OUT_DIR
[vireo] Loading cell VCF file ...
[vireo] Loading donor VCF file ...
[vireo] 5179 out 224764 variants matched to donor VCF
[vireo] Demultiplex 14249 cells to 10 donors with 5179 variants.
[vireo] lower bound ranges [-308607.9, -268972.4, -260671.0]
[vireo] allelic rate mean and concentrations:
[[0.045 0.455 0.894]]
[[401111. 493929.6 134207.4]]
[vireo] donor size before removing doublets:
donor0 donor1 donor2 donor3 donor4 donor5 donor6 donor7 donor8 donor9
600 2575 2088 1144 627 1299 1672 1448 903 1893
[vireo] final donor size:
NIH_11A NIH_12A NIH_13A NIH_14A NIH_16A NIH_17A NIH_18A NIH_19A NIH_20A NIH_21A doublet unassigned
156 1847 1417 701 308 817 1062 601 283 975 654 5428
Traceback (most recent call last):
File "/data/pengl7/conda/envs/vireo-env/bin/vireo", line 8, in <module>
sys.exit(main())
File "/data/pengl7/conda/envs/vireo-env/lib/python3.7/site-packages/vireoSNP/vireo.py", line 201, in main
donor_GPb[idx, :, :], donor_vcf['samples'])
File "/data/pengl7/conda/envs/vireo-env/lib/python3.7/site-packages/vireoSNP/plot/base_plot.py", line 45, in plot_GT
import matplotlib.pyplot as plt
File "/data/pengl7/conda/envs/vireo-env/lib/python3.7/site-packages/matplotlib/__init__.py", line 174, in <module>
_check_versions()
File "/data/pengl7/conda/envs/vireo-env/lib/python3.7/site-packages/matplotlib/__init__.py", line 159, in _check_versions
from . import ft2font
ImportError: /data/pengl7/conda/envs/vireo-env/lib/python3.7/site-packages/matplotlib/ft2font.cpython-37m-x86_64-linux-gnu.so: failed to map segment from shared object: Cannot allocate memory
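
Both tracebacks stop while matplotlib's compiled ft2font module is being loaded, so the failure can be reproduced without running vireo at all. The sketch below tries the import in isolation and reports the address-space limit of the current process. The limit check rests on an assumption not confirmed in this thread: "failed to map segment ... Cannot allocate memory" is what the dynamic loader reports when mapping a shared library fails, and a restricted virtual-memory limit on a batch job is one possible cause among several. The function names are hypothetical, and `resource` is available on Unix-like systems only.

```python
import importlib
import resource
from typing import List, Optional, Tuple


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly in this environment."""
    try:
        importlib.import_module(module_name)
    except Exception:  # ImportError, OSError from the loader, etc.
        return False
    return True


def address_space_limit() -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) RLIMIT_AS in bytes, or None where unlimited."""
    def clean(value: int) -> Optional[int]:
        return None if value == resource.RLIM_INFINITY else value

    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return clean(soft), clean(hard)


def plot_flag(module_name: str = "matplotlib.pyplot") -> List[str]:
    """Return ["--noPlot"] when the plotting module cannot be imported."""
    return [] if can_import(module_name) else ["--noPlot"]


print("matplotlib.pyplot importable:", can_import("matplotlib.pyplot"))
print("address-space limit (soft, hard):", address_space_limit())
print("extra vireo arguments:", plot_flag())
```

Running this inside the same job environment as vireo shows whether the import problem is independent of the dataset. If the import fails there too, appending --noPlot to the vireo command, as suggested earlier in the thread, skips the plotting step.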