mjakobs closed this issue 1 year ago
Hello,
What kind of Visium data is this (fresh frozen or FFPE probe-based chemistry)? How many SNPs do you have in cellSNP.base.vcf and in the chr-specific VCF files under /phasing? For the chromosomes where phasing worked, what was the phasing confidence (phasing.log)?
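For anyone who wants to answer these questions for their own run, the counts can be gathered with a short script. This is a minimal sketch, assuming plain or gzip/bgzip-compressed VCF files; the helpers count_records and records_per_chrom are hypothetical and not part of Numbat. Every non-header record is counted as one SNP, with no filtering on FILTER or genotype, and records_per_chrom gives a per-chromosome tally for checking that all chromosomes are represented.

```python
import gzip
import sys
from collections import Counter
from pathlib import Path


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        # bgzip output is multi-member gzip, which the gzip module reads transparently
        return gzip.open(path, "rt")
    return open(path, "r")


def count_records(path):
    """Count variant records, i.e. non-empty lines that are not '#' headers."""
    with open_vcf(path) as fh:
        return sum(1 for line in fh if line.strip() and not line.startswith("#"))


def records_per_chrom(path):
    """Tally variant records by the CHROM column (first tab-separated field)."""
    counts = Counter()
    with open_vcf(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue
            counts[line.split("\t", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_snps.py cellSNP.base.vcf phasing/*.vcf.gz
    for p in sys.argv[1:]:
        print(f"{Path(p).name}: {count_records(p)}")
```

The script name count_snps.py is illustrative only. An equivalent one-off check from the shell is bcftools view -H <file> | wc -l, which prints the body of the VCF without its header and counts the lines.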
Hi Teng,
Thanks for following up on this. Unfortunately, the data are from FFPE samples.
cellSNP.base.vcf has 1670 SNPs, with all chromosomes represented.
The numbers of SNPs in the chr-specific VCF files are as follows:
S06_chr1.phased.vcf.gz: 2
S06_chr1.vcf.gz: 5
S06_chr3.phased.unphased.vcf.gz: 1
S06_chr3.vcf.gz: 1
S06_chr6.phased.unphased.vcf.gz: 1
S06_chr6.vcf.gz: 2
S06_chr7.phased.vcf.gz: 3
S06_chr7.vcf.gz: 6
S06_chr8.phased.unphased.vcf.gz: 1
S06_chr8.vcf.gz: 2
S06_chr9.phased.unphased.vcf.gz: 0
S06_chr9.vcf.gz: 1
S06_chr10.phased.unphased.vcf.gz: 1
S06_chr10.vcf.gz: 1
S06_chr11.phased.vcf.gz: 2
S06_chr11.vcf.gz: 3
S06_chr14.phased.unphased.vcf.gz: 1
S06_chr14.vcf.gz: 9
S06_chr15.phased.unphased.vcf.gz: 1
S06_chr15.vcf.gz: 2
S06_chr16.phased.unphased.vcf.gz: 0
S06_chr16.vcf.gz: 1
S06_chr17.phased.unphased.vcf.gz: 1
S06_chr17.vcf.gz: 2
S06_chr19.phased.vcf.gz: 2
S06_chr19.vcf.gz: 4
S06_chr21.phased.unphased.vcf.gz: 0
S06_chr21.vcf.gz: 4
S06_chr22.phased.unphased.vcf.gz: 0
S06_chr22.vcf.gz: 2
The contents of the phasing.log file are:
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr1.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr1.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr1.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 0 SNPs in both target and reference
SNPs ignored: 1 SNPs in target but not reference
5795045 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: -nan
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr2.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr2.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr2.phased
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr3.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr3.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr3.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 2 SNPs in both target and reference
SNPs ignored: 0 SNPs in target but not reference
5280534 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: 0
Filling in genetic map coordinates using reference file:
/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz
Physical distance range: 89280766 base pairs
Genetic distance range: 84.7166 cM
Average # SNPs per cM: 0
Number of <=(64-SNP, 1cM) segments: 1
Average # SNPs per segment: 2
Time for reading input: 62.1244 sec
Fraction of heterozygous genotypes: 0.5
Typical span of default 100-het history length: 8471.66 cM
Setting --histFactor=1.00
Auto-selecting number of phasing iterations: setting --pbwtIters to 1
BEGINNING PHASING
PHASING ITER 1 OF 1
Phasing target samples
................................................................................
Time for phasing iter 1: 0.00697017
Writing vcf.gz output to ./phasing/S05_chr3.phased.vcf.gz
Time for writing output: 0.0200469
Total elapsed time for analysis = 62.1521 sec
Mean phase confidence of each target individual:
ID PHASE_CONFIDENCE
S05 -nan
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr4.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr4.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr4.phased
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr5.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr5.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr5.phased
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr6.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr6.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr6.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 1 SNPs in both target and reference
SNPs ignored: 0 SNPs in target but not reference
4539754 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: 0
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr7.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr7.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr7.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 1 SNPs in both target and reference
SNPs ignored: 3 SNPs in target but not reference
4222929 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: 0
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr8.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr8.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr8.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 1 SNPs in both target and reference
SNPs ignored: 1 SNPs in target but not reference
4162375 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: 0
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr9.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr9.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr9.phased
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr10.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr10.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr10.phased
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr11.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr11.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr11.phased
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr12.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr12.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr12.phased
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr13.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr13.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr13.phased
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr14.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr14.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr14.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 0 SNPs in both target and reference
SNPs ignored: 4 SNPs in target but not reference
2383125 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: -nan
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr15.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr15.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr15.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 1 SNPs in both target and reference
SNPs ignored: 1 SNPs in target but not reference
2153932 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: 0
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr16.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr16.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr16.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 0 SNPs in both target and reference
SNPs ignored: 1 SNPs in target but not reference
2410531 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: -nan
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr17.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr17.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr17.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 0 SNPs in both target and reference
SNPs ignored: 2 SNPs in target but not reference
2066683 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: -nan
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr18.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr18.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr18.phased
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr19.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr19.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr19.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 0 SNPs in both target and reference
SNPs ignored: 2 SNPs in target but not reference
1625698 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: -nan
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr20.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr20.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr20.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 1 SNPs in both target and reference
SNPs ignored: 0 SNPs in target but not reference
1706441 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: 0
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr21.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr21.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr21.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 0 SNPs in both target and reference
SNPs ignored: 1 SNPs in target but not reference
976599 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: -nan
+-----------------------------+
| |
| Eagle v2.4.1 |
| November 18, 2018 |
| Po-Ru Loh |
| |
+-----------------------------+
Copyright (C) 2015-2018 Harvard University.
Distributed under the GNU GPLv3+ open source license.
Command line options:
/lmod/apps/eagle/2.4.1/bin/eagle \
--numThreads 4 \
--vcfTarget ./phasing/S05_chr22.vcf.gz \
--vcfRef /data/tog/gmjakobsdottir/numbat_references/1000G_hg38/chr22.genotypes.bcf \
--geneticMapFile=/data/tog/gmjakobsdottir/numbat_references/genetic_map_hg38_withX.txt.gz \
--outPrefix ./phasing/S05_chr22.phased
Setting number of threads to 4
Reference samples: Nref = 2548
Target samples: Ntarget = 1
SNPs to analyze: M = 0 SNPs in both target and reference
SNPs ignored: 1 SNPs in target but not reference
993881 SNPs in reference but not target
0 multi-allelic SNPs in target
Missing rate in target genotypes: -nan
Thanks for taking the time to look into this! Maria
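Scanning a concatenated log like the one above by eye is error-prone. The sketch below uses a hypothetical helper, summarize_eagle_log (not part of Numbat or Eagle), that starts a new run at each --vcfTarget line and pulls out the "SNPs to analyze: M = ..." count and the mean phase confidence. The patterns follow the wording printed by Eagle v2.4.1 in this log; other Eagle versions may phrase these lines differently. Runs that logged only their command line (chr2, chr4 and chr5 above, for example) come out with M = None.

```python
import re
import sys

# Patterns follow the Eagle v2.4.1 log wording above; other versions may differ.
TARGET_RE = re.compile(r"--vcfTarget\s+(\S+)")
M_RE = re.compile(r"SNPs to analyze:\s*M\s*=\s*(\d+)")
CHROM_RE = re.compile(r"_chr([0-9]+|X|Y)\.vcf\.gz$")


def summarize_eagle_log(text):
    """Return one dict per Eagle run found in a concatenated log (hypothetical helper)."""
    runs = []
    current = None
    lines = text.splitlines()
    for i, line in enumerate(lines):
        target = TARGET_RE.search(line)
        if target:
            # A --vcfTarget line marks the start of the next run
            path = target.group(1)
            chrom = CHROM_RE.search(path)
            current = {
                "target": path,
                "chrom": chrom.group(1) if chrom else None,
                "m": None,           # stays None if no "SNPs to analyze" line was logged
                "confidence": None,  # stays None if no PHASE_CONFIDENCE table was logged
            }
            runs.append(current)
        elif current is not None:
            m = M_RE.search(line)
            if m:
                current["m"] = int(m.group(1))
            elif "PHASE_CONFIDENCE" in line and i + 1 < len(lines):
                # The following line is "<sample> <confidence>", e.g. "S05 -nan"
                fields = lines[i + 1].split()
                if len(fields) >= 2:
                    current["confidence"] = fields[1]
    return runs


if __name__ == "__main__":
    # Hypothetical usage: python summarize_eagle_log.py phasing.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for run in summarize_eagle_log(fh.read()):
                print(f"chr{run['chrom']}\tM={run['m']}\tconfidence={run['confidence']}")
```

On the log above, for instance, chr3 is the only run that reached BEGINNING PHASING, and it would be reported with M = 2 and confidence -nan.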
Yeah, unfortunately Visium for FFPE is probe-based, and it doesn't actually sequence the transcripts (so no SNP information is captured).
Ah, of course. Thank you for your time!
Hi team,
I started following your article on how to run numbat on spatial transcriptomics data but have encountered some issues in the pre-processing stage.
Code run:
Error output:
Following the information in this comment, it seems that the script is running correctly for most of the chromosomes:
- The files under /pileup are not empty, and cellSNP.base.vcf is not empty.
- {sample}_chr*.vcf.gz files are present under /phasing; however, the files for the chromosomes listed in the error message (2, 4, 5, 12, 13, 18, 20) are not.
- {sample}_chr*.phased.vcf.gz files are present under /phasing; however, the files for the chromosomes listed in the error message are not.
I have tried this on a second sample from the same lot and received a similar error affecting chromosomes 2, 4, 5, 9, 10, 11, 12, 13, and 18.
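To repeat the checks described above without listing the directory by hand, a short script can report which autosomes lack each expected file. This is a sketch assuming the {sample}_chr{N}.vcf.gz and {sample}_chr{N}.phased.vcf.gz naming used in this thread and chromosomes 1 to 22; missing_chromosomes is a hypothetical helper. A file named {sample}_chr{N}.phased.unphased.vcf.gz, as seen for several chromosomes in the earlier S06 listing, is not counted as a .phased.vcf.gz file here.

```python
import sys
from pathlib import Path


def missing_chromosomes(phasing_dir, sample, suffix, chroms=range(1, 23)):
    """Return chromosomes with no {sample}_chr{N}{suffix} file in phasing_dir (hypothetical helper)."""
    phasing_dir = Path(phasing_dir)
    return [c for c in chroms if not (phasing_dir / f"{sample}_chr{c}{suffix}").exists()]


if __name__ == "__main__":
    # Hypothetical usage: python check_phasing.py phasing S06
    if len(sys.argv) == 3:
        phasing_dir, sample = sys.argv[1], sys.argv[2]
        # Assumed naming: unphased per-chromosome input, then Eagle's phased output
        for suffix in (".vcf.gz", ".phased.vcf.gz"):
            missing = missing_chromosomes(phasing_dir, sample, suffix)
            print(f"{sample}_chr*{suffix} missing for chromosomes: {missing}")
```

The resulting lists can be compared directly with the chromosomes named in the error message.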
Do you have any suggestions for how to proceed?
Thank you for your help! Maria