Closed bednarsky closed 1 year ago
Hey here are my suggestions:
Finished identifying 0 samples to load from VCF/BCF
is really a problem. bcftools is trying to identify the samples from a vcf of common SNPs which don't have any sample lists. So it can not identify any sample.popscle dsc-pileup --sam sorted
.bam --tag-group CB --tag-UMI UB --vcf genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf --sam-verbose 10000
00 --vcf-verbose 10000 --cap-BQ 40 --min-BQ 13 --min-MQ 20 --min-TD 0 --excl-flag 3844 --group-list b
arcodes.tsv.gz --min-total 0 --min-uniq 0 --min-snp 0 --out freemuxlet_391/plp/freemuxlet_out
outside the nextflow pipeline? Maybe tabix indeed solves the problem but due to the reason that hadge doesnt mount the tbi into the working directory, freemuxlet can not find this file and fails again. That could also be possible.
Thanks again for the fast response, amazing support! I will let you know what worked.
Keep them coming @bednarsky ! Thank you for playing with hadge. We're happy to provide additional assistance if necessary.
Please feel free to reopen this issue if you are still running into issues and we would be happy to help troubleshoot further.
I had an issue with the header and the table of vcf files using different formatting. I changed all formatting to the one in the bam files, namely ch1, chr2 etc, and then the reordering with popscle_helper_tools worked, and the process continued.
These are my commands to download and prepare the files.
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=13aebUpEKrtjliyT9rYzRijtkNJVUk5F_' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=13aebUpEKrtjliyT9rYzRijtkNJVUk5F_" -O common_variants_grch38.vcf && rm -rf /tmp/cookies.txt
# rename chromosomes to match bams, need to change header back first so that then bcftools works
sed 's/ID=chr/ID=/g' common_variants_grch38.vcf > common_variants_grch38.no_chr.vcf
# create file for renaming, then do renaming
for i in {1..22} X Y; do echo -e "$i\tchr$i"; done > rename_chrs.txt
bcftools annotate --rename-chrs rename_chrs.txt -o common_variants_grch38.vcf common_variants_grch38.no_chr.vcf
# adapt for scSplit like done in example from repo
bcftools query -f '%CHROM:%POS\n' common_variants_grch38.vcf > common_variants_grch38_list.vcf
# freemuxlet and cellsnp
wget https://master.dl.sourceforge.net/project/cellsnp/SNPlist/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz
gunzip genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz
# do the renaming
bcftools annotate --rename-chrs rename_chrs.txt -o genome1K.phase3.SNP_AF5e2.chr1toX.hg38.chr.vcf genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf
# reorder
popscle_helper_tools/sort_vcf_same_as_bam.sh \
tmp_dir_example_bam/example_bam.bam \
genome1K.phase3.SNP_AF5e2.chr1toX.hg38.chr.vcf > genome1K.phase3.SNP_AF5e2.chr1toX.hg38.chr.reordered.vcf
Removing the contig lines, as proposed here, https://github.com/statgen/popscle/issues/25#issuecomment-591292450, led to different downstream errors.
Sorry for having so many questions and thank you in advance for your help!
Freemuxlet fails with this error:
Do you know this issue? What is your recommended solution? I tried popscle_helper_tools to reorder the vcf file, but it is currently still failing, see issue on that repo.
Also note this line
Finished identifying 0 samples to load from VCF/BCF
→ is this usual, or does this point to an earlier error?This issue occured for both hg38 (which matches the ref genome used for demultiplexing) and hg19 (which I used first by accident).
Really appreciate all your work!