Closed sharkLoc closed 1 year ago
Hello @sharkLoc,
I just came across your question. I don't know anything about genomics, but I can make an uneducated guess based on reading the code.
It would seem that create-input-vcf
creates a BED file with all the regions used by PyPGx. The regions in the BED file are identified by chromosome. I'm assuming the generation of the VCF file from the FASTA and the BAM will use the information in the BED file to constrain the search. Following this assumption, if the BED file has a region in chromosome 1 (i.e. chr1
), but your BAM file does not have chromosome 1 information, the process will fail.
the sequence "chr1" not found: Sample_JZ23.bam
The error message you posted seems to say something on those lines.
I hope this plus your knowledge of the input files and the field will help you track the error!
Hi @sharkLoc,
I believe I'll require a bit more information to address the issue.
$ fuc bam-head Sample_JZ23.bam
? The fuc
package should be already installed in your PyPGx environment because it is one of the dependencies required to run PyPGx.$ cat ref_hg38/part.fa | grep ">"
And a special thank you to @lexotero for joining the discussion! In theory, the command should not fail just because the input BAM file is missing a certain chromosome, as not all BAM files are generated by whole genome sequencing (e.g. targeted sequencing).
1、After running the command, output nothing。 2、cat ref_hg38/part.fa | grep ">"
>chr5
>chr14
>chr22
my data is Amplicon sequencing. I was aligned reads to hg38(just chr5,14,22)
你好 ,
我刚刚遇到了你的问题。我对基因组学一无所知,但我可以根据阅读代码做出未经教育的猜测。
似乎会创建一个包含 PyPGx 使用的所有区域的 BED 文件。BED文件中的区域由染色体标识。我假设从 FASTA 生成 VCF 文件,BAM 将使用 BED 文件中的信息来限制搜索。按照此假设,如果 BED 文件在染色体 1 中有一个区域(即),但您的 BAM 文件没有染色体 1 信息,则该过程将失败。
create-input-vcf``chr1
the sequence "chr1" not found: Sample_JZ23.bam
您发布的错误消息似乎在这些行上说明了一些内容。
我希望这加上您对输入文件和字段的了解将帮助您跟踪错误!
Hi, @lexotero
chr1 is not included in my bed file
@sharkLoc,
Thanks for sharing the outputs.
1、After running the command, output nothing。
This is concerning because it implies that your BAM file is lacking its header, which could potentially indicate corruption or improper formatting. How was this file generated?
If the file isn't too large, could you please share the BAM file with me at sbstevenlee@gmail.com? I need to examine the file to determine if there's a possibility of recovering it.
This is concerning because it implies that your BAM file is lacking its header, which could potentially indicate corruption or improper formatting. How was this file generated?
I misunderstood you,
pypgx create-input-vcf --max-depth 5000 --assembly GRCh38 out.vcf.gz .. /ref_hg38/part.fa Sample_JZ23.bam
The output of this command is empty. My bam file is fine, it is generated by the bwa mem module
@sharkLoc,
Then can you show me the output of the command $ fuc bam-head Sample_JZ23.bam
? I need to see its header, specifically its contigs.
@sharkLoc, Then can you show me the output of the command
$ fuc bam-head Sample_JZ23.bam
? I need to see its header, specifically its contigs.
output :
@HD VN:1.6 SO:coordinate
@SQ SN:chr5 LN:181538259
@SQ SN:chr14 LN:107043718
@SQ SN:chr22 LN:50818468
@RG ID:Sample_JZ23 PL:illumina SM:Sample_JZ23 LB:Sample_JZ23
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -M -t 4 -R @RG\tID:Sample_JZ23\tPL:illumina\tSM:Sample_JZ23\tLB:Sample_JZ23 /home/sharkloc/work/ref38/part.fa /home/sharkloc/work/result/0.trim/Sample_JZ23/Sample_JZ23_clean_r1.fq /home/sharkloc/work/result/0.trim/Sample_JZ23/Sample_JZ23_clean_r2.fq
@PG ID:samtools PN:samtools PP:bwa VN:1.14 CL:samtools view -Sb -o Sample_JZ23.unsrt.bam -
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.14 CL:samtools sort -@ 4 -o Sample_JZ23.bam Sample_JZ23.unsrt.bam
@sharkLoc,
Hmmm... I've never encountered this issue before. Would it be possible for you to share the BAM and FASTA files with me? I think I will need to tweak around the parameters of variant calling to solve this issue. Please send the files to my email above if that's OK with you. If you can't send the files, we will still try to find some ways. Please let me know.
Hmmm... I've never encountered this issue before. Would it be possible for you to share the BAM and FASTA files with me? I think I will need to tweak around the parameters of variant calling to solve this issue. Please send the files to my email above if that's OK with you. If you can't send the files, we will still try to find some ways. Please let me know.
@sbslee , I am sorry that I am unable to provide my sequencing data. My data only contains capture amplification data on these three chromosomes.
part.fa:
samtools faidx hg38.fa chr5 > part.fa
samtools faidx hg38.fa chr14 >> part.fa
samtools faidx hg38.fa chr22 >> part.fa
@sharkLoc,
No worries, I completely understand. In that case, I will use my own BAM file and try to filter out reads that were aligned to chr1 and see if I can reproduce the issue from my end. It may take some time though. I will get back to you ASAP!
@sharkLoc,
OK, I was able to reproduce the error you reported.
The error was caused (probably) because bcftools
was trying to access contigs (e.g. chr1
) that are not present in your BAM file's header which only contains three contigs (chr5
, chr14
, and chr22
).
I think this can be easily resolved by updating the header to include the other contigs using the samtools reheader
command:
$ samtools reheader new_header.sam Sample_JZ23.bam > Sample_JZ23.new_header.bam
$ samtools index Sample_JZ23.new_header.bam
where new_header.sam
looks like below:
@HD VN:1.6 SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10 LN:133797422
@SQ SN:chr11 LN:135086622
@SQ SN:chr12 LN:133275309
@SQ SN:chr13 LN:114364328
@SQ SN:chr14 LN:107043718
@SQ SN:chr15 LN:101991189
@SQ SN:chr16 LN:90338345
@SQ SN:chr17 LN:83257441
@SQ SN:chr18 LN:80373285
@SQ SN:chr19 LN:58617616
@SQ SN:chr20 LN:64444167
@SQ SN:chr21 LN:46709983
@SQ SN:chr22 LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrM LN:16569
@RG ID:Sample_JZ23 PL:illumina SM:Sample_JZ23 LB:Sample_JZ23
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -M -t 4 -R @RG\tID:Sample_JZ23\tPL:illumina\tSM:Sample_JZ23\tLB:Sample_JZ23 /home/sharkloc/work/ref38/part.fa /home/sharkloc/work/result/0.trim/Sample_JZ23/Sample_JZ23_clean_r1.fq /home/sharkloc/work/result/0.trim/Sample_JZ23/Sample_JZ23_clean_r2.fq
@PG ID:samtools PN:samtools PP:bwa VN:1.14 CL:samtools view -Sb -o Sample_JZ23.unsrt.bam -
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.14 CL:samtools sort -@ 4 -o Sample_JZ23.bam Sample_JZ23.unsrt.bam
Please let me know if this doesn't work.
@sharkLoc, OK, I was able to reproduce the error you reported.
Please let me know if this doesn't work.
it's worked, thanks
data type:
Targeted
command:
pypgx create-input-vcf --max-depth 5000 --assembly GRCh38 out.vcf.gz ../ref_hg38/part.fa Sample_JZ23.bam
error info:
any suggestion?