Hi @yaarau,
Thanks for your interest in PyPGx!
> I understand that the BAM files created by this pipeline cannot be used as input for PyPGx.
Have you actually tried this? I'm not 100% clear where you got the impression that DRAGEN BAMs cannot be used as input to PyPGx. I personally haven't worked with DRAGEN BAMs, so it might be true, but I'd be surprised.
Also, is your data WGS or targeted sequencing (e.g., WES)?
Hi Steven, thank you for your quick answer. We mostly work with WES. I got the impression that our BAM files are not suitable from the PyPGx documentation regarding ALT contigs:
> However, there is one important caveat to consider if your sequencing data is GRCh38. That is, sequence reads must be aligned only to the main contigs (i.e. chr1, chr2, …, chrX, chrY), and not to the alternative (ALT) contigs such as chr1_KI270762v1_alt. This is because the presence of ALT contigs reduces the sensitivity of variant calling and many other analyses including SV detection. Therefore, if you have sequencing data in GRCh38, make sure it’s aligned to the main contigs only. The only exception to above rule is the GSTT1 gene, which is located on chr22 for GRCh37 but on chr22_KI270879v1_alt for GRCh38.
We align to DRAGEN's recommended alt-masked-graph reference, which includes ALT contigs. Do you recommend that I try running PyPGx on BAM files created with this alignment? Do you have a test set I can use to verify that I'm getting the right results when aligning with our usual reference?
Thank you,
Yaara
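Before deciding whether realignment is needed, it can help to see which ALT contigs a BAM's reference actually lists. The sketch below is only an illustration and is not part of PyPGx: it assumes the header text has already been dumped (for example with `samtools view -H in.bam`) and that ALT contigs follow the GRCh38 `_alt` suffix convention seen in the documentation excerpt above. The function names and the example header are hypothetical.

```python
import re

# Assumption: ALT contigs are named "<chrom>_<accession>_alt", as in GRCh38.
ALT_PATTERN = re.compile(r"_alt$")


def contigs_from_header(header_text):
    """Return the contig names listed in the @SQ lines of a SAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are tab-separated TAG:VALUE pairs; SN holds the name.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def alt_contigs(header_text):
    """Return only the contigs whose names end in '_alt'."""
    return [n for n in contigs_from_header(header_text) if ALT_PATTERN.search(n)]


# Hypothetical header text, standing in for `samtools view -H in.bam` output.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "@SQ\tSN:chr22\tLN:50818468",
    "@SQ\tSN:chr22_KI270879v1_alt\tLN:304135",
])
print(alt_contigs(example_header))
```

A contig appearing in the header only shows that the reference included it; it does not by itself show that reads were aligned there.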
@yaarau,
$ pypgx prepare-depth-of-coverage \
    depth-of-coverage.zip \
    in.bam \
    --assembly GRCh38
Traceback (most recent call last):
  File "/Users/sbslee/opt/anaconda3/envs/fuc/bin/pypgx", line 33, in <module>
    sys.exit(load_entry_point('pypgx', 'console_scripts', 'pypgx')())
  File "/Users/sbslee/Desktop/pypgx/pypgx/__main__.py", line 33, in main
    commands[args.command].main(args)
  File "/Users/sbslee/Desktop/pypgx/pypgx/cli/prepare_depth_of_coverage.py", line 90, in main
    archive = utils.prepare_depth_of_coverage(
  File "/Users/sbslee/Desktop/pypgx/pypgx/api/utils.py", line 1247, in prepare_depth_of_coverage
    cf = pycov.CovFrame.from_bam(bams, regions=regions, zero=True)
  File "/Users/sbslee/Desktop/fuc/fuc/api/pycov.py", line 345, in from_bam
    results += pysam.depth(*(bams + args + ['-r', region]))
  File "/Users/sbslee/opt/anaconda3/envs/fuc/lib/python3.9/site-packages/pysam/utils.py", line 69, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools depth: cannot parse region "chr22_KI270879v1_alt:267307-281486"\n'
Then follow the instructions in the FAQ.
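The `cannot parse region` message in the traceback above is what samtools prints when it cannot resolve a requested region against the BAM. Assuming the cause is that the contig is missing from the BAM header, a rough pre-check might look like the sketch below. The helper names and the `chr22:1000-2000` region are made up for illustration; only the `chr22_KI270879v1_alt` region string is taken from the error message.

```python
def contig_of(region):
    """Return the contig part of a samtools-style region 'contig:start-end'."""
    # rpartition splits on the last ':' so the contig name stays intact.
    contig, sep, coords = region.rpartition(":")
    if sep and coords.replace(",", "").replace("-", "").isdigit():
        return contig
    return region  # a bare contig name with no coordinates


def missing_regions(regions, header_contigs):
    """List the regions whose contig does not appear among the header contigs."""
    known = set(header_contigs)
    return [r for r in regions if contig_of(r) not in known]


# Hypothetical BAM header contigs (main contigs only, no ALT contigs).
header_contigs = ["chr1", "chr22"]
regions = [
    "chr22_KI270879v1_alt:267307-281486",  # region from the error message
    "chr22:1000-2000",                     # illustrative region only
]
print(missing_regions(regions, header_contigs))
```

Any region reported here would be expected to trigger the same kind of error if passed to `samtools depth -r`.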
Hi Steven, thank you so much for your answer. As we are using both WES and WGS, I'll try starting from both VCF and BAM inputs, each for the relevant sample type.
Since I'm not sure my BAM (which contains reads mapped to ALT contigs) will work properly, do you have FASTQ input with the expected PyPGx output? If PyPGx runs over the BAM successfully, does that mean I have nothing to worry about, or could I still get wrong results? Wrong results would be hard for me to notice without a test set.
Thank you!
@yaarau,
> do you have FASTQ input with the expected PyPGx output?
I have this WGS tutorial that starts with BAM files, but I don't have one starting with FASTQ. However, you can use a standard aligner like BWA-MEM to align the FASTQ reads and produce BAM files.
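For the FASTQ-to-BAM step mentioned above, a minimal sketch of a BWA-MEM plus samtools pipeline is shown below. It only builds the command strings and does not run them. The file names are placeholders, the thread count is arbitrary, and the reference is assumed to be a main-contigs-only FASTA that has already been indexed with `bwa index`.

```python
import shlex


def alignment_commands(reference, fastq1, fastq2, out_bam, threads=4):
    """Build shell commands to align paired FASTQ files and index the BAM."""
    # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-").
    align = ["bwa", "mem", "-t", str(threads), reference, fastq1, fastq2]
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    index = ["samtools", "index", out_bam]
    pipeline = f"{shlex.join(align)} | {shlex.join(sort)}"
    return [pipeline, shlex.join(index)]


# Placeholder file names, for illustration only.
for command in alignment_commands(
    "GRCh38.main.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.bam"
):
    print(command)
```

The printed commands can be reviewed and then pasted into a shell, which keeps the sketch free of side effects.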
> If PyPGx runs over the BAM successfully, does that mean I have nothing to worry about, or could I still get wrong results? Wrong results would be hard for me to notice without a test set.
Generally speaking, if everything ran without errors, then I'd say that there isn't much to worry about, especially if you are just inputting VCF. However, if you are starting with BAM and including SV detection, then I strongly advise that you look at copy number and allele fraction profiles (you can find examples here).
Closed due to inactivity. Please feel free to re-open it if necessary.
Hi Steven, this is more of a question than an issue, but I couldn't find any other way to contact you. We regularly run DRAGEN as a secondary pipeline, with Illumina's recommended alt-masked-graph reference (for both hg19 and hg38; see https://emea.illumina.com/science/genomics-research/articles/dragen-demystifying-reference-genomes.html). I understand that the BAM files created by this pipeline cannot be used as input for PyPGx. Is there any way for us to use them as input (e.g., by excluding some of the pharmacogenes)? We would really love to use PyPGx as an integrated PGx solution, but it will be much harder if we have to align the FASTQ files separately for this.
Thank you, Yaara