Working with Large BAM files

sbslee / pypgx

A Python package for pharmacogenomics (PGx) research

https://pypgx.readthedocs.io

MIT License


Working with Large BAM files #141

Closed abheda24 closed 1 month ago

abheda24 commented 1 month ago

I am trying to test PyPGx on data from the 1000 Genomes database. The CRAM file is around 13 GB, and around 34 GB after conversion to BAM. I am using create-input-vcf to generate a VCF file; the input files are sorted and indexed. I don't know what the issue is, but the output file was only 370K, which seems very small, and the process completed in 12 secs. I installed PyPGx via pip and also downloaded the pypgx bundle. Could you please guide me?

sbslee commented 1 month ago

Hi @abheda24,

1. Is the CRAM file you used a high-coverage (e.g., 30x) WGS sample?
2. When running the CLI, did you make sure that the genome build is correct (GRCh37 vs. GRCh38)?
3. What PyPGx version are you using?
4. I recently published a paper in which I applied PyPGx to all 2,504 samples from 1KGP.
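
On the genome build question, one quick sanity check is to compare the chr1 length recorded in a VCF header against the known chr1 lengths of the two assemblies. The following is a minimal sketch, not part of PyPGx: the helper names `open_text` and `guess_build` are hypothetical, and it assumes the VCF header carries `##contig` lines with a `length` field, which is not guaranteed for every file.

```python
import gzip
import re

# Published chr1 lengths for the two human reference builds.
CHR1_LENGTHS = {249250621: "GRCh37", 248956422: "GRCh38"}

# Matches the chr1 contig line with or without the "chr" prefix,
# e.g. ##contig=<ID=chr1,length=248956422> or ##contig=<ID=1,length=249250621>
CHR1_CONTIG = re.compile(r"##contig=<ID=(?:chr)?1,.*?length=(\d+)")


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading (hypothetical helper)."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic bytes; bgzip output is gzip-compatible
        return gzip.open(path, "rt")
    return open(path, "rt")


def guess_build(vcf_path):
    """Return 'GRCh37', 'GRCh38', or None if the header does not settle it (hypothetical helper)."""
    with open_text(vcf_path) as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # end of the meta-information header
            match = CHR1_CONTIG.match(line)
            if match:
                return CHR1_LENGTHS.get(int(match.group(1)))
    return None
```

Calling `guess_build("NA06991-variants.vcf.gz")` would return the inferred build name, or `None` when the header has no usable chr1 contig line, in which case the alignment file header would need to be inspected instead.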

abheda24 commented 1 month ago

1. Yes, it's a high-coverage WGS sample (30x) from the 1000 Genomes database, around 13 GB.
2. Yes, I used GRCh38, which is correct.
3. The version I am using is 0.25.0.
4. I will check the implementation, thanks.

sbslee commented 1 month ago

Can you share the exact CLI you used for creating the VCF and also the exact terminal output?

abheda24 commented 1 month ago

pypgx create-input-vcf \
  ~/pgx_pipeline/input/NA06991-variants.vcf.gz \
  ~/pgx_pipeline/input/GRCh38_full_analysis_set_plus_decoy_hla.fa \
  ~/pgx_pipeline/input/NA06991.cram \
  --assembly GRCh38

The command produced an output file of 369 KB, and the process completed in 0.12 minutes.

sbslee commented 1 month ago

Could you send me the output VCF?

abheda24 commented 1 month ago

NA06991-variants.vcf.gz

sbslee commented 1 month ago

Thanks. The VCF file looks fine to me. Have you tried running PyPGx on it?
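
For anyone wanting to sanity-check a small output VCF like this one before running the pipeline, a rough sketch using only the Python standard library is shown below. It counts variant records per contig and lists the sample names. The function name `summarize_vcf` is hypothetical and not part of PyPGx. A small file is not necessarily a problem here: as far as I understand, `create-input-vcf` calls variants only in the target pharmacogene regions rather than genome-wide.

```python
import gzip
from collections import Counter


def summarize_vcf(path):
    """Return (sample_names, records_per_contig) for a gzip/bgzip-compressed VCF.

    Hypothetical helper for illustration; it reads the file as text and
    does not validate the VCF beyond splitting on tabs.
    """
    per_contig = Counter()
    samples = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith("##"):
                continue  # skip blank lines and meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                samples = fields[9:]  # sample columns follow the 9 fixed columns
                continue
            per_contig[fields[0]] += 1  # first column is the contig name
    return samples, per_contig
```

Running `summarize_vcf("NA06991-variants.vcf.gz")` and printing the result gives a quick view of whether the expected sample is present and which contigs contain calls.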

abheda24 commented 1 month ago

I will run the NGS pipeline and let you know. You can close the issue. Thanks for your response.