shishenyxx / DeepMosaic

DeepMosaic is a deep-learning-based mosaic single nucleotide classification tool that does not need matched control information.
https://www.nature.com/articles/s41587-022-01559-w

If input VCFs are split by chromosome, should input BAM files also be split by chromosome? #7

Closed ehojune closed 1 year ago

ehojune commented 1 year ago

Dear shishenyxx,

Thank you for your amazing program, which applies image classification technology to bioinformatics and genomics. I would like to use it, and I have one question: if the input VCFs are split by chromosome, should the input BAM files also be split by chromosome? I have many WGS samples that I would like to process with DeepMosaic.

If you have any comments, please let me know.

Thanks.

Sincerely, June

shishenyxx commented 1 year ago

Hi June,

Thank you for reaching out! There is no need to split the BAM by chromosome. Splitting the VCFs by chromosome already speeds up processing. If you want to split your VCF further, you can use GATK SplitIntervals (https://gatk.broadinstitute.org/hc/en-us/articles/360036899592-SplitIntervals) to split your input into up to thousands of equal-sized intervals.
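For anyone who has not split their VCF yet, here is a rough sketch of the per-chromosome split in plain Python. It assumes an uncompressed, tab-delimited text VCF, and the function name `split_vcf_lines_by_chrom` is hypothetical: it is not part of DeepMosaic or GATK. Each per-chromosome piece keeps the full header, and the same unsplit BAM can be used with every piece.

```python
from typing import Dict, Iterable, List


def split_vcf_lines_by_chrom(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group VCF text lines by chromosome, copying the header into each group.

    Hypothetical helper for illustration only; assumes uncompressed,
    tab-delimited VCF lines with the chromosome in the first column.
    """
    header: List[str] = []
    records: Dict[str, List[str]] = {}
    for line in lines:
        if line.startswith("#"):
            # Meta-information lines and the #CHROM column line.
            header.append(line)
            continue
        if not line.strip():
            # Skip blank lines.
            continue
        chrom = line.split("\t", 1)[0]
        records.setdefault(chrom, []).append(line)
    # Every per-chromosome piece gets its own copy of the full header.
    return {chrom: header + body for chrom, body in records.items()}


# Small in-memory example (made-up records, not real data).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\t.\t.\n",
    "chr2\t200\t.\tC\tT\t.\t.\t.\n",
    "chr1\t300\t.\tG\tA\t.\t.\t.\n",
]
per_chrom = split_vcf_lines_by_chrom(example)
```

To use it on a real file, pass an open file handle as `lines` and write each value of the returned dictionary to its own output file. For bgzipped VCFs or very large inputs, an established tool is likely a better fit than this sketch.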

Hope that helps you, and thank you again for using DeepMosaic!

Best,

Xiaoxu