shishenyxx / DeepMosaic

DeepMosaic is a deep-learning-based mosaic single nucleotide variant classification tool that does not need matched control information.
https://www.nature.com/articles/s41587-022-01559-w

Using DeepMosaic for diploid plants #19

Closed H-wong017 closed 6 months ago

H-wong017 commented 11 months ago

Hi, Dr. Yang, I'm interested in using DeepMosaic in diploid plant species. Any guidance you could provide on using DeepMosaic for diploid plants, especially for species without sex chromosomes, would be much appreciated. Thank you for your time and for creating this useful tool!

shishenyxx commented 10 months ago

Hi H-wong017,

Thank you for using DeepMosaic. As I haven't personally worked with plant genomes, I can't guarantee the performance. Also, because the model is trained with a 300 bp window, it might be challenging to use the tool as is for detecting somatic mutations in repetitive sequences.

This being said, I would suggest

  1. Download the program from GitHub with git-lfs instead of using the Singularity container.
  2. Generate alternative files for the resources folder (genome FASTA file with indexes, all_repeats BED file, seq.h5 HDF5 file, known segdup, and population SNP data if you have SNP-array allele frequencies for your plant). Please refer to https://github.com/gmcvicker/genome regarding the HDF5 file.
  3. You might have to rename some of your files and pretend the genome is hg19 (GRCh37) or hg38 to get through some of the steps, especially ANNOVAR, where we provide few options and basically just call the tool.
  4. You can use sex 'F' for your plants if all chromosomes are considered diploid.
  5. You can first try step 1 of the DeepMosaic workflow and see what the candidates look like.
  6. The input BAM file should come from BWA alignment followed by a GATK processing pipeline with base recalibration, indel realignment, and HaplotypeCaller. (This should work according to https://www.mdpi.com/2223-7747/9/4/439 and https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03704-1.)
  7. You can use MuTect2 in single-sample mode (with a panel of normals; see the tutorial) to get a pilot input VCF.
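
As a rough sketch of items 2, 6, and 7 above, the reference indexing and the pilot MuTect2 call could be scripted as below. It assumes `samtools`, `bwa`, and `gatk` (GATK4) are on the PATH. The file names (`plant.fa`, `sample.bam`, `pon.vcf.gz`) and the helper function names are hypothetical, and base recalibration and HaplotypeCaller are not covered. By default the script only prints the commands.

```python
import shlex
import subprocess


def reference_index_commands(fasta):
    """Commands that build the FASTA, BWA, and sequence-dictionary indexes."""
    return [
        ["samtools", "faidx", fasta],
        ["bwa", "index", fasta],
        ["gatk", "CreateSequenceDictionary", "-R", fasta],
    ]


def mutect2_single_sample_command(fasta, bam, panel_of_normals, out_vcf):
    """Mutect2 call on one sample with a panel of normals (no matched control)."""
    return [
        "gatk", "Mutect2",
        "-R", fasta,
        "-I", bam,
        "--panel-of-normals", panel_of_normals,
        "-O", out_vcf,
    ]


def execute(commands, run=False):
    """Print each command; execute it only when run=True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    fasta = "plant.fa"  # hypothetical plant reference genome
    cmds = reference_index_commands(fasta)
    cmds.append(
        mutect2_single_sample_command(
            fasta, "sample.bam", "pon.vcf.gz", "sample.mutect2.vcf.gz"
        )
    )
    execute(cmds)  # dry run: prints the commands without running them
```

Printing first makes it easy to review the exact commands before passing `run=True`.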

Good luck and thank you for trying DeepMosaic!

Xiaoxu

H-wong017 commented 10 months ago

Hi, Dr. Yang, I am truly grateful for your suggestions. I followed the steps in the repository's README file and ran the following commands after installing git-lfs:

git lfs install
git clone --recursive https://github.com/Virginiaxu/DeepMosaic

During the cloning process, I encountered the following error message when the download reached approximately 200 MB:

Downloading deepmosaic/models/densenet121_epoch_11.pt (28 MB)
Error downloading object: deepmosaic/models/densenet121_epoch_11.pt (025c8da): Smudge error: Error downloading deepmosaic/models/densenet121_epoch_11.pt (025c8da6340035d4df7dbb5d4e033ec2a5e2c56859d747fdf90e06e9d43e02bb): batch response: LFS only supported repository in paid or trial enterprise.

I attempted to troubleshoot the issue by referring to the git-lfs issue tracker (https://github.com/git-lfs/git-lfs/issues/911) and exploring other possible solutions, but unfortunately I was unable to resolve the problem.

Could you please help me address this error? Thank you in advance for your time and assistance.

shishenyxx commented 10 months ago

Just paid for git-lfs, please try again.

H-wong017 commented 10 months ago

Thanks a lot, git lfs works now.
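
For anyone hitting the same error, a minimal sketch of a check that the model files were really downloaded is given below. It relies on the fact that a file Git LFS could not fetch is left as a small text pointer starting with `version https://git-lfs.github.com/spec/v1`. The directory `deepmosaic/models` is taken from the error message above; the function names are hypothetical.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer stub, not the real content."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unresolved_models(model_dir, pattern="*.pt"):
    """List model files under model_dir that are still LFS pointers."""
    return sorted(
        str(p) for p in Path(model_dir).glob(pattern) if is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Run from the root of the cloned repository.
    for name in unresolved_models("deepmosaic/models"):
        print("still a pointer, try `git lfs pull`:", name)
```

If the script prints nothing, every `.pt` file in that folder holds real content rather than a pointer.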

H-wong017 commented 10 months ago

Hi, Dr. Yang, I am considering using RepeatMasker for genome repeat annotation to generate the all_repeats BED file, and lastz to annotate segmental duplications in the genome to create the segmental duplication BED file. Do you think this approach is suitable, or do you have any alternative suggestions? Thank you in advance for your time and assistance.

shishenyxx commented 10 months ago

Yeah, I think that sounds like a good way to proceed.
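
A minimal sketch of the RepeatMasker part of that plan: converting a RepeatMasker `.out` table into BED intervals. The columns DeepMosaic expects in the all_repeats BED file are not stated in this thread, so the four-column layout here (chrom, start, end, name) is an assumption; compare it with the human file shipped in the resources folder. The function names are hypothetical.

```python
def repeatmasker_out_to_bed(lines):
    """Yield (chrom, start, end, name) tuples from RepeatMasker .out lines."""
    for line in lines:
        fields = line.split()
        # Skip header and blank lines: data rows start with an integer SW score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        chrom, begin, end = fields[4], int(fields[5]), int(fields[6])
        # Name combines the matching repeat and its class/family.
        name = f"{fields[9]}|{fields[10]}"
        # .out coordinates are 1-based inclusive; BED is 0-based half-open.
        yield (chrom, begin - 1, end, name)


def write_bed(out_path, bed_path):
    """Convert the .out file at out_path into a tab-separated BED file."""
    with open(out_path) as src, open(bed_path, "w") as dst:
        for row in repeatmasker_out_to_bed(src):
            dst.write("\t".join(map(str, row)) + "\n")
```

For example, `write_bed("plant.fa.out", "all_repeats.bed")` would convert a RepeatMasker run on a hypothetical `plant.fa`; sorting and merging the intervals afterwards is left out of this sketch.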