google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

question about pileup image height #893

Open sophienguyen01 opened 1 day ago

sophienguyen01 commented 1 day ago

Hello,

When I trained DeepVariant, I set pileup_image_height=75. My question is: when I run DeepVariant with the trained model to evaluate it, do I have to add --make_examples_extra_args="pileup_image_height=75"? I tested running with and without this parameter and noticed better precision and accuracy without it. However, when I do not add --make_examples_extra_args="pileup_image_height=75", my log file contains 'call_variants.py:623] Input shape [100, 221, 7] and model shape [75, 221, 7] does not match.'
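
For context, I generated the training examples roughly like this (paths and shard count below are placeholders, and the exact set of flags is from memory; only --pileup_image_height is the setting in question):

```bash
# Sketch of the training-time make_examples call (paths/shards are placeholders).
make_examples \
  --mode training \
  --ref reference.fasta \
  --reads sample.bam \
  --truth_variants truth.vcf.gz \
  --confident_regions confident.bed \
  --examples training_examples.tfrecord@16.gz \
  --pileup_image_height 75
```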

Which is the correct way to do this?

Thank you

akolesnikov commented 1 day ago

Hi @sophienguyen01,

The correct way is to add the parameter during inference, so that the examples are created with the correct height. The difference in accuracy can be explained by the fact that reads are downsampled to at most pileup_image_height rows of coverage. There are probably locations where the coverage is larger than 75, so more reads get into the image. The model does not use those extra reads, but because reads are sorted by position, the taller image may provide more support in certain cases.
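
For example, with the one-step run_deepvariant script the height can be forwarded to make_examples like this (paths, model type, and shard count below are placeholders; --customized_model should point at your trained checkpoint):

```bash
# Sketch: run_deepvariant with a custom model, forwarding the training-time
# pileup height to make_examples (paths and shard count are placeholders).
run_deepvariant \
  --model_type=WGS \
  --customized_model=/path/to/trained_model/model.ckpt \
  --ref=reference.fasta \
  --reads=sample.bam \
  --make_examples_extra_args="pileup_image_height=75" \
  --output_vcf=output.vcf.gz \
  --num_shards=16
```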

sophienguyen01 commented 1 day ago

Thank you,

Since adding pileup_image_height is necessary when using the trained model, do I also need to add the same additional parameters when running DeepVariant for evaluation as I used when creating the training examples (such as --min_base_quality, --min_mapping_quality, ...)?

akolesnikov commented 1 day ago

Yes, the best practice is to use the same parameters for evaluation.
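
For example, multiple make_examples flags can be forwarded through --make_examples_extra_args as a comma-separated list. The values below are placeholders and should match whatever was used when the training examples were created:

```bash
# Sketch: same inference command as above, forwarding all training-time
# make_examples settings (values shown are placeholders; use your
# training-time values).
run_deepvariant \
  --model_type=WGS \
  --customized_model=/path/to/trained_model/model.ckpt \
  --ref=reference.fasta \
  --reads=sample.bam \
  --make_examples_extra_args="pileup_image_height=75,min_base_quality=10,min_mapping_quality=10" \
  --output_vcf=output.vcf.gz \
  --num_shards=16
```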

sophienguyen01 commented 1 day ago

Thank you