gymreklab / STRDenovoTools

Toolkit for calling and analyzing de novo STR mutations
GNU General Public License v3.0

mutsize wrongly used in qc_denovos.py #19

Open weizhu365 opened 9 months ago

weizhu365 commented 9 months ago

Dear MonSTR developers,

By definition, mutsize is "The size of the mutation (number of repeat units)." However, in https://github.com/gymreklab/STRDenovoTools/blob/master/scripts/qc_denovos.py,

mutations["unit"] = mutations["mutsize"]%mutations["period"] == 0

This excludes de novo STRs whose mutation size is not a multiple of the period. The code therefore seems to treat mutsize as the length of the indel in bp rather than the number of repeat units, which conflicts with the definition of mutsize.

In the actual MonSTR output, mutsize follows its definition. Running qc_denovos.py with --filter-step-size will therefore wrongly remove many dnSTRs whose mutation is a whole number of repeat units of the STR motif.
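
A minimal sketch of the effect, using plain Python rather than the pandas code in qc_denovos.py. The records below are made-up toy values (not real MonSTR output), and the helper name unit_flag_as_written is hypothetical; it only mirrors the expression mutsize % period == 0 quoted above.

```python
# Toy records, invented for illustration. As in MonSTR output,
# mutsize is assumed to be in repeat units, not bp.
mutations = [
    {"period": 4, "mutsize": 1},   # +1 unit of a tetranucleotide repeat
    {"period": 4, "mutsize": -2},  # -2 units of a tetranucleotide repeat
    {"period": 2, "mutsize": 4},   # +4 units of a dinucleotide repeat
    {"period": 3, "mutsize": 3},   # +3 units of a trinucleotide repeat
]


def unit_flag_as_written(record):
    """Hypothetical helper mirroring mutsize % period == 0 from the script."""
    return record["mutsize"] % record["period"] == 0


# Every record above is a whole-unit mutation by construction, yet the
# check only passes when mutsize happens to be a multiple of the period.
flags = [unit_flag_as_written(m) for m in mutations]
kept = [m for m, ok in zip(mutations, flags) if ok]
```

Here the first two records would be dropped even though each is a clean step of whole repeat units, which is the behavior described above.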

I think this is a bug in qc_denovos.py. Please correct me if I misunderstood something.

Thanks,

Wei Zhu

gymreklab commented 9 months ago

Yes, thanks for pointing this out. Since mutsize is already the number of repeat units (not the number of bp) of the mutation, this filter would not work correctly. I will comment out that option and its code for now, since getting the option to work would require knowing the total bp size of the mutation.
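
For reference, a rough sketch of what a bp-based check could look like if the total bp size of the mutation were available. This is an assumption for illustration only: the parameter mutsize_bp and the function is_whole_unit_change are hypothetical, and nothing in this thread says MonSTR outputs such a value.

```python
def is_whole_unit_change(mutsize_bp, period):
    """Return True if a bp-level size change is a whole number of repeat units.

    mutsize_bp is a hypothetical signed size of the mutation in bp;
    period is the repeat unit length in bp.
    """
    if period <= 0:
        raise ValueError("period must be a positive integer")
    # Python's % returns a non-negative result for a positive divisor,
    # so contractions (negative sizes) are handled the same as expansions.
    return mutsize_bp % period == 0


# Toy values, invented for illustration.
expansion_ok = is_whole_unit_change(8, 4)     # +8 bp in a period-4 STR
contraction_ok = is_whole_unit_change(-4, 4)  # -4 bp in a period-4 STR
partial_unit = is_whole_unit_change(3, 4)     # +3 bp in a period-4 STR
```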