mdmparis / defense-finder

Systematic search of all known anti-phage systems.
GNU General Public License v3.0
73 stars 12 forks source link

[BUG] miscalculation of alignment coverage #68

Open jpjarnoux opened 2 weeks ago

jpjarnoux commented 2 weeks ago

Describe the bug The value of alignment coverage is miscalculated.

To Reproduce

lcl|NZ_CALUAQ010000001.1_prot_WP_278902228.1_145 [locus_tag=QDJ88_RS00725] [protein=hypothetical protein] [protein_id=WP_278902228.1] [location=166201..167580] [gbkey=CDS] MNNHPKFEEYKNFIVNNEAYKGLPYVRKKDCSVVWVATKQSEIGKKRIEWALNKAEKFDISKNDSSFYRKVMFIVHPTKE KVCQICGESMSLDYIYPNKSFQKFLSKNYDYEPNVFDDIYDVSNHFKEEGKSNLEIIKILNKSIKLKGNFNDLTIYIKSV IEACRNGEKKSLGPGAMSNFPDRYDGFHSYNRCCRQEFDKGRNPDNMKTYSKDRRAYENWSDGNIHAANKFMSSKYFTGL SADHIGPISLGFIHDPRFMQPMSSGDNSSKRDRLEKSDIKKLIELESRYNLLPVSWFSEKIWKTLKADFLIDNNINLEYY RNLMKININNYMSLLYIIITETGDLGKKFLEENFLKTKQEYFKYDYEFDEFGNIISKTFRNISDATKKEYSRSIRISFES VIDFSEKENRNIKSSFNTNIAKNIDSLVYEINNCAPNKSIVNHFYKIVSAMQDDLLKEA lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 [gene=dcm] [locus_tag=QDJ88_RS00730] [protein=DNA (cytosine-5-)-methyltransferase] [protein_id=WP_278902230.1] [location=complement(167588..170410)] [gbkey=CDS] MELYNDLTKNTESYLKTLAMDIKRQQGIYYTDLELTDRIMDKVLEKISAEEINDINFLEPCVGSGNFVISYIKNAALKFK LNSNEIRKLIKNIYVCDSDKSAIDLFIRNLKYFIKKEYKIDLTNDDLPNIGGSLIYNLQLREDNYIGIDKYFGDIKFDVI VTNPPYKGLRAERRHYKTDDEYFNGRNYYNKIKEKSKHLHPLSGALNANIYRYFVEEIYTKYSKPDSIISILVPSSILTD KSCYDLRNKIIDFNITDIIQIPENNKYIDASQAMTNIITDSNHINSSVSIEKLNNGRIENYLVPKKNIVDEAHNNSILIM SSEDYKLLNKLKSISKVKDHKFIVNMRGELDITNNKDSITSKKTEYPLIRGRDIDRYCEVKYEKIKDYATKEFVNNSSKQ RYVIKNRLACQQIVNMNKKTRIGFTLIKENTVLANSCNFIFIEDNDYGIDEYYALAILNSKYCDWYFKIFSSNNHVNNYE IDLLPFPIGNSVQIQKISSLAKEQVLEFSNLRDNQINKIVTEIIDNFFGVENKINLNSTQIDELANKGLKEKNKVLSKKG ILNDKQYTLSELDLEIIKSVPQGGNWKNIPDKTIEKSKRLMKIRETGGRTTLYGRIDYTNPSYTITTYFNRPGNGCYIHP TQDRVLTTREGARIQCFPDDYYFYGNQRDILNQIGNAVPPLMGYLIAKKIKENLNVKKSLDLFSGAGGLLYGFKMAGVEH VLANDIDRSACVTLKINNPEVNVLCDDVTNDYTKEIIIDTAIQNNVDIICGGPPCQGFSLAGFRKSDDPRNKLVLDFADI IKNVEPKVFVFENVVGLLSYNKGETFNEIKKMFLALGYKLHAETLDFSDYGVPQRRRRVIIIGVRNNINIEPSKLFPDKI TKNKKISVMETIGDLDTNINSCNMNSKFISLMKNEISYDYYLDSIKENCENEIGEQLSIF

Steps to reproduce the behavior:

  1. defense-finder run -o . -w 1 input_file

  2. The output is lcl|NZ_CALUAQ010000001.1_prot_WP_278902228.1_145 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 143 459 RM_Type_IIType_II_REases 2.400e-11 40.200 0.5380.401 243 426 lcl|NZ_CALUAQ010000001.1_prot_WP_278902228.1_145 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 143 459 RM_Type_IIType_II_REases 1.100e-16 57.700 0.4690.344 78 235 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 DISARM_2__drmMII 8.700e-58 192.700 0.563 0.230699 914 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 Druantia_IIDruM 2.800e-57 191.200 0.525 0.210699 895 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_IIType_II_MTases 1.700e-05 20.700 0.4100.169 541 699 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_IIType_II_MTases 1.400e-07 27.000 0.5490.251 13 248 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_IIType_II_MTases 6.200e-08 28.100 1.0460.422 11 407 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_IIType_II_MTases 2.900e-15 52.800 0.5050.186 697 871 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_IIType_II_MTases 5.600e-26 87.900 0.5490.196 700 883 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_IIType_II_MTases 7.700e-28 94.100 0.5000.187 699 874 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_IIType_II_MTases 4.500e-46 154.100 0.5380.227 698 910 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_IIType_II_MTases 2.900e-48 161.300 0.5980.229 699 913 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_IIType_II_MTases 2.900e-52 174.700 0.9210.647 90 697 lcl|NZ_CALUAQ010000001.1_prot_WP_278902230.1_146 GCF_943914075.1_IASS5iDOpI_bin.1.MAG_translated_cds 144 940 RM_Type_II__Type_II_MTases 2.000e-61 204.900 0.6090.231 698 914

  3. The file I got is attached / contains ...

  4. I got the following error : ....

Expected behavior

  1. When we look at hit_profile_cov and hit_seq_cov, we have a very high and low value, so I don't know if it's express between 0 and 1 or 0 and 100.
  2. When I calculate (hit_end_match - hit_begin_match)/hit_sequence_length I did not find either hit_profile_cov or hit_seq_cov.

Maybe I musunderstand something.

Screenshots

If applicable, add screenshots to help explain your problem.

Please complete the following information :

OS:

Version of your the defenseFinder software and Models

mdmparis-defense-finder==1.2.2 macsyfinder==2.1.1 models 1.2.4