priyamayur / GenomicIslandPrediction

MIT License

the start and end positions of the GIs are rounded up! #5

Open kraken-IX opened 1 year ago

kraken-IX commented 1 year ago

I tried predicting the genomic islands of bacterial genomes, but all the start and end positions reported by treasureisland appear to be rounded up. Something like this:

image

In some cases the end position of the GI was larger than the contig length, e.g. contig length = 24668, predicted island end = 25000.

Any idea how to fix this?

Can these parameters be modified without impacting the tool's specificity?

self.WINDOW_SIZE = 10000
self.TUNE_METRIC = 1000
self.MINIMUM_GI_SIZE = 10000

priyamayur commented 1 year ago

Thank you for bringing this to my attention.

  1. Can you share the accession of the genome you are testing, so I can reproduce the issue on my end? The values look rounded because self.TUNE_METRIC = 1000, so the model adjusts the boundaries in increments of 1000 bp. Finding exact border positions is difficult when predicting GEIs from an unannotated genome.
  2. Regarding contig length = 24668 with self.WINDOW_SIZE = 10000: this model was tested on whole genomes, so the parameters were chosen with whole genomes in mind. We have not yet tested other parameter values on shorter sequences.
  3. The model does not currently allow these parameters to be tuned, but this can be enabled in the next version.
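To illustrate points 1 and 2, here is a minimal sketch of how a fixed step size can produce both effects. It is not TreasureIsland's actual code: `window_bounds` is a hypothetical helper, and the assumption that boundaries sit on a grid starting at position 0 is mine. Only the two constant values come from this thread.

```python
# Values quoted in this thread; everything else below is an illustrative assumption.
WINDOW_SIZE = 10000
TUNE_METRIC = 1000


def window_bounds(seq_len, window=WINDOW_SIZE, step=TUNE_METRIC):
    """Yield (start, end) for fixed-size windows placed every `step` bp.

    Hypothetical helper, not the tool's implementation. The end is
    deliberately NOT clipped to seq_len, to show how an end position
    past the contig can arise.
    """
    for start in range(0, seq_len, step):
        yield start, start + window


contig_length = 24668  # the contig length reported above
bounds = list(window_bounds(contig_length))

# Every boundary falls on a multiple of the step size, so results look "rounded".
on_grid = all(s % TUNE_METRIC == 0 and e % TUNE_METRIC == 0 for s, e in bounds)

# Windows whose end lies past the end of the contig.
overshoot = [(s, e) for s, e in bounds if e > contig_length]

print(on_grid)        # True
print(overshoot[0])   # (15000, 25000)
```

Under these assumptions, the first window that runs past a 24668 bp contig ends at 25000, which matches the value reported in the question.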

Update:

I tested it on NZ_CP025210.1 of length 1960 and did not get an end(bp) greater than the sequence length. The problem may be specific to your sequence.

image
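Until the parameters are tunable, one possible workaround is to clip the reported coordinates to the contig length after prediction. This is a sketch only: `clamp_island` is a hypothetical helper that is not part of the tool, and it assumes you already have each island's start, end, and contig length as integers.

```python
def clamp_island(start, end, contig_length):
    """Clip a predicted (start, end) interval to the range [0, contig_length].

    Hypothetical post-processing helper, not part of TreasureIsland.
    """
    if contig_length < 0:
        raise ValueError("contig_length must be non-negative")
    new_start = max(0, min(start, contig_length))
    # Never let the clipped end fall before the clipped start.
    new_end = max(new_start, min(end, contig_length))
    return new_start, new_end


# Example with the numbers from this thread: the end is pulled back to 24668.
print(clamp_island(15000, 25000, 24668))  # (15000, 24668)
```

Clipping only fixes the out-of-range end. It does not make the boundaries any more precise than the 1000 bp step allows.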