katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens
Other

RuntimeWarning: floating point number truncated to an integer #69

Closed alantsangmb closed 3 years ago

alantsangmb commented 7 years ago

I am using srst2 version 0.2.0, and srst2 runs and produces result files with the example files. However, I see a warning message when the srst2 analysis finishes:

/usr/lib/python2.7/dist-packages/scipy/stats/distributions.py:7197: RuntimeWarning: floating point number truncated to an integer
  vals = special.bdtr(k,n,p)

What are the potential reasons for this warning? Are the results still reliable?

Thank you in advance for any help.

rrwick commented 7 years ago

Thanks for spotting this one. I can't replicate it, so it'd be great if I could try it with your data. I know sharing large files can be a pain, though. Do you have a Dropbox account? I could make and share a Dropbox folder with you. Let me know - thanks!

Ryan

alantsangmb commented 7 years ago

I used the example data set ERR028690:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR028/ERR028690/ERR028690_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR028/ERR028690/ERR028690_2.fastq.gz

I ran srst2 as follows:

srst2 --input_pe '/home/manager/ERR028690_1.fastq.gz' '/home/manager/ERR028690_2.fastq.gz' --output ERR028690_0.1.18 --log --save_scores --mlst_db '/home/manager/Escherichia_coli#1.fasta' --mlst_definitions '/home/manager/ecoli.txt' --gene_db '/home/manager/srst2-master/data/ARGannot.r1.fasta'

And here is the output:

Attempting to read 7 loci from ST database /home/manager/ecoli.txt
Read ST database /home/manager/ecoli.txt successfully

1497909 reads; of these:
  1497909 (100.00%) were paired; of these:
    1497360 (99.96%) aligned concordantly 0 times
    3 (0.00%) aligned concordantly exactly 1 time
    546 (0.04%) aligned concordantly >1 times
    ----
    1497360 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    1497360 pairs aligned 0 times concordantly or discordantly; of these:
      2994720 mates make up the pairs; of these:
        2994015 (99.98%) aligned 0 times
        3 (0.00%) aligned exactly 1 time
        702 (0.02%) aligned >1 times
0.06% overall alignment rate
[samopen] SAM header is present: 3800 sequences.
[mpileup] 1 samples in 1 input files

Set max per-file depth to 8000
/usr/lib/python2.7/dist-packages/scipy/stats/distributions.py:7197: RuntimeWarning: floating point number truncated to an integer
  vals = special.bdtr(k,n,p)
1497909 reads; of these:
  1497909 (100.00%) were paired; of these:
    1495680 (99.85%) aligned concordantly 0 times
    2018 (0.13%) aligned concordantly exactly 1 time
    211 (0.01%) aligned concordantly >1 times
    ----
    1495680 pairs aligned concordantly 0 times; of these:
      141 (0.01%) aligned discordantly 1 time
    ----
    1495539 pairs aligned 0 times concordantly or discordantly; of these:
      2991078 mates make up the pairs; of these:
        2990191 (99.97%) aligned 0 times
        769 (0.03%) aligned exactly 1 time
        118 (0.00%) aligned >1 times
0.19% overall alignment rate
[samopen] SAM header is present: 1654 sequences.
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000

I am using Python 2.7.6, SciPy v0.13.3, NumPy v1.8.2, and Bowtie2 v2.2.4. I have tried both SAMtools v0.1.19 (the Bio-Linux default) and SAMtools v0.1.18 (selected by setting the environment variable), and both give this RuntimeWarning:

manager@bl8vbox[manager] samtools [ 4:02PM]
Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.19-96b5f2294a

manager@bl8vbox[manager] samtools [ 4:01PM]
Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.18 (r982:295)

By the way, how do I confirm that srst2 is using SAMtools v0.1.18? It does not seem to be recorded in the log file. Many thanks for your help.

rrwick commented 7 years ago

Thanks for the details - I'll do my best to replicate the issue and get back to you.

Regarding SAMtools, SRST2 will by default use whatever's first in your path. So if you just run samtools --version on the same computer as SRST2, that will tell you the version. Alternatively, you can set the environment variable SRST2_SAMTOOLS if you want to specify an exact SAMtools location - useful if you have multiple SAMtools versions installed. For example: export SRST2_SAMTOOLS="/usr/local/bin/samtools" before running SRST2.

And I should mention that you're not absolutely required to use v0.1.18 - SRST2 will work with later SAMtools as well. But in our tests the results can be more accurate with v0.1.18, which is why we still recommend that version.
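To check which SAMtools a run will pick up, the lookup described in this comment (use SRST2_SAMTOOLS if it is set, otherwise the first samtools on PATH) can be mirrored in a short Python 3 sketch. This is not SRST2's own code: resolve_samtools, parse_samtools_version and samtools_version are hypothetical helpers written for illustration, and the sketch assumes samtools prints a "Version:" line when run without arguments, as in the terminal output quoted earlier in this thread.

```python
import os
import re
import shutil
import subprocess
from typing import Optional


def resolve_samtools() -> Optional[str]:
    """Hypothetical helper: SRST2_SAMTOOLS if set, else the first samtools on PATH."""
    override = os.environ.get("SRST2_SAMTOOLS")
    if override:
        return override
    return shutil.which("samtools")  # None if samtools is not on PATH


def parse_samtools_version(text: str) -> Optional[str]:
    """Extract the value of the 'Version:' line from samtools' banner text."""
    match = re.search(r"^Version:\s*(.+)$", text, re.MULTILINE)
    return match.group(1).strip() if match else None


def samtools_version(path: str) -> Optional[str]:
    # Run samtools with no arguments. Its banner is typically written to stderr
    # with a non-zero exit status, so capture both streams and ignore the code.
    try:
        result = subprocess.run([path], capture_output=True, text=True)
    except OSError:
        return None  # path does not exist or is not executable
    return parse_samtools_version(result.stdout + result.stderr)


if __name__ == "__main__":
    exe = resolve_samtools()
    if exe is None:
        print("samtools not found (set SRST2_SAMTOOLS or add it to PATH)")
    else:
        print(exe, samtools_version(exe))
```

Run from the same shell (same PATH and environment variables) that launches SRST2, this should report the binary and version SRST2 would use, assuming SRST2 resolves samtools as described above.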

cytang19 commented 6 years ago

Has this issue been resolved? I am also getting the same warning with version 0.2.0. Thanks.

anou85 commented 6 years ago

same here... Would be great to get an update. Thanks

cytang19 commented 6 years ago

Hi, any update here? Does the warning affect the results, or can it be ignored? Thanks.

GitTorres commented 5 years ago

Ditto with the previous questions.

nalbright commented 4 years ago

Hi there, I am getting the same warning when I run the biocontainer srst2_0.2.0--py27_2. The on-screen message hangs for a while, and then my job is killed with only part of the results generated. The last line in the log file is "Printing verbose gene detection results to ." There is no error message in the log and no note that the job finished; the run just hung for more than 12 hours and then stopped.

Any update on the resolution with this issue?

scwatts commented 3 years ago

Hi all, I've identified the source of this warning and it seems to be a minor bug with little impact on the final result.

The warning is triggered when a float is passed to the SciPy binomial CDF function (special.bdtr), which truncates it to an integer internally. This float value is now rounded to the nearest integer before the binomial CDF is called.
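The difference between truncating and rounding can be illustrated with a minimal Python 3 sketch. It is not SRST2's code: binom_cdf computes P(X <= k) for X ~ Binomial(n, p), the quantity scipy.special.bdtr(k, n, p) returns, but it uses math.comb instead of SciPy so it runs without SciPy installed. The function names and the example values below are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); same quantity as scipy.special.bdtr(k, n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def cdf_truncated(k: float, n: int, p: float) -> float:
    # Old behaviour: a float k has its fractional part dropped (SciPy warned here).
    return binom_cdf(int(k), n, p)


def cdf_rounded(k: float, n: int, p: float) -> float:
    # Behaviour after the fix as described: round k to the nearest integer first.
    # Note: Python 3's round() sends exact .5 ties to the even integer.
    return binom_cdf(int(round(k)), n, p)


if __name__ == "__main__":
    print(cdf_truncated(2.7, 10, 0.5))  # evaluates P(X <= 2)
    print(cdf_rounded(2.7, 10, 0.5))    # evaluates P(X <= 3)
```

With n = 10, p = 0.5 and k = 2.7, truncation evaluates P(X <= 2) = 56/1024 (about 0.055) while rounding evaluates P(X <= 3) = 176/1024 (about 0.172). For a non-negative k the two cutoffs differ by at most one count, and they coincide whenever the fractional part is below one half.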