itsmeludo / PhylOligo

Bioinformatics / Explore oligonucleotide composition similarity between assembly contigs or scaffolds to detect contaminant DNA.
GNU General Public License v3.0

contalocate.R error: missing value where TRUE/FALSE needed #11

Closed ckeeling closed 5 years ago

ckeeling commented 5 years ago

Hello,

I'm getting this error when I don't use the manual threshold-setting option:

Error in while (steep[i + 1] < steep[i]) { : 
  missing value where TRUE/FALSE needed
Execution halted

It looks like this is happening at line 137 (because it never opens the host_threshold_name file):

while(steep[i+1]<steep[i]){i=i+1}

I think this error suggests that the final value in steep is the minimum value (the first value of des_conta[["y"]]), so there is no steep[i+1] to compare against. That is, there is no valley in steep, only diminishing values. What could be causing this? Must there be a valley, or could the first value of des_conta[["y"]] be used in the code that follows, if the code were written to permit this?

Thanks for the help, Chris
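
The failure mode described above can be reproduced in isolation. The sketch below is not the contalocate.R code: the `steep` values are made up, the loop is assumed to start at `i = 1`, and `find_descent_end` is a hypothetical guarded variant. It only shows that indexing one past the end of a vector yields `NA`, and that `while (NA)` stops with this exact error.

```r
# Made-up stand-in for 'steep': strictly decreasing, so there is no valley.
steep <- c(5, 4, 3, 2, 1)

# Same loop shape as in the report (start index of 1 is an assumption).
# Once i reaches length(steep), steep[i + 1] is NA and the condition is NA.
unguarded <- function(steep) {
  i <- 1
  while (steep[i + 1] < steep[i]) { i <- i + 1 }
  i
}

# Hypothetical guarded variant: never read past the end, and treat an NA
# comparison as "stop" instead of letting while() fail.
find_descent_end <- function(steep) {
  i <- 1
  while (i < length(steep) && isTRUE(steep[i + 1] < steep[i])) {
    i <- i + 1
  }
  i
}

# Capture the error message instead of halting the script.
result <- tryCatch(unguarded(steep), error = function(e) conditionMessage(e))
print(result)

print(find_descent_end(steep))          # no valley: stops at the last index
print(find_descent_end(c(5, 3, 2, 4)))  # valley present: stops at its index
```

With the guard, a distribution without a valley returns the last index rather than aborting. Whether that boundary value makes a meaningful threshold is a separate question that only the script's author can settle.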

itsmeludo commented 5 years ago

Hi Chris! Thanks for the comment. I remember thinking that the automatic threshold setting might be a problem, especially if there were several species in the mix, so I added a manual setting option (-m). With it, you should be shown the distance distribution and prompted iteratively to input a threshold value: as long as you give a different value, it will redraw the distribution with the given value; enter the same value a second time and it will be validated. There are 2 thresholds (one for the host and one for the conta), designed specifically to separate species and handle overlapping distributions from multiple species. Set the thresholds to values that fall between the modes of the distributions. The exact values are not critical, since the 2-threshold system will dampen the effect of approximations. There should be more description in the manual and in the figure of the paper in Bioinformatics. Out of curiosity, I'd like to see the distance distributions for your case ^^
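
The "enter the same value twice to validate" interaction can be sketched as a small loop. This is an illustration of the behaviour described in the comment, not the code in contalocate.R: `confirm_threshold`, `read_value`, and `redraw` are hypothetical names, and the inputs are replayed from a fixed vector so that the example runs without a display or a prompt.

```r
# Hypothetical helper: 'read_value' returns the next candidate threshold and
# 'redraw' is called with each new candidate (e.g. to replot the distribution).
confirm_threshold <- function(read_value, redraw = function(x) invisible(NULL)) {
  previous <- NULL
  repeat {
    current <- read_value()
    # The same value entered twice in a row is taken as confirmation.
    if (!is.null(previous) && isTRUE(all.equal(current, previous))) {
      return(current)
    }
    redraw(current)
    previous <- current
  }
}

# Non-interactive demo: replay a made-up sequence of user inputs.
inputs <- c(0.2, 0.35, 0.3, 0.3)
pos <- 0
next_input <- function() {
  pos <<- pos + 1
  inputs[pos]
}

threshold <- confirm_threshold(next_input)
print(threshold)
```

In an interactive R session, `read_value` could be something like `function() as.numeric(readline("threshold: "))`, and `redraw` a function that replots the distance distribution with a vertical line at the candidate value.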

ckeeling commented 5 years ago

Thanks @itsmeludo. I have been avoiding the interactive methods until now because I am running it from a Singularity image on a server over ssh, and X11 isn't cooperating with me when running an interactive SLURM job. However, when I load the dist files produced by contalocate.R on the server into R on my local computer, I can get it to work interactively with a few edits to simplify the contalocate.R code. Thanks!