torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0
121 stars 23 forks source link

code coverage for algod1.cc #143

Closed frederic-mahe closed 4 years ago

frederic-mahe commented 4 years ago

In my tests, the last condition of that function is never met (line 652):

int compare_amp(const void * a, const void * b)
{
  const unsigned int * x = static_cast<const unsigned int*>(a);
  const unsigned int * y = static_cast<const unsigned int*>(b);
  if (*x < *y)
    return -1;
  else if (*x > *y)
    return +1;
  else
    return 0;
}

Trying printf ">a_2\nA\n>b_1\nG\n>c_1\nT\n" or printf ">a_2\nA\n>b_1\nT\n>c_1\nG\n" shows that x and y are the alphabetical order of the sequences (since amplicon names and abundances are constant). Now that two amplicons cannot have the same sequence, x = y cannot happen.

torognes commented 4 years ago

It is correct that x and y will never be identical when swarm runs and that the line return 0; will never be executed under normal circumstances. However, this compare_amp function is a function provided to and called by the qsort function. It is required that this comparison function return a number less than, equal to, or larger than zero depending on the relationship between the two arguments. It seems like the only way that *x == *y is if x == y. But there is no warranty that some implementation of qsort in some strange case (with some version of the c library on some version of an os on some kind of machine) does not call this function with x and y pointing to the same identical amplicon, even if it would be foolish. Therefore I do not think that the case should be eliminated. I think we have to live with one line not being covered. The overhead should negligible.

If we had used only signed integers here we could used this simpler code instead:

int compare_amp(const void * a, const void * b)
{
  const int * x = static_cast<const int*>(a);
  const int * y = static_cast<const int*>(b);
  return *x - *y;
}
frederic-mahe commented 4 years ago

But there is no warranty that some implementation of qsort in some strange case (with some version of the c library on some version of an os on some kind of machine) does not call this function with x and y pointing to the same identical amplicon, even if it would be foolish.

OK, it make sense. I had in mind to add an assert(*x != *y), but based on your answer, it might not be a good idea. Maybe a comment would be useful for future maintainers? For example, swarm checks that all amplicon sequences are unique (strictly dereplicated input data), so distinct amplicons with the same sequence are not expected at this stage

torognes commented 4 years ago

Ok, I've added that comment.