hepcat72 / CFF

Cluster-free Filtering. Determine which sequences are real in a metagenomic sample.
GNU General Public License v3.0

errorRates.pl bug - sorting mother IDs by size when some sizes are the same #1

Closed: pgajer closed this issue 9 years ago

pgajer commented 9 years ago

errorRates.pl Version 1.27

Created: 2/18/2014

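The loops at lines 580 and 593 can process the mother IDs inconsistently when some sequences have the same size, since sorting by size alone does not determine the order of tied IDs.

To ensure consistency, build the sorted array of mother IDs once and use that same array in both loops: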

my @motherIDs = (sort {$abundance_hash->{$b} <=>
                       $abundance_hash->{$a}}
                 keys(%$seq_hash))[0..$estimate_max];

#Calculate the num of each base in the top "10" most abundant mother seqs
#For 10 (default) of the most abundant mother sequences
foreach my $mother_id (@motherIDs)
  {
    #foreach base in A T G C
    #   total num bases->{base} += number of bases in mother sequence
    #                              that match base
    $total_num_bases->{$mother_id} =
      getBaseCountsHash($seq_hash->{$mother_id}->[1]);
  }

#Cycle through the top abundant mother sequence IDs
#foreach top mother sequence in order of descending abundance
foreach my $mother_id (@motherIDs)
  {

....
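
As an illustration of the idea (not the code in errorRates.pl), here is a minimal standalone sketch of sorting once and reusing the result. It adds two things that are not part of the suggestion above: a tie-breaker on the ID, so equally abundant sequences always come out in the same order, and a bounded slice, so the list never runs past the available IDs. The sample abundances are made up, and $estimate_max is assumed here to be a count of IDs to keep, which may differ from how the script uses it.

```perl
use strict;
use warnings;

# Hypothetical abundances; seqB and seqC tie on purpose.
my $abundance_hash = {seqA => 50, seqB => 20, seqC => 20, seqD => 5};
my $estimate_max   = 3;  # assumption: number of top IDs to keep

# Sort once: descending abundance, then ID as a tie-breaker so that
# equally abundant sequences always appear in the same order.
my @sorted = sort {$abundance_hash->{$b} <=> $abundance_hash->{$a}
                     || $a cmp $b}
             keys(%$abundance_hash);

# Keep at most $estimate_max IDs without slicing past the end of the list.
my $last = $#sorted < $estimate_max - 1 ? $#sorted : $estimate_max - 1;
my @motherIDs = @sorted[0..$last];

# Every later loop can now iterate over the same @motherIDs.
print join(",", @motherIDs), "\n";
```

With the sample data this keeps seqA, seqB and seqC, in that order, on every run.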

hepcat72 commented 9 years ago

Fixed! Thanks Pawel! (The fix should be a very small speed improvement too.)