homeveg / nuctools

software for analysis of chromatin feature occupancy profiles from high-throughput sequencing data
GNU General Public License v3.0
15 stars 3 forks source link

Error in extend_PE #5

Closed nlapalu closed 7 years ago

nlapalu commented 7 years ago

Hi Yevhen,

I found several bugs in your script. For me with real paired reads, your code is not usable in a generic way. The main problem is the way you compute the nucleosome length. You do not take into account the strand of each mate when you do your substraction, that is the reason why you get negative values. In the same way, I think that set a max length to 1000 should be an optional parameter. In my case, I force this parameter during the mapping (not output mapping with insert size > 400 bp = di-nucleosomes). I get also a bug with the && ($lines[$#lines] =~ /^chr.*/ ))=> shift when reading bed. You cannot compare directly the read name 1 vs 2 (\1 or \2 at the end). So find my working code below:

use List::Util qw(min max);
.
.
.
if(($i==$end_index) && ($end_index % 2 == 0))
            { $last_line= $lines[$#lines]; last; }
        $line1=$lines[$i]; chomp($line1);
        $line2=$lines[$i+1]; chomp($line2);

        my @newline1=split(/\t/, $line1);
        my @newline2=split(/\t/, $line2);

        my $chr_name_1=$newline1[0];
        my $chr_name_2=$newline2[0];

        my $min = min ($newline1[1],$newline2[1]);
        my $max = max ($newline1[2],$newline2[2]);
        my $nuc_length = $max - $min;

        print $OUT_FHs join("\t", $chr_name_1, $min, $max, $nuc_length), "\n";

I hope it helps. I am new in nucleosome analysis and I prefer code in Python. I will keep you in touch if I find other problems. Thanks for the package. I am trying several tools to analyze my data.

homeveg commented 7 years ago

Hi Nicolas,

thank you for your contribution! It's always very useful to have a real-life-user on board :) Me and my colleagues as a developers normally avoiding most of bugs just because we know exactly how it suppose to work. As you might realized, the package was designed initially for single-end reads and therefore might contain some more bugs related to this issue. Regarding max length parameter, you are absolutely right. I was changing this code several time to match our old release and might have miss this point.

Thanks again for your comment.

homeveg commented 7 years ago

issues