mapleforest / HaploMerger2


Use of uninitialized value in numeric gt (>) at ../bin/HM_pathFinder.pl line 2058. #20

Open fbemm opened 6 years ago

fbemm commented 6 years ago

Hey,

I am running HM2 on a large genome. A Canu assembly merged successfully, while the Falcon one seems to be stuck in a loop. The last error repeats itself, memory is filling up, and the log file is several GB after a day.

Any idea?

Thanks, Felix

../bin/HM_pathFinder.pl --Species genome genomex --Force --Delete --scoreScheme=score --filter=200000 --NsLCsFilter=90 --noSelfLoop=1 --noStrandConflict=1 --breakingMode=2 --misjoin_aliFilter=5000000 --misjoin_overhangFilter=50000 --escapeFilter=100

Species included:  genome genomex
Set the scoring scheme to score, and set the filter score/ali_len to 200000.
Set the breakingMode to 2,
and set the misjoin_aliFilter to 5000000,
and set the misjoin_overhangFilter to 50000.
--noSelfLoop is set to  1 !
--noStrandConflict is set to 1 !
Set to OVER-WRITING mode!
Finished reading the original data files.
Deleted 0 self-self nontrivial nodes (-17).
Deleted 2014 low-scored nodes (-2) (minimum score/ali_len =  200000 ).
0 nodes escape low-score filtering (threshold = 100 %).
Deleted 364 trivial nodes (stringent) (--minLen*4 =  5 ) (=-11).
Deleted 280 nodes mainly consisting of Ns and lowcases (-3) (>90%).
Finished seeking self_loops in perfect_path_finding (52, -6)
and chaining adjacent nodes (8) from same target query scaffolds!
Also delete 50 nodes (-5) with strand conflict with the mainstream.
Break 424 assembly mis-joining points (-8).
Deleted 678 trivial nodes (less stringent) (--minLen*2 =  2.5 ) (=-11).
Total self_loops are 52 (-6).
Total strand switchs are 50 nodes (-5).
Deleted 0 loops (=node count, -7).
Deleted 0 switch loops (=node count, -8).
Deleted totally 944 trivial nodes (relax) (--minLen*1 =  1.25 ) (<=-11).
Nodes with no adjacent nodes: 4086 .
Nodes with 1  adjacent nodes: 3498 .
Nodes with 2  adjacent nodes: 2002 .
Nodes with 3  adjacent nodes: 238 .
Nodes with 4  adjacent nodes: 0 .
There are  9832  informative nodes left .
Break 12936 branches (including mirror paths).
There are  3356  paths (with mirror) !
There are  1678  paths (without mirror) !
Finished path_terminal_extending!
Use of uninitialized value in numeric gt (>) at ../bin/HM_pathFinder.pl line 2058.
Use of uninitialized value in numeric ge (>=) at ../bin/HM_pathFinder.pl line 2058.
Use of uninitialized value in addition (+) at ../bin/HM_pathFinder.pl line 2064.
Use of uninitialized value in addition (+) at ../bin/HM_pathFinder.pl line 2064.
Use of uninitialized value in numeric lt (<) at ../bin/HM_pathFinder.pl line 2188.
Use of uninitialized value in numeric lt (<) at ../bin/HM_pathFinder.pl line 2188.
Use of uninitialized value in array element at ../bin/HM_pathFinder.pl line 2207.
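
Since the log grows to several GB, it can help to summarize which warnings repeat rather than open the file. The following is a minimal sketch, not part of HaploMerger2: `tally_warnings` is a hypothetical helper that counts warnings per source line, assuming they keep Perl's standard `<message> at <file> line <N>.` format shown above. The sample lines are copied from this log.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HM2): count warnings per "file line N".
# Reads line by line, so memory use does not depend on the log size.
sub tally_warnings {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        # Matches Perl's standard "<message> at <file> line <N>." format.
        $count{"$1 line $2"}++ if $line =~ /^.+? at (\S+) line (\d+)\.\s*$/;
    }
    return \%count;
}

# A few warning lines copied from the log above, read via an in-memory handle.
my $sample = join "\n",
    'Use of uninitialized value in numeric gt (>) at ../bin/HM_pathFinder.pl line 2058.',
    'Use of uninitialized value in numeric ge (>=) at ../bin/HM_pathFinder.pl line 2058.',
    'Use of uninitialized value in addition (+) at ../bin/HM_pathFinder.pl line 2064.',
    'Use of uninitialized value in array element at ../bin/HM_pathFinder.pl line 2207.';

open my $log, '<', \$sample or die "cannot open in-memory log: $!";
my $tally = tally_warnings($log);
close $log;

# Most frequent locations first.
for my $where (sort { $tally->{$b} <=> $tally->{$a} || $a cmp $b } keys %$tally) {
    printf "%8d  %s\n", $tally->{$where}, $where;
}
```

For a real run, the in-memory handle would be replaced by a handle on the actual log file or by `\*STDIN`.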
fbemm commented 6 years ago

The errors happen here:

                        ### look for nend, nlen
                        for(my $j=$nmid+$nstrand;;$j+=$nstrand){
                            $nend=$j if $nportions->[$j][4]>-1 and $nportions->[$j][5]>=0;
                            print "ERROR\t",$nend,"\t",$nmid,"\t",$nstrand,"\t",$j,"\t",$nportions->[$j][4],"\t",$nportions->[$j][5],"\n";
                            if($j==0 or $j==scalar(@$nportions)-1 or $nportions->[$j][6]<-5){
                                if($j==$new_scaffolds[$i]->[8] and $nsc==$new_scaffolds[$i]->[6] and $ntq == $new_scaffolds[$i]->[13]){ # the last portion
                                    $is_the_last_portion=1;
                                    $nend=$j;
                                }
                                $nlen = $nstrand>0 ? $nportions->[$nend][2]+$nportions->[$nend][3]-$nportions->[$nstart][2] :
                                    $nportions->[$nstart][2]+$nportions->[$nstart][3]-$nportions->[$nend][2];
                                last;
                            }
                        }

The last line of the debug output below is where the error occurs:

nend    nmid    nstrand j       $nportions->[$j][4]     $nportions->[$j][5]
1       1       1       2       -1                      0
1       1       1       3       -1                      0
1       1       1       4       -1                      0
5       1       1       5       28                      2
5       1       1       6       -1                      0
25      2       1       25
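
The final row of the debug output has no values in the last two columns, which appears to match the "uninitialized value" warnings: the fields being compared were undefined at that index. The sketch below does not diagnose the root cause and is not taken from HM_pathFinder.pl. It only illustrates, under stated assumptions, how a scan like the one above could stop at the array bounds and at missing fields instead of comparing `undef`. `scan_end` and the toy data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HM_pathFinder.pl: scan from $start in
# direction $step (+1 or -1) and return the index of the last record whose
# fields [4] and [5] pass the same test as in the snippet above, or undef.
sub scan_end {
    my ($portions, $start, $step) = @_;
    die "step must be +1 or -1\n" unless $step == 1 || $step == -1;
    my $end;
    # Explicit bounds in the loop condition: never index outside the array.
    for (my $j = $start; $j >= 0 && $j <= $#$portions; $j += $step) {
        my $rec = $portions->[$j];
        # Stop instead of comparing undef when a record or field is missing.
        last unless ref $rec eq 'ARRAY' && defined $rec->[4] && defined $rec->[5];
        $end = $j if $rec->[4] > -1 && $rec->[5] >= 0;
    }
    return $end;
}

# Toy records shaped like the debug output: only fields [4] and [5] are set.
my @toy = map { [ (undef) x 4, @$_ ] } ([-1, 0], [-1, 0], [28, 2], [-1, 0]);
my $end = scan_end(\@toy, 0, 1);
print defined $end ? "end=$end\n" : "no end found\n";
```

Reading the record into `$rec` first also avoids Perl's autovivification, where an expression such as `$nportions->[$j][4]` creates an empty element at index `$j` if none exists.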
mapleforest commented 5 years ago

It is not easy to identify the problem from this information alone (more details are needed, such as genome size, sequence naming, and number of sequences). Possibilities:

  1. Run example1 and example2 before each project to make sure the HM2 environment is still healthy.
  2. It is possible that the sequence format or the sequence names of the Falcon assembly interfere (e.g., a trace of Ns at the end of a sequence is not acceptable).
  3. Which stage are you running, batchA or batchB? Did the previous step create correct outputs? It is possible that the previous step did not create complete data, which then caused the next step to work on invalid input.
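
The details asked for above (sequence count, total size, naming, trailing Ns) could be collected with a short script. This is a minimal sketch, not shipped with HaploMerger2: `fasta_report` is a hypothetical helper, and treating duplicate IDs as a naming problem is an assumption rather than a documented HM2 rule.

```perl
use strict;
use warnings;

# Hypothetical helper, not shipped with HaploMerger2: summarize FASTA records
# read from an open filehandle.
sub fasta_report {
    my ($fh) = @_;
    my (%seen, %last_base, $id);
    my ($records, $total_bp) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                      # strip newline and trailing blanks
        next unless length $line;
        if ($line =~ /^>\s*(\S*)/) {             # ID = header text up to first blank
            $id = $1;
            $seen{$id}++;
            $records++;
        }
        elsif (defined $id) {
            $total_bp += length $line;
            $last_base{$id} = substr($line, -1); # last base seen so far for this ID
        }
    }
    my @dups   = grep { $seen{$_} > 1 } sort keys %seen;
    my @ends_n = grep { $last_base{$_} =~ /^[Nn]$/ } sort keys %last_base;
    return {
        records     => $records,
        total_bp    => $total_bp,
        duplicates  => \@dups,
        ends_with_n => \@ends_n,
    };
}

# Toy input read through an in-memory filehandle.
my $demo = ">ctg1 len=8\nACGTACGT\n>ctg2\nACGTNNNN\n>ctg1\nAC\n";
open my $fh, '<', \$demo or die "cannot open in-memory FASTA: $!";
my $report = fasta_report($fh);
close $fh;

printf "records=%d total_bp=%d\n", $report->{records}, $report->{total_bp};
print "duplicate IDs: @{ $report->{duplicates} }\n";
print "IDs ending in N: @{ $report->{ends_with_n} }\n";
```

To check a real assembly, open the FASTA file itself in place of the in-memory string.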

Looking forward to your feedback.