thegenemyers / DAZZ_DB

The Dazzler Data Base

issues with large -s not reported in log files #16

Closed aclum closed 8 years ago

aclum commented 8 years ago

Here the log file from the creation of the database doesn't report an error, but the creation isn't successful and an error is thrown downstream. Ideally this error would be reported when the database is being created.

    aclum@genepool14:/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads$ cat raw_reads.db
    files = 11
    301768 pbio-1082.9872-nocontrol m160121_000223_00123_c100912372550000001823199404301605_s1_p0
    624094 pbio-1084.9888-nocontrol m160123_022914_00123_c100902972550000001823200604301645_s1_p0
    950618 pbio-1084.9889-nocontrol m160123_064729_00123_c100902972550000001823200604301646_s1_p0
    1289306 pbio-1084.9890-nocontrol m160123_110636_00123_c100902972550000001823200604301647_s1_p0
    1710388 pbio-1104.10055-nocontrol m160225_020933_00123_c100930082550000001823209005251654_s1_p0
    2106027 pbio-1107.10076-nocontrol m160226_074755_00123_c100929792550000001823209005251671_s1_p0
    2524588 pbio-1107.10077-nocontrol m160226_120708_00123_c100929792550000001823209005251672_s1_p0
    2935732 pbio-1107.10078-nocontrol m160226_162622_00123_c100929792550000001823209005251673_s1_p0
    3349957 pbio-1107.10079-nocontrol m160226_204534_00123_c100929792550000001823209005251674_s1_p0
    3760263 pbio-1107.10080-nocontrol m160227_010858_00123_c100929792550000001823209005251675_s1_p0
    4082605 pbio-1107.10081-nocontrol m160227_052813_00123_c100929792550000001823209005251676_s1_p0
    blocks = 8
    size = 1000000000 cutoff = 500 all = 0
    0 0
    558042 159986
    1054378 310087
    1563525 456980
    2141886 614052
    2792390 774164
    3426911 936139
    4006876 1095519
    4082605 1121874

    aclum@genepool14:/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads$ DBstats raw_reads
    DBstats: Stub file (.db) of raw_reads is junk

    aclum@mc1732:/projectb/scratch/aclum/falcon/AWSBW/s1000/sge_log$ more prepare_rdb.sh-task_build_rdb-task_build_rdb.o20863516
    cd /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads

    real 2m0.948s
    user 1m36.668s
    sys 0m22.550s
    touch /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/rdb_build_done

thegenemyers commented 8 years ago

There is actually nothing wrong with the DB creation. The problem is that the code did not anticipate a block size of 1Gbp. But it should be permitted.

Could you please do an experiment for me? Could you change line 268 of DB.h (in all copies in all modules) from:

    #define DB_PARAMS "size = %9lld cutoff = %9d all = %1d\n"   // block size, len cutoff, all in well

to

    #define DB_PARAMS "size = %10lld cutoff = %9d all = %1d\n"   // block size, len cutoff, all in well

and then tell me if this fixes the problem? (the 9 becomes a 10)
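A minimal sketch of why the field width could matter, assuming the parameter line of the stub file is written with a printf-family call and read back with a scanf-family call using the same DB_PARAMS string (that assumption is not confirmed in this thread). In printf, `%9lld` is only a minimum width, so a 10-digit size is written in full. In scanf, `%9lld` is a maximum width, so only the first 9 digits are consumed and the rest of the line no longer matches. The helper `round_trip` and the names `DB_PARAMS_OLD` and `DB_PARAMS_NEW` are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* The two format strings quoted in the comment above. */
#define DB_PARAMS_OLD "size = %9lld cutoff = %9d all = %1d\n"
#define DB_PARAMS_NEW "size = %10lld cutoff = %9d all = %1d\n"

/* Hypothetical helper: write the parameter line with write_fmt, parse it
   back with read_fmt, and return how many fields sscanf assigned.
   A return value of 3 means the whole line was read back successfully. */
int round_trip(const char *write_fmt, const char *read_fmt, long long size)
{
    char line[128];
    long long rsize = 0;
    int cutoff = 0, all = 0;

    /* cutoff = 500 and all = 0 mirror the values in the reported stub file. */
    snprintf(line, sizeof line, write_fmt, size, 500, 0);
    return sscanf(line, read_fmt, &rsize, &cutoff, &all);
}
```

Under these assumptions, `round_trip(DB_PARAMS_OLD, DB_PARAMS_OLD, 1000000000LL)` returns 1, because the tenth digit is left unread and the literal " cutoff" fails to match. Reading the same line with `DB_PARAMS_NEW` returns 3, and a 9-digit size such as 999000000 returns 3 with either format.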

Thank you,  Gene

On 4/13/16, 9:29 PM, aclum wrote:

    aclum@mc1732:/projectb/scratch/aclum/falcon/AWSBW/s1000/sge_log$ more prepare_rdb.sh-task_build_rdb-task_build_rdb.o20863516
    cd /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads
    + cd /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads
    trap 'touch /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/rdb_build_done.exit' EXIT
    + trap 'touch /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/rdb_build_done.exit' EXIT
    ls -il prepare_rdb.sub.sh
    + ls -il prepare_rdb.sub.sh
    162551083 -rwxrwxr-x 1 aclum aclum 349 Apr 12 10:25 prepare_rdb.sub.sh
    hostname
    + hostname
    mc0214
    ls -il prepare_rdb.sub.sh
    + ls -il prepare_rdb.sub.sh
    162551083 -rwxrwxr-x 1 aclum aclum 349 Apr 12 10:25 prepare_rdb.sub.sh
    time /bin/bash ./prepare_rdb.sub.sh
    + /bin/bash ./prepare_rdb.sub.sh
    fasta2DB -v raw_reads -f/global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/input.fofn
    + fasta2DB -v raw_reads -f/global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/input.fofn
    Adding 'pbio-1082.9872-nocontrol' ...
    Adding 'pbio-1084.9888-nocontrol' ...
    Adding 'pbio-1084.9889-nocontrol' ...
    Adding 'pbio-1084.9890-nocontrol' ...
    Adding 'pbio-1104.10055-nocontrol' ...
    Adding 'pbio-1107.10076-nocontrol' ...
    Adding 'pbio-1107.10077-nocontrol' ...
    Adding 'pbio-1107.10078-nocontrol' ...
    Adding 'pbio-1107.10079-nocontrol' ...
    Adding 'pbio-1107.10080-nocontrol' ...
    Adding 'pbio-1107.10081-nocontrol' ...
    DBsplit -x500 -s1000 raw_reads
    + DBsplit -x500 -s1000 raw_reads
    LB=$(cat raw_reads.db | awk '$1 == "blocks" {print $3}')
    cat raw_reads.db | awk '$1 == "blocks" {print $3}')
    cat raw_reads.db | awk '$1 == "blocks" {print $3}'
    ++ cat raw_reads.db
    ++ awk '$1 == "blocks" {print $3}'
    + LB=8
    HPCdaligner -v -dal4 -t8 -e.70 -l1000 -s1000 -H1000 raw_reads 1-$LB >| /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/run_jobs.sh
    + HPCdaligner -v -dal4 -t8 -e.70 -l1000 -s1000 -H1000 raw_reads 1-8

    real 2m0.948s
    user 1m36.668s
    sys 0m22.550s
    touch /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/rdb_build_done
    + touch /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/rdb_build_done
    touch /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/rdb_build_done.exit
    + touch /global/projectb/scratch/aclum/falcon/AWSBW/s1000/0-rawreads/rdb_build_done.exit


thegenemyers commented 8 years ago

Since I've not heard back from you, I'll assume this was the correct fix and incorporate it into the next commit.