yzhernand / VNTRseek

This repository is now deprecated. Please visit the official repository at https://github.com/Benson-Genomics-Lab/VNTRseek. VNTRseek is a computational pipeline for the detection of VNTRs. It was developed by Yevgeniy Gelfand et al. in Dr. Gary Benson's Laboratory for Biocomputing and Informatics at Boston University.
http://orca.bu.edu/vntrseek/
GNU General Public License v3.0

DBD::SQLite::db do failed: near "(": syntax error #12

Closed: griznog closed this issue 5 years ago

griznog commented 5 years ago

Hi Yozen,

Our test run with 1.10.0-rc.3 died with this error:

29284921 profiles read, 8437576 profiles marked nonredundant. (time: 8097 seconds)

setting additional statistics...
Creating reference sequence database...
DBD::SQLite::db do failed: near "(": syntax error at /scg/apps/software/vntrseek/1.10.0-rc.3/vntrseek1.10.0-rc.3/lib/vutil.pm line 706.
command exited with value 2 at /scg/apps/software/vntrseek/1.10.0-rc.3/bin/vntrseek line 585.
Done vntrseek

My perl-fu is weak and I do not see an obvious problem in vutil.pm. Any idea what we are doing wrong here?

Our vs.cnf looks like this:

$ cat 7258930.vs.cnf
# Database backend
BACKEND=sqlite

# set this to the number of processors on your system 
# (or less if sharing the system with others or RAM is limited)
# eg, 8
NPROCESSES=128

# minimum required flank on both sides for a read TR to be considered
# eg, 10
MIN_FLANK_REQUIRED=10

# maximum flank length used in flank alignments
# set to big number to use all
# if read flanks are long with a lot of errors, 
# it might be useful to set this to something like 50
# max number of errors per flank is currently set to 8 (can be changed in main script only)
# eg, 1000
MAX_FLANK_CONSIDERED=50

# minimum number of mapped reads which agree on copy number to call an allele
# eg, 2
MIN_SUPPORT_REQUIRED=2

# Whether or not to keep reads detected as PCR duplicates. A nonzero (true) value
# means that detected PCR duplicates will not be removed. Default is 0.
KEEPPCRDUPS=1

# server name, used for html generating links
# eg, orca.bu.edu
SERVER=localhost

# for 454 platform, strip leading 'TCAG' 
# eg, 1 - yes
# eg, 0 - no (use no for all other platforms)
STRIP_454_KEYTAGS=0

# data is paired reads
# eg, 0 = no 
# eg, 1 - yes
IS_PAIRED_READS=1

# Sample ploidy. Default is 2. For haploid, set to 1.
PLOIDY=2

# Rebuild reference database
# eg, 0 = no 
# eg, 1 - yes
REDO_REFDB=0

# input data directory 
# (plain or gzipped fasta/fastq files)
# eg, /input
INPUT_DIR=/tmp/7258930/fasta

# output directory (must be writable and executable!)
# eg, /output
OUTPUT_ROOT=/home/username/output/7258930

# temp (scratch) directory (must be executable!)
# eg, /tmp
TMPDIR=/tmp/7258930

# names for the reference files 

# (leb36 file, sequence plus flank data file, indistinguishable references file) 
# files must be in install directory

# eg, hg19. This is the base name for files describing
# reference TR loci (.db, .seq, .leb36, and .indist)
REFERENCE=/tmp/7258930/reference/t26__

# generate a file of indistinguishable references, 
# necessary only if a file is not already available for the reference set
# eg, 1- generate
# eg, 0 - don't generate
REFERENCE_INDIST_PRODUCE=0
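
When a config file is mostly comments like the one above, it can help to list only the settings that are actually in effect. This is a minimal sketch, not part of VNTRseek: `effective_settings` is a hypothetical helper, and it assumes the file uses plain `KEY=VALUE` lines with `#` comments, as shown here.

```shell
# effective_settings FILE
# Print only the active KEY=VALUE lines of a vs.cnf-style file,
# skipping blank lines and lines whose first non-space character is '#'.
# Hypothetical helper for illustration; not shipped with VNTRseek.
effective_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Example (file name taken from the report above):
#   effective_settings 7258930.vs.cnf
```

Posting the filtered output instead of the full file keeps a bug report shorter without hiding any setting.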

yzhernand commented 5 years ago

Thank you for the report. I'll check out what could be wrong and let you know if I need more information.

yzhernand commented 5 years ago

Hi griznog,

Could you please provide me with the output of the following commands on the system where VNTRseek will run? Thank you.

perl -v
perldoc -m DBD::SQLite | grep 'our $VERSION'

Thanks!

griznog commented 5 years ago

At the moment perl and modules come from the stock CentOS 7 packages:

$ perl -v

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
(with 38 registered patches, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

$ perldoc -m DBD::SQLite | grep 'our $VERSION'
our $VERSION = '1.39';

yzhernand commented 5 years ago

This might be a bug in your version of DBD::SQLite as I can't reproduce it. The earliest known working version of that module is 1.58. Would it be possible for you to upgrade to at least that version on your system?
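
The version requirement can be checked up front, before committing to a long run. The following is a minimal sketch and not part of VNTRseek: `version_at_least` is a hypothetical helper, it assumes a `sort` that supports `-V` (GNU coreutils does), and the 1.58 threshold is simply the earliest known working version mentioned above.

```shell
# version_at_least INSTALLED REQUIRED
# Succeeds when INSTALLED >= REQUIRED under version ordering.
# Hypothetical helper; relies on `sort -V` (GNU coreutils and compatible).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Ask perl for the installed DBD::SQLite version. The command fails
# cleanly if perl or the module is missing.
if installed=$(perl -MDBD::SQLite -e 'print $DBD::SQLite::VERSION' 2>/dev/null); then
    if version_at_least "$installed" 1.58; then
        echo "DBD::SQLite $installed is new enough"
    else
        echo "DBD::SQLite $installed is older than 1.58; consider upgrading"
    fi
else
    echo "DBD::SQLite not found for this perl"
fi
```

Make sure the `perl` on the `PATH` here is the same one VNTRseek will use, since a cluster can have several Perl installations with different module versions.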

griznog commented 5 years ago

Upgrading to perl 5.28.0 and DBD::SQLite 1.62 got us past the previous error, but the run now fails with the one below:

$ tail -40 slurm-8085989.out
-175363581 | 29279231 

-175363581 | 29279232 

-175363581 | 29279233 

-175363581 | 29279234 

processed: 191196

Processing complete -- processed 191196 cluster(s), support entries created = 199578.

end: 2019-03-09 03:36:07

done!

STEP #18 IS EMPTY!done!

Executing step #19 (final database update)...
Updating fasta_ref_reps table...
Updating stats table...
Producing output LaTeX file...
Argument "Error: mismatch of sum of mapped read TRs by distinguish..." isn't numeric in addition (+) at updaterefs.pl line 1449.
3884573)
$VAR1 = [
          3884573,
          -777,
          3884573
        ];

command exited with value 2 at /scg/apps/software/vntrseek/1.10.0-rc.3/bin/vntrseek line 1338.
Done vntrseek

yzhernand commented 5 years ago

I ran into that bug while searching for the cause of yours. I'll let you know when the fix is pushed out.

yzhernand commented 5 years ago

Fixed in commit 3758ede

Since you already have a reference database file created, you must run the following command interactively, without specifying a run name:

vntrseek --reference /path/to/reference/basename --redo_refdb

That will reinitialize your reference database. This is needed because, due to a bug, new databases were created without initializing a column in a table that the last step requires. If you have more than one reference set, run this command for each.

Clear the error on any failed runs by specifying step 100 (vntrseek --dbsuffix run_name 100) and then run them as normal.
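
For sites with several reference sets or failed runs, the two steps above could be scripted roughly as follows. This is only a sketch: `run`, `reinit_references`, `clear_run_errors` and the `DRY_RUN` switch are hypothetical names, while the two `vntrseek` invocations are the ones given above. With `DRY_RUN=1` (the default in this sketch) the commands are printed rather than executed.

```shell
# Print commands instead of executing them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

# run COMMAND [ARGS...]: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reinit_references BASENAME...
# Reinitialize the reference database for each reference set.
reinit_references() {
    for ref in "$@"; do
        run vntrseek --reference "$ref" --redo_refdb || return 1
    done
}

# clear_run_errors RUN_NAME...
# Clear the error flag on each failed run by requesting step 100.
clear_run_errors() {
    for name in "$@"; do
        run vntrseek --dbsuffix "$name" 100 || return 1
    done
}

# Example with placeholder arguments:
#   reinit_references /path/to/reference/basename
#   clear_run_errors run_name
```

Reviewing the dry-run output first is a cheap way to confirm the paths and run names before touching any database.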

Please let me know if this fixes the issues for you. Thank you!

griznog commented 5 years ago

I applied that patch and started a new analysis, adding to my Slurm job script:

vntrseek --reference ${REFERENCE_DIR}/t26__ --redo_refdb || exit 1

That command finished successfully, then the run of vntrseek started. This ran most of the day and ended with the error below:

Executing step #4 (performing bipartite clustering of tandem repeats profiles)...

start: 2019-03-08 18:32:49

85  supported files found in /home/yesavage/output/8084307/vntr_8084307/data_out_clean/
Checking read leb36 files...

end: 2019-03-08 18:36:42

DBD::SQLite::db do failed: malformed database schema (idx_fasta_ref_reps_arraysize) - near "-": syntax error at /scg/apps/software/vntrseek/1.10.0-rc.3/vntrseek1.10.0-rc.3/lib/vutil.pm line 416.
Done vntrseek

yzhernand commented 5 years ago

Hm, I'm not sure what's going on here. Only one test analysis was running? And you are using updated versions of perl and DBD::SQLite?

You could try running the analysis using

DEBUG=1 vntrseek ...

which may produce a lot of output, but the output near the crash might help.
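
Because the debug output can be large, one option is to send it to a file and look only at the end. This is a sketch under assumptions: `run_with_debug` is a hypothetical wrapper, and the only thing taken from above is that setting `DEBUG=1` in the environment enables the extra output.

```shell
# run_with_debug LOGFILE COMMAND [ARGS...]
# Run COMMAND with DEBUG=1 in its environment, capture stdout and stderr
# in LOGFILE, print the last 40 lines, and return COMMAND's exit status.
# Hypothetical wrapper for illustration.
run_with_debug() {
    log=$1
    shift
    DEBUG=1 "$@" > "$log" 2>&1
    status=$?
    tail -n 40 "$log"
    return "$status"
}

# Example: prefix the usual vntrseek command line with
#   run_with_debug debug.log
```

The full log stays on disk in case the lines before the crash turn out to matter.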

griznog commented 5 years ago

Running with DEBUG=1 completed, so on the assumption I somehow broke the previous attempt, I am running again without DEBUG=1 as a sanity check.

One thing I did notice in this latest run, my command is:

vntrseek 0 19 --DBSUFFIX 8287661 \
              --REFERENCE /local/scratch/USERNAME/8287661/reference/t26__ \
              --NPROCESSES 256 \
              --MIN_FLANK_REQUIRED 10 \
              --MAX_FLANK_CONSIDERED 50 \
              --MIN_SUPPORT_REQUIRED 2 \
              --STRIP_454_KEYTAGS 0 \
              --IS_PAIRED_READS 1 \
              --HTML_DIR /local/scratch/USERNAME/8287661/www \
              --INPUT_DIR /local/scratch/USERNAME/8287661/fasta \
              --OUTPUT_ROOT /local/scratch/USERNAME/8287661/output \
              --TMPDIR /local/scratch/USERNAME/8287661/tmp \
              --REFERENCE_INDIST_PRODUCE 0 \
              --MIN_FLANK_REQUIRED 10 \
              --MAX_FLANK_CONSIDERED 50 \
              --MIN_SUPPORT_REQUIRED 2

but I get this warning:

Unknown option: html_dir
Run config does not exist. A new one will be created, but make sure yourdbsuffix is correct!
Warning: 'html_dir' option is unset.

In the run with DEBUG=1, that looks like this:

Unknown option: html_dir
Reading configuration file: /scg/apps/software/vntrseek/1.10.0-rc.3/vntrseek1.10.0-rc.3/vs.cnf
Reading configuration file: /home/USERNAME/8282563.vs.cnf
No such file or directory at /scg/apps/software/vntrseek/1.10.0-rc.3/vntrseek1.10.0-rc.3/lib/vutil.pm line 56.

Run config does not exist. A new one will be created, but make sure yourdbsuffix is correct!
$VAR1 = {
          'DBSUFFIX' => '8282563',
          'CONF_DIR' => '/home/USERNAME',
          'REFERENCE_INDIST_PRODUCE' => 0,
          'KEEPPCRDUPS' => '1',
          'INPUT_DIR' => '/tmp/8282563/fasta',
          'NPROCESSES' => 256,
          'STRIP_454_KEYTAGS' => 0,
          'IS_PAIRED_READS' => 1,
          'SERVER' => 'localhost',
          'PLOIDY' => '2',
          'REFERENCE' => '/tmp/8282563/reference/t26__',
          'MIN_SUPPORT_REQUIRED' => 2,
          'BACKEND' => 'sqlite',
          'TMPDIR' => '/tmp/8282563/tmp',
          'OUTPUT_ROOT' => '/tmp/8282563/output',
          'REDO_REFDB' => '0',
          'MAX_FLANK_CONSIDERED' => 50,
          'MIN_FLANK_REQUIRED' => 10
        };
Warning: 'html_dir' option is unset.

Ignore the different directories there; I'm testing on different types of storage to see what performs best.

yzhernand commented 5 years ago

I've silenced the warning for the html_dir option for now, since I am considering deprecating it anyway.

The warning about the config file not existing is just there to signal that your run name (dbsuffix) might be wrong if this is not a new run. But if it is, then you can ignore it.

Did the new run finish successfully?

griznog commented 5 years ago

As far as I can tell, it completed successfully. I'm waiting for someone who can interpret the results to verify that, but as far as running and completing goes, it all looks good.

yzhernand commented 5 years ago

Great! When you feel that this specific issue has been resolved, feel free to close it.