ukhsa-collaboration / snapperdb

GNU General Public License v3.0

make_snpdb: invalid GATK command #21

Closed: ScaonE closed this issue 5 years ago

ScaonE commented 5 years ago

Dear all,

When launching the following command: python2.7 ./run_snapperdb.py make_snpdb -c custom_salmo.txt

it outputs the following stderr:

/home/scaonp01/.local/lib/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")
Namespace(command='make_snpdb', config_file='custom_salmo.txt', fastqs=[], log_dir='/home/scaonp01/Software/SnapperDB')
db_snapperdb already exists
FASTQs found for AM933172
[2018-11-12 15:12:59,831] INFO: Version: N/A
[2018-11-12 15:12:59,832] INFO: Initialising data matrix.
[2018-11-12 15:12:59,837] INFO: Mapping data file with bwa.
[2018-11-12 15:14:29,448] INFO: Creating digitised variants with gatk.
[2018-11-12 15:14:30,924] WARNING: Calling variants returned non-zero exit status.
[2018-11-12 15:14:30,925] WARNING: USAGE: [-h]

Available Programs: ... .... (GATK available programs are listed) (-h) ....

A USER ERROR has occurred: '-T' is not a valid command.


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

[2018-11-12 15:14:30,925] ERROR: VCF was not created. /home/scaonp01/Software/SnapperDB/reference_genomes/snpdb/AM933172.filtered.vcf not found ...

I am using snapperdb.py 1.0.5, GATK-4.0.11.0.

printenv shows:
PICARD_JAR=/home/scaonp01/Software/picard.jar
GATK_JAR=/home/scaonp01/Software/GATK-4.0.11.0/gatk-package-4.0.11.0-local.jar
GASTROSNAPPER_CONFPATH=/home/scaonp01/Software/SnapperDB/user_configs
GASTROSNAPPER_REFPATH=/home/scaonp01/Software/SnapperDB/reference_genomes

The PostgreSQL user "user_snapperdb" is a superuser and can access the PostgreSQL database "db_snapperdb".

Any tips? Should I use a 3.x version of GATK?
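The "'-T' is not a valid command" error is consistent with the change in command-line syntax between GATK major versions: GATK 3.x selects a tool with -T (for example java -jar GenomeAnalysisTK.jar -T HaplotypeCaller ...), while GATK 4 takes the tool name as its first argument (gatk HaplotypeCaller ...) and rejects -T. A wrapper that builds GATK 3-style commands could fail early with a clearer message. The sketch below is a hypothetical check, assuming the version can be read from the GATK_JAR filename as in the printenv output; gatk_major_version and check_gatk_for_dash_t are illustrative names, not part of SnapperDB.

```python
import os
import re


def gatk_major_version(jar_path):
    """Return the GATK major version parsed from a jar filename, or None.

    Hypothetical helper: assumes the version number appears in the filename,
    e.g. gatk-package-4.0.11.0-local.jar (GATK 4) or GenomeAnalysisTK-3.7.jar.
    """
    name = os.path.basename(jar_path)
    # First dotted number in the filename, e.g. "4.0.11.0" -> major version "4"
    match = re.search(r"(\d+)\.\d+(?:\.\d+)*", name)
    return int(match.group(1)) if match else None


def check_gatk_for_dash_t(jar_path):
    """Raise if the jar looks like GATK 4+, which rejects the GATK 3 '-T <Tool>' syntax.

    Returns the detected major version, or None when the filename carries no version.
    """
    major = gatk_major_version(jar_path)
    if major is not None and major >= 4:
        raise RuntimeError(
            "%s looks like GATK %d; commands of the form '-T <Tool>' need GATK 3.x"
            % (jar_path, major)
        )
    return major
```

For example, check_gatk_for_dash_t(os.environ["GATK_JAR"]) would raise for the gatk-package-4.0.11.0-local.jar path above and return 3 for a jar named GenomeAnalysisTK-3.7.jar. A jar named just GenomeAnalysisTK.jar yields None, so the check is a convenience, not a guarantee.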

Edit (config file "custom_salmo.txt"):

snpdb_name db_snapperdb
reference_genome AM933172
pg_uname user_snapperdb
pg_pword somepassword
pg_host localhost
depth_cutoff 10
mq_cutoff 30
ad_cutoff 0.9
average_depth_cutoff 30
mapper bwa
mapper_threads 8
variant_caller gatk
variant_caller_threads 8
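The config above is a list of whitespace-separated key/value pairs. The following is a minimal sketch of parsing that format into a dict, assuming one "key value" pair per line; parse_config and NUMERIC_KEYS are hypothetical names, and SnapperDB's own config reader may behave differently.

```python
# Hypothetical: keys whose values are treated as numbers. Only these are
# converted, so a numeric-looking password (pg_pword) stays a string.
NUMERIC_KEYS = {
    "depth_cutoff",
    "mq_cutoff",
    "ad_cutoff",
    "average_depth_cutoff",
    "mapper_threads",
    "variant_caller_threads",
}


def _coerce(value):
    """Convert a string to int, else float, else leave it unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_config(text):
    """Parse 'key value' lines into a dict (sketch, not SnapperDB's parser)."""
    config = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError("line %d has no value: %r" % (lineno, raw))
        key, value = parts[0], parts[1].strip()
        config[key] = _coerce(value) if key in NUMERIC_KEYS else value
    return config
```

Converting only known numeric keys is a deliberate choice: a blanket "try int, then float" would silently turn a password such as 1234 into an integer.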

ScaonE commented 5 years ago

OK, I picked an arbitrary GATK 3.x version (3.7.0) and ran the same command again: it got further but did not complete. Now it seems I have a PostgreSQL-related issue (see below):

P.S.: The required GATK version should be specified in the README.

/home/scaonp01/.local/lib/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")
Namespace(command='make_snpdb', config_file='custom_salmo.txt', fastqs=[], log_dir='/home/scaonp01/Software/SnapperDB')
db_snapperdb already exists
FASTQs found for AM933172
[2018-11-12 15:34:37,157] INFO: Version: N/A
[2018-11-12 15:34:37,157] INFO: Initialising data matrix.
[2018-11-12 15:34:37,162] INFO: Mapping data file with bwa.
[2018-11-12 15:36:05,364] INFO: Creating digitised variants with gatk.
[2018-11-12 15:37:49,841] INFO: Annotating
[2018-11-12 15:39:43,494] INFO: Applying filters: ['mq_score:30', 'min_depth:10', 'ad_ratio:0.9']
Traceback (most recent call last):
  File "./run_snapperdb.py", line 312, in <module>
    main()
  File "./run_snapperdb.py", line 307, in main
    run_command(args)
  File "./run_snapperdb.py", line 102, in run_command
    vcf_to_db(args, config_dict, vcf)
  File "/home/scaonp01/Software/SnapperDB/snapperdb/snpdb/__init__.py", line 52, in vcf_to_db
    snpdb.snpdb_upload(vcf,args)
  File "/home/scaonp01/Software/SnapperDB/snapperdb/snpdb/snpdb.py", line 463, in snpdb_upload
    if not self.check_duplicate(vcf, 'strains_snps'):
  File "/home/scaonp01/Software/SnapperDB/snapperdb/snpdb/snpdb.py", line 190, in check_duplicate
    dict_cursor.execute("select distinct(name) FROM %s where name = \'%s\'" % (database, vcf.sample_name))
  File "/home/scaonp01/.local/lib/python2.7/site-packages/psycopg2/extras.py", line 141, in execute
    return super(DictCursor, self).execute(query, vars)
psycopg2.ProgrammingError: relation "strains_snps" does not exist
LINE 1: select distinct(name) FROM strains_snps where name = 'AM9331...

Edit (associated log_dir file):

2018-11-12 15:34:36,892 snapperdb.make_snpdb INFO PARAMS: config = custom_salmo.txt
2018-11-12 15:34:36,914 snapperdb.fastq_to_vcf INFO Running fastq_to_vcf
2018-11-12 15:34:36,914 snapperdb.fastq_to_vcf INFO Parsing config_dict
2018-11-12 15:34:36,914 snapperdb.fastq_to_vcf INFO Defining class variables and making output files
2018-11-12 15:34:36,915 snapperdb.fastq_to_vcf INFO Making FASTQs
2018-11-12 15:34:36,915 snapperdb.fastq_to_vcf INFO Running Pheonix
2018-11-12 15:42:03,196 snapperdb.snpdb.vcf_to_db INFO Initialising SNPdb class
2018-11-12 15:42:03,196 snapperdb.snpdb.vcf_to_db INFO Parsing config dict
2018-11-12 15:42:03,220 snapperdb.snpdb.vcf_to_db INFO You are running vcf_to_db. Initialising Vcf class.
2018-11-12 15:42:03,221 snapperdb.snpdb.vcf_to_db INFO Making SNPdb variables and output files
2018-11-12 15:42:03,438 snapperdb.snpdb.vcf_to_db INFO Uploading to SNPdb

Edit 2: I read this issue, as it was quite similar. There, @timdallman commented:

it looks like the database did not form correctly when you made it manually. if you can delete it and try make_snpdb function it should work

I therefore followed the "Deleting or purging your database" section of the README: dropdb -U user_snapperdb db_snapperdb

@jb2cool did not have the postgresql-contrib package installed (I do).

I then launched the command again; here is the stderr:

/home/scaonp01/.local/lib/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")
Namespace(command='make_snpdb', config_file='custom_salmo.txt', fastqs=[], log_dir='/home/scaonp01/Software/SnapperDB')
Cant connect to SnapperDB db_snapperdb
The SNPdb db_snapperdb does not exist - running sql to create database
FASTQs found for AM933172
[2018-11-12 16:06:38,596] INFO: Version: N/A
[2018-11-12 16:06:38,596] INFO: Initialising data matrix.
[2018-11-12 16:06:38,601] INFO: Mapping data file with bwa.
[2018-11-12 16:08:06,988] INFO: Creating digitised variants with gatk.
[2018-11-12 16:09:48,255] INFO: Annotating
[2018-11-12 16:11:42,808] INFO: Applying filters: ['mq_score:30', 'min_depth:10', 'ad_ratio:0.9']
Calulated depth is 128.05 - cuttoff is 30
Completed 2018-11-12 16:14:15.390364
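The filter list in this log lines up with the config values: mq_cutoff 30 appears as 'mq_score:30', depth_cutoff 10 as 'min_depth:10', ad_cutoff 0.9 as 'ad_ratio:0.9', and average_depth_cutoff 30 is the cutoff that the calculated depth of 128.05 is compared against. Below is a minimal sketch of that mapping, assuming a sample passes when its mean depth is at least the cutoff (the exact comparison SnapperDB uses is not shown in this thread); FILTER_KEYS, build_filters and passes_average_depth are hypothetical names.

```python
# Hypothetical mapping from config keys to the filter names printed in the log,
# listed in the order the log prints them.
FILTER_KEYS = [
    ("mq_cutoff", "mq_score"),
    ("depth_cutoff", "min_depth"),
    ("ad_cutoff", "ad_ratio"),
]


def build_filters(config):
    """Return filter strings such as 'mq_score:30' for the keys present in config."""
    return ["%s:%s" % (name, config[key]) for key, name in FILTER_KEYS if key in config]


def passes_average_depth(calculated_depth, config):
    """Assumed rule: pass when the mean depth is at least average_depth_cutoff."""
    return float(calculated_depth) >= float(config["average_depth_cutoff"])
```

With the values from custom_salmo.txt, build_filters returns ['mq_score:30', 'min_depth:10', 'ad_ratio:0.9'], and a calculated depth of 128.05 passes the cutoff of 30.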

It's all good now, right? What misled me was the "snpdb_name" line in the config file: I thought a PostgreSQL database had to be created before launching anything. It seems I was wrong about this.

timdallman commented 5 years ago

Thanks for pointing this out, will update the README.