nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Pasa issue in update step #141

Closed AnotherSimon closed 6 years ago

AnotherSimon commented 6 years ago

According to the manual, after running gene prediction (with RNAseq support), one should run update "to add UTR data to the predictions and fix gene models". When running:

funannotate update -i MyOrganism

The process seems to start off alright but soon fails:

[01:00:30 PM]: OS: linux2, 24 cores, ~ 66 GB RAM. Python: 2.7.9
[01:00:33 PM]: Running funannotate v1.1.1
[01:00:37 PM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[01:00:48 PM]: Reannotating My bug, NCBI accession: None
[01:00:48 PM]: Previous annotation consists of: 8,049 protein coding gene models and 130 non-coding gene models
[01:00:59 PM]: Existing PASA database contains 22,339 gene models, validated FASTA headers match
[01:00:59 PM]: Running PASA annotation comparison step 1
[01:01:24 PM]: PASA failed, check log, exiting

Relevant logging info was found in funannotate-update.log:

DBD::mysql::db do failed: Table 'My_bug_strainNum.annotation_updates' doesn't exist at /home/simon/software/PASApipeline-pasa-v2.2.0/PerlLib//Mysql_connect.pm line 162.
Thread 1 terminated abnormally: failed query: <insert annotation_updates (gene_id, model_id, alt_splice_flag, before_gene_obj, after_gene_obj, compare_id, is_valid, have_before, have_after, is_novel_flag) values (?,?,?,?,?,?,?,?,?,?)>
values: 0 4 0 0 1 0
Errors: Table 'My_bug_strainNum.annotation_updates' doesn't exist at /home/simon/software/PASApipeline-pasa-v2.2.0/PerlLib//Mysql_connect.pm line 173 thread 1.

Manually connecting to the mysql instance confirms that this table does not exist for db "My_bug_strainNum". Prediction step did not raise any errors or highly visible warnings.

PS: This may or may not be relevant but the first time running predict it failed because I needed to add the hooks to @INC as follows:

PERL5LIB=/home/simon/software/PASApipeline-pasa-v2.2.0/SAMPLE_HOOKS:$PERL5LIB

My_bug_strainNum was manually dropped and prediction rerun.

nextgenusfs commented 6 years ago

Well, this is certainly a PASA issue. Funannotate tries not to re-run the PASA alignment step if it doesn't have to, so it looks in the DB to see whether that step has already been run. Here it seems to think it has, so it skips alignment and goes straight to the PASA comparison step (the one that updates the gene models).

If you previously ran PASA using a different --species name, this might be the issue. It is set up so that if you run funannotate train followed by funannotate predict and then finally add UTRs with funannotate update, it will re-use the PASA config file located in the training subfolder. To bypass this (say PASA failed and you then ran it successfully with another config file), you can pass that config file to funannotate update with --pasa_config.

So what is likely happening is that a PASA config file in the training directory is causing the error. You can either 1) delete it, and funannotate will re-run the alignment step, or 2) pass the script the correct pasa_config file used to run PASA the first time.
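The two options above can be sketched as a small Python helper that builds the command line. This is only an illustration: `build_update_cmd` is a hypothetical name, the config path in the usage lines is a placeholder, and the only flags used are `-i` and `--pasa_config`, both of which appear in this thread.

```python
from typing import List, Optional


def build_update_cmd(input_dir: str, pasa_config: Optional[str] = None) -> List[str]:
    """Build an argv list for `funannotate update` (hypothetical helper)."""
    cmd = ["funannotate", "update", "-i", input_dir]
    if pasa_config is not None:
        # Option 2: point funannotate at the config file used for the
        # original, successful PASA run instead of the one it would re-use.
        cmd += ["--pasa_config", pasa_config]
    return cmd


if __name__ == "__main__":
    # Option 1: after deleting the stale config, run update with no extra flag.
    print(" ".join(build_update_cmd("MyOrganism")))
    # Option 2: placeholder path, replace with your own config file.
    print(" ".join(build_update_cmd("MyOrganism", "/path/to/alignAssembly.txt")))
```

The resulting list can be handed to `subprocess.run` unchanged, which avoids shell quoting problems with paths that contain spaces.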

AnotherSimon commented 6 years ago

Deleting the config file (./My_bug/training_misc/pasa/alignAssembly.txt) did indeed lead to remapping, but the same error pops up ("Table 'My_bug.annotation_updates' doesn't exist"). I found an earlier error in funannotate-update.log:

Processing CMD: 33/33 12:31:00 CMD: /home/simon/software/PASApipeline-pasa-v2.2.0/scripts/describe_alignment_assemblies_cgi_convert.dbi -M My_bug > My_bug.pasa_assemblies_described.txt

########################################################################## Finished. Please visit the Assembly and Annotation Comparison results at: /status_report.cgi?db=My_bug ##########################################################################

2018-02-21 12:31:32,219: ERROR 1071 (42000) at line 172 in file: '/home/simon/software/PASApipeline-pasa-v2.2.0/schema/cdna_alignment_mysqlschema': Specified key was too long; max key length is 1000 bytes
Committing...
CMD: /data1/home/simon/software/PASApipeline-pasa-v2.2.0/scripts/process_GMAP_alignments_gff3_chimeras_ok.pl --genome My_bug/update_misc/genome.fa --transcripts My_bug/training/trinity.fasta --CPU 12 -N 1 -I 3000 > gmap.spliced_alignments.gff3 ...
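Since the first failure in the log (the schema error) sits well before the "table doesn't exist" message that finally stops the run, it can help to look for the earliest error line rather than the last one. A minimal sketch, assuming the log is plain text; `first_error` and `ERROR_PATTERNS` are hypothetical names, and the patterns only cover the two message styles quoted in this thread:

```python
import re
from typing import Iterable, Optional, Tuple

# Patterns taken from messages quoted in this thread; not an exhaustive list.
ERROR_PATTERNS = [
    re.compile(r"ERROR \d+ \(\w+\)"),           # e.g. "ERROR 1071 (42000)"
    re.compile(r"DBD::mysql::\w+ \w+ failed"),  # e.g. "DBD::mysql::db do failed"
]


def first_error(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, line text) of the first error, or None."""
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            return number, line.rstrip("\n")
    return None


if __name__ == "__main__":
    demo = [
        "Processing CMD: 33/33\n",
        "ERROR 1071 (42000) at line 172: Specified key was too long\n",
        "DBD::mysql::db do failed: Table 'My_bug.annotation_updates' doesn't exist\n",
    ]
    print(first_error(demo))
```

To scan a real log, pass an open file handle, for example `first_error(open("funannotate-update.log"))`.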

nextgenusfs commented 6 years ago

Oh okay. I've seen this before. https://github.com/PASApipeline/PASApipeline/issues/45. Change the key length in the mysql schema.
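For context on the error message: the 1000 in "max key length is 1000 bytes" is a byte limit on the index, whereas a VARCHAR length is counted in characters, so how long an indexed column may be depends on the character set. The arithmetic is sketched below. The character set of the PASA database is not stated in this thread, so the three shown are only common MySQL examples, and `max_indexable_chars` is a hypothetical helper. The actual column to change is described in the linked PASA issue and is not reproduced here.

```python
MAX_KEY_BYTES = 1000  # limit quoted in the error message in this thread

# Maximum bytes per character for some common MySQL character sets.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset: str, max_key_bytes: int = MAX_KEY_BYTES) -> int:
    """Longest single-column key, in characters, that fits the byte limit."""
    return max_key_bytes // BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for name in BYTES_PER_CHAR:
        print(name, max_indexable_chars(name))
```

For a key spanning several columns, the byte sizes of all indexed columns add up, so each column has correspondingly less room.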

AnotherSimon commented 6 years ago

I applied the fix suggested in the referenced issue and reran the pipeline from scratch. The error manifests differently but it's still there:

## Processing CMD:

16:38:57 CMD: /home/simon/software/PASApipeline-pasa-v2.2.0/scripts/cDNA_annotation_comparer.dbi -G /data1/home/simon/IMB/TDlab/Annotation/funannot_test/My_bug/update_misc/genome.fa --CPU 12 -M My_bug > pasa_run.log.dir/My_bug.annotation_compare.38322.out

2018-02-26 16:39:04,400: Warning, no genes retrieved for contig_id: contig.25
...
Warning, no genes retrieved for contig_id: contig.138
DBD::mysql::st execute failed: Table 'My_bug.annotation_updates' doesn't exist at /home/simon/software/PASApipeline-pasa-v2.2.0/PerlLib//Mysql_connect.pm line 124.
DBD::mysql::st execute failed: Table 'My_bug.annotation_updates' doesn't exist at /home/simon/software/PASApipeline-pasa-v2.2.0/PerlLib//Mysql_connect.pm line 124.

==== Failed query: values: FUN_000003-T1 1
Errors: at /home/simon/software/PASApipeline-pasa-v2.2.0/PerlLib//Mysql_connect.pm line 148 thread 2.
Mysql_connect::do_sql_2D('Mysql_connect=HASH(0x217ad00)', 'select after_gene_obj from annotation_updates where model_id ...', 'FUN_000003-T1', 1) called at /home/simon/software/PASApipeline-pasa-v2.2.0/PerlLib//Mysql_connect.pm line 190 thread 2
Mysql_connect::very_first_result_sql('Mysql_connect=HASH(0x217ad00)', 'select after_gene_obj from annotation_updates where model_id ...', 'FUN_000003-T1', 1) called at /home/simon/software/PASApipeline-pasa-v2.2.0/PerlLib//Ath1_cdnas.pm line 385 thread 2
Ath1_cdnas::get_updated_gene_obj('Mysql_connect=HASH(0x217ad00)', 'FUN_000003-T1', 1) called at /home/simon/software/PASApipeline-pasa-v2.2.0/scripts/cDNA_annotation_comparer.dbi line 1299 thread 2
main::get_latest_gene_models('Mysql_connect=HASH(0x217ad00)', 'FUN_000003') called at /home/simon/software/PASApipeline-pasa-v2.2.0/scripts/cDNA_annotation_comparer.dbi line 1799 thread 2
main::model_overlaps_exon_segment('Mysql_connect=HASH(0x217ad00)', 'HASH(0x217af28)', 'ARRAY(0x1fa6550)') called at /home/simon/software/PASApipeline-pasa-v2.2.0/scripts/cDNA_annotation_comparer.dbi line 1034 thread 2
main::get_potential_overlapping_annotations('Mysql_connect=HASH(0x217ad00)', 'ARRAY(0x1fa6550)', 'ARRAY(0x1fc2148)', 954, 1419, '-', 'nonfli') called at /home/simon/software/PASApipeline-pasa-v2.2.0/scripts/cDNA_annotation_comparer.dbi line 887 thread 2
main::analyze_non_fl_assemblies('Mysql_connect=HASH(0x217ad00)', 'ARRAY(0x7f07c4010e90)', 'ARRAY(0x1fc2148)', 'SCALAR(0x1fc21a8)') called at /home/simon/software/PASApipeline-pasa-v2.2.0/scripts/cDNA_annotation_comparer.dbi line 452 thread 2
main::process_contig(contig.25') called at /home/simon/software/PASApipeline-pasa-v2.2.0/scripts/cDNA_annotation_comparer.dbi line 309 thread 2
eval {...} called at /home/simon/software/PASApipeline-pasa-v2.2.0/scripts/cDNA_annotation_comparer.dbi line 309 thread 2

AnotherSimon commented 6 years ago

Hmm, not sure where I went wrong, but applying both the VARCHAR length fix and deleting the old PASA alignment config file seems to have done the trick.

Do you know if an update to PASA will be forthcoming or should this be included somewhere in the funannotate documentation?

AnotherSimon commented 6 years ago

I think this issue can be considered solved.

peterthorpe5 commented 5 years ago

@AnotherSimon how did you fix the character length in MySQL? Googling this implies you have to recompile MySQL: https://stackoverflow.com/questions/5898518/how-do-i-increase-key-length-in-mysql-5-1

cheers, Pete

AnotherSimon commented 5 years ago

It's been a while but if I remember correctly, it went something along these lines: https://stackoverflow.com/questions/1279568/how-can-i-modify-the-size-of-column-in-a-mysql-table Just need to apply it to the correct table and column (this is where the memory gets fuzzy, see also Jon's comment pointing to PASA issue 45).

peterthorpe5 commented 5 years ago

Thank you very much for your reply. I think my issue is that I am using the Docker image which doesn’t recognise mysql as a command. – I will sort it.

Cheers, Pete
