nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Funannotate train failing when using mysql due to socket error #1056

Open monkfromouterspace opened 1 month ago

monkfromouterspace commented 1 month ago

Are you using the latest release? Yes

Describe the bug I switched from SQLite to MySQL because SQLite was taking so long to run. funannotate train keeps failing at the PASA step because it cannot connect to the MySQL server through the socket. I am running my commands over an SSH connection to a research server. The PASA conf.txt file is set up as per the template configuration file, the server admin has created all the necessary users, and the details in my conf.txt file match those users. We have tried a variety of socket locations but get virtually the same error each time.

Any help with this would be greatly appreciated, thank you!

My conf.txt file looks like this:

#####################################
#### MANDATORY SETTINGS #############
#####################################

# This file is not used if SQLITEDB is set in the alignment assembly configuration file
## MySQL settings:

# server actively running MySQL
# MYSQLSERVER=server.com
MYSQLSERVER=localhost
# Pass socket connections through Perl DBI syntax e.g. MYSQLSERVER=mysql_socket=/tmp/mysql.sock

# read-write username and password
MYSQL_RW_USER=pasa_write
MYSQL_RW_PASSWORD=pasa_write_pwd

# read-only username and password
MYSQL_RO_USER=pasa_access
MYSQL_RO_PASSWORD=pasa_access

What command did you issue? funannotate train -i genome/aLisVul1.pri.asm.20230818_masked_scaffold-1-2-split.fa -o trained/ -l transcriptome/newt_testis_trimmed_1.fq.gz transcriptome/newt_F-liver_trimmed_1.fq.gz transcriptome/newt_ovary_trimmed_1.fq.gz transcriptome/newt_UK-liver_trimmed_1.fq.gz -r transcriptome/newt_testis_trimmed_2.fq.gz transcriptome/newt_F-liver_trimmed_2.fq.gz transcriptome/newt_ovary_trimmed_2.fq.gz transcriptome/newt_UK-liver_trimmed_2.fq.gz --trinity transcriptome/Lvulg_x_Lmont_tgm_reference_transcriptome.fa --species "Lissotriton vulgaris" --cpus 64 --no_trimmomatic --memory 500G --aligners minimap2 gmap --pasa_db mysql >trained/Lvulg_funannotate_run1-train.stdout 2>trained/Lvulg_funannotate_run1-train.err &

Logfiles Output from funannotate_train.log

[07/12/24 12:52:59]: /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/funannotate train -i genome/aLisVul1.pri.asm.20230818_masked_scaffold-1-2-split.fa -o trained/ -l transcriptome/newt_testis_trimmed_1.fq.gz transcriptome/newt_F-liver_trimmed_1.fq.gz transcriptome/newt_ovary_trimmed_1.fq.gz transcriptome/newt_UK-liver_trimmed_1.fq.gz -r transcriptome/newt_testis_trimmed_2.fq.gz transcriptome/newt_F-liver_trimmed_2.fq.gz transcriptome/newt_ovary_trimmed_2.fq.gz transcriptome/newt_UK-liver_trimmed_2.fq.gz --trinity transcriptome/Lvulg_x_Lmont_tgm_reference_transcriptome.fa --species Lissotriton vulgaris --cpus 64 --no_trimmomatic --memory 500G --aligners minimap2 gmap --pasa_db mysql

[07/12/24 12:52:59]: OS: Ubuntu 22.04, 240 cores, ~ 1915 GB RAM. Python: 3.8.15
[07/12/24 12:52:59]: Running 1.8.17
[07/12/24 12:53:08]: fasta version=36.3.8g path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/fasta
[07/12/24 12:53:08]: minimap2 version=2.28-r1209 path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/minimap2
[07/12/24 12:53:08]: hisat2 version=2.2.1 path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/hisat2
[07/12/24 12:53:08]: hisat2-build version=NA path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/hisat2-build
[07/12/24 12:53:08]: Trinity version=2.8.5 path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/Trinity
[07/12/24 12:53:08]: java version=17.0.3-internal path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/java
[07/12/24 12:53:08]: kallisto version=0.46.1 path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/kallisto
[07/12/24 12:53:08]: /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/Launch_PASA_pipeline.pl version=NA path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/Launch_PASA_pipeline.pl
[07/12/24 12:53:08]: /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/bin/seqclean version=NA path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/bin/seqclean
[07/12/24 12:53:08]: minimap2 version=2.28-r1209 path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/minimap2
[07/12/24 12:53:08]: gmap version=2024-03-15 path=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/bin/gmap
[07/12/24 13:05:07]: Input reads: ('trained/training/left.fq.gz', 'trained/training/right.fq.gz', None)
[07/12/24 13:05:07]: Trimmomatic will be skipped
[07/12/24 13:05:07]: Quality trimmed reads: ('trained/training/left.fq.gz', 'trained/training/right.fq.gz', None)
[07/12/24 13:05:07]: FASTQ headers seem compatible with Trinity
[07/12/24 13:05:07]: Read normalization will be skipped
[07/12/24 13:05:07]: Normalized reads: ('trained/training/left.fq.gz', 'trained/training/right.fq.gz', None)
[07/12/24 13:05:07]: Long reads: (None, None, None)
[07/12/24 13:05:07]: Long reads FASTA format: (None, None, None)
[07/12/24 13:05:07]: Long SeqCleaned reads: (None, None, None)
[07/12/24 13:05:17]: 237,189 existing Trinity results found: trained/training/trinity.fasta
[07/12/24 13:05:18]: Removing poly-A sequences from trinity transcripts using seqclean
[07/12/24 13:05:18]: Existing SeqClean output found: trained/training/trained/training/trinity.fasta.clean
[07/12/24 13:05:18]: Existing BAM alignments found: trained/training/trinity.alignments.bam, trained/training/transcript.alignments.bam
[07/12/24 13:05:29]: Running PASA alignment step using 237,187 transcripts
[07/12/24 13:05:29]: /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/Launch_PASA_pipeline.pl -c /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/pasa/alignAssembly.txt -r -C -R -g /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/genome.fasta --IMPORT_CUSTOM_ALIGNMENTS /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/trinity.alignments.gff3 -T -t /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/trinity.fasta.clean -u /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/trinity.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 3000 --CPU 64 --ALIGNERS gmap
[07/12/24 13:05:30]: CMD ERROR: /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/Launch_PASA_pipeline.pl -c /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/pasa/alignAssembly.txt -r -C -R -g /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/genome.fasta --IMPORT_CUSTOM_ALIGNMENTS /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/trinity.alignments.gff3 -T -t /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/trinity.fasta.clean -u /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/trinity.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 3000 --CPU 64 --ALIGNERS gmap

Output of pasa-assembly.log

-connecting to MySQL db: Lissotriton_vulgaris_pasa
Use of uninitialized value in pattern match (m//) at /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/Launch_PASA_pipeline.pl line 281.
-*** Running PASA pipeine:
* [Fri Jul 12 13:05:29 2024] Running CMD: /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/scripts/create_mysql_cdnaassembly_db.dbi -c /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/pasa/alignAssembly.txt -S '/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/schema/cdna_alignment_mysqlschema' -r
DBI connect('database=;host=localhost','pasa_write',...) failed: Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2) at /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/PerlLib/DB_connect.pm line 72.
Cannot connect to : Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2) at /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/scripts/create_mysql_cdnaassembly_db.dbi line 57.
Error, cmd: /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/scripts/create_mysql_cdnaassembly_db.dbi -c /data/tigrr/home/userx/analyses/genome_annotation/de_novo/smooth_newt/inputs/trained/training/pasa/alignAssembly.txt -S '/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/schema/cdna_alignment_mysqlschema' -r died with ret 65280 No such file or directory at /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/PerlLib/Pipeliner.pm line 187.
        Pipeliner::run(Pipeliner=HASH(0x560f2649a720)) called at /data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3/Launch_PASA_pipeline.pl line 1061
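For anyone hitting the same message: the (2) at the end of the DBI error is errno 2 (ENOENT, "No such file or directory"), meaning no socket file exists at /tmp/mysql.sock. A quick way to see where the server's socket actually lives is to probe a few likely paths. The sketch below is only a diagnostic, not part of funannotate or PASA. The candidate list is an assumption: the first path comes from the error above, and the others are common defaults on Debian/Ubuntu and RHEL-like systems, so the real path on a given server may differ.

```python
import os
import stat

# Candidate paths are assumptions: /tmp/mysql.sock is the path from the error
# above; the others are common packaging defaults and may not apply here.
CANDIDATE_SOCKETS = [
    "/tmp/mysql.sock",
    "/var/run/mysqld/mysqld.sock",
    "/run/mysqld/mysqld.sock",
    "/var/lib/mysql/mysql.sock",
]


def is_unix_socket(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing path (errno 2, ENOENT) or no permission to stat it.
        return False


def find_sockets(paths=CANDIDATE_SOCKETS):
    """Return the subset of paths that are socket files."""
    return [p for p in paths if is_unix_socket(p)]


if __name__ == "__main__":
    for path in CANDIDATE_SOCKETS:
        status = "socket" if is_unix_socket(path) else "not found"
        print(path, "->", status)
```

If one of the paths is reported as a socket, that is the value to try with the mysql_socket= form mentioned in the conf.txt template comment. If none is found, the server may be listening on TCP only, or its socket may sit somewhere the admin would need to confirm.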

OS/Install Information

-------------------------------------------------------
Checking dependencies for 1.8.17
-------------------------------------------------------
You are running Python v 3.8.15. Now checking python packages...
biopython: 1.83
goatools: 1.3.11
matplotlib: 3.4.3
natsort: 8.4.0
numpy: 1.24.4
pandas: 1.5.3
psutil: 5.9.8
requests: 2.31.0
scikit-learn: 1.3.2
scipy: 1.10.1
seaborn: 0.13.2
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.046
DBI: 1.643
DB_File: 1.858
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.54
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.17
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/funannotate_db/
$PASAHOME=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/pasa-2.5.3
$TRINITY_HOME=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/trinity-2.8.5
$EVM_HOME=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/data/tigrr/home/userx/anaconda3/envs/funannotate_env2_v1.8.16/envs/funannotate_env3_v1.8.17/config/
        ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

-------------------------------------------------------
Checking external dependencies...
PASA: 2.5.3
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.31.1
blat: BLAT v37x1
diamond: 2.1.8
ete3: 3.1.3
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2024-03-15
hisat2: 2.2.1
hmmscan: HMMER 3.4 (Aug 2023)
hmmsearch: HMMER 3.4 (Aug 2023)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.525 (2024/Mar/13)
makeblastdb: makeblastdb 2.14.1+
minimap2: 2.28-r1209
pigz: 2.8
proteinortho: 6.3.1
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.18
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.12 (Nov 2022)
tantan: tantan 49
tbl2asn: 25.8
tblastn: tblastn 2.14.1+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
        ERROR: emapper.py not installed
        ERROR: gmes_petap.pl not installed
        ERROR: signalp not installed
-------------------------------------------------------
monkfromouterspace commented 1 month ago

I'll leave this open for a bit longer in case there's some value for others, but I tried again with just sqlite and it wasn't too slow. It's much easier to run!

hyphaltip commented 2 weeks ago

Generally I use host:port for the connection.
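Before changing MYSQLSERVER to a host:port value, it can help to confirm that the MySQL server accepts TCP connections at all. The sketch below is a minimal check under a few assumptions: the function names are hypothetical helpers, 3306 is used as the fallback because it is MySQL's default port, and only plain host or host:port strings are handled (no IPv6 literals). Whether PASA's conf.txt accepts the host:port form directly in MYSQLSERVER is worth verifying against the PASA documentation.

```python
import socket


def parse_host_port(server, default_port=3306):
    """Split 'host' or 'host:port' into (host, port)."""
    host, sep, port = server.partition(":")
    return host, int(port) if sep else default_port


def tcp_reachable(server, timeout=3.0):
    """Return True if a TCP connection to server ('host[:port]') can be opened."""
    host, port = parse_host_port(server)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, and similar failures.
        return False


if __name__ == "__main__":
    # 127.0.0.1 rather than localhost: MySQL client libraries treat
    # "localhost" as "use the Unix socket", while an IP address forces TCP.
    print(tcp_reachable("127.0.0.1:3306"))
```

A True result only shows that something is listening on that port. The PASA users would still need privileges for TCP logins from that host, which is a question for the server admin.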