sebc31 / P-GRe

P-GRe is a software tool that predicts the positions and structures of pseudogenes at genome scale.

error in PGRe_pseudogene_construct_step #1

Open BlanBlannito opened 6 months ago

BlanBlannito commented 6 months ago

Hello, I'm trying to use P-GRe to detect pseudogenes in Asellidae, but I'm encountering an error that I have no idea how to solve. I did my annotation with BRAKER3. Let me know if you need more detail; I don't know how precise I need to be.

This is the command I used:

./P-GRe_pipeline.sh -f /home/fblanchard/stage_m2/Data/Genome/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.fasta -g /home/fblanchard/stage_m2/Result/Annotation/PKK/braker.gff3 -o /home/fblanchard/stage_m2/Result/P-GRe/PKK/ -t 64 -v

What the program tells me:





P-GRe v1.0 starting. Command: /home/fblanchard/stage_m2/Tools/P-GRe-main/P-GRe_pipeline.sh -f /home/fblanchard/stage_m2/Data/Genome/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.fasta -g /home/fblanchard/stage_m2/Result/Annotation/PKK/braker.gff3 -o /home/fblanchard/stage_m2/Result/P-GRe/PKK -d /home/fblanchard/stage_m2/Tools/P-GRe-main -t 64 -v

Checking P-GRe installation...
BASNI: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/script/BASNI.py
CConIE: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/script/CConIE.py
PGRe: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/script/PGRe.py
TAGLIA: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/script/TAGLIA.py
VITo: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/script/VITo.py
PolyGet: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/script/PolyGet.py

Checking dependencies...
tblastn: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/bin/tblastn
blastp: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/bin/blastp
makeblastdb: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/bin/makeblastdb
bedtools: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/bin/bedtools
gffread: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/bin/gffread
stretcher: OK /home/fblanchard/stage_m2/Tools/P-GRe-main/bin/stretcher

Checking input files...
FASTA: OK /home/fblanchard/stage_m2/Data/Genome/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.fasta
GFF: OK /home/fblanchard/stage_m2/Result/Annotation/PKK/braker.gff3

Creating working directories and files...
Warning: the requested genome is already in the data/ folder. P-GRe will use the already existing genome.
Warning: the specified GFF file is already in the data/ folder. P-GRe will use the already existing GFF file.
gene-only GFF: OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/braker.gene_only.tempfile.gff

INFO: 15381 genes extracted from the GFF file <<
fasta masking: OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna
makeblastdb: OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna
OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna.ndb
OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna.nhr
OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna.nin
OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna.njs
OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna.not
OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna.nsq
OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna.ntf
OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.hard_masked.tempfile.fna.nto
INFO: 16354 chromosomes/scaffolds added <<
gffread: OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/Proasellus_karamani_PKK_flye_2023_12_29_asm_flyeHQ.prot.tempfile.faa
INFO: 17138 proteins generated <<


P-GRe running.

tblastn: OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/proteome_vs_masked_genome_tblastn.blast
filtering hits:
Parsing: OK <<
Filtering: OK <<
Overlaping: OK <<
Writing TSV: OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/tblastn_results.tempfile.tsv <<
FASTA writing: OK /home/fblanchard/stage_m2/Result/P-GRe/PKK/tmp/first_pg_set.tempfile.fasta <<
pg generation:
BLAST parsing: OK <<
PG generation: 0.00060% <<
pg generation: Error, check: /home/fblanchard/stage_m2/Result/P-GRe/PKK/log/PGRe_pseudogene_construct_step.err

And the error file:

/opt/src/miniconda3/envs/braker3/lib/python3.9/site-packages/Bio/Application/__init__.py:40: BiopythonDeprecationWarning: The Bio.Application modules and modules relying on it have been deprecated.

Due to the on going maintenance burden of keeping command line application wrappers up to date, we have decided to deprecate and eventually remove these modules.

We instead now recommend building your command line and invoking it directly with the subprocess module.
  warnings.warn(
Traceback (most recent call last):
  File "/home/fblanchard/stage_m2/Tools/P-GRe-main/script/PGRe.py", line 280, in <module>
    pseudo_prot, pseudogeneDic[pseudogene]['structure_cons'] = TAGLIA.lindleyAlign(alignement,pep_len,error_expected_gap,log)
ValueError: too many values to unpack (expected 2)
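For readers unfamiliar with the final line of the trace: Python raises this `ValueError` when the right-hand side of a tuple assignment yields more items than there are names on the left. The sketch below only illustrates that failure mode. `fake_align` is a hypothetical stand-in, since the thread does not show what `TAGLIA.lindleyAlign` actually returns.

```python
def fake_align():
    """Hypothetical stand-in returning three values (not the real TAGLIA.lindleyAlign)."""
    return "MKV", "exon-structure", "extra"


def unpack_two(func):
    """Try to unpack a call into exactly two names and report what happened."""
    try:
        protein, structure = func()
    except ValueError as exc:
        return f"ValueError: {exc}"
    return f"ok: {protein}, {structure}"


def unpack_first_two(func):
    """Tolerant variant: keep the first two items and collect any extras."""
    protein, structure, *extras = func()
    return protein, structure, extras


print(unpack_two(fake_align))
print(unpack_first_two(fake_align))
```

The tolerant variant only hides the mismatch; whether the extra values matter depends on what the called function is meant to return.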

Thank you for your help, I remain available.

I also noticed an error during installation: the folder containing the scripts is called scripts with an s, but P-GRe needs it to be called script without an s.
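One possible workaround for the folder name mismatch, sketched under the assumption that the checkout contains a `scripts/` folder and the pipeline looks for `script/`, is to expose the existing folder under the expected name with a symbolic link instead of renaming it. The helper name `link_script_dir` is hypothetical, and the demonstration runs in a throwaway directory rather than a real P-GRe checkout.

```python
import tempfile
from pathlib import Path


def link_script_dir(install_dir):
    """Expose an existing 'scripts/' folder under the name 'script/'.

    Returns the path of the 'script' entry, creating a symlink only if needed.
    """
    root = Path(install_dir)
    source = root / "scripts"
    target = root / "script"
    if target.exists():
        return target  # the expected name is already present
    if not source.is_dir():
        raise FileNotFoundError(f"no 'scripts' folder under {root}")
    # Relative link, so the checkout can be moved without breaking it.
    target.symlink_to(source.name, target_is_directory=True)
    return target


# Demonstration in a temporary directory (not a real P-GRe checkout).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "scripts").mkdir()
    (Path(tmp) / "scripts" / "PGRe.py").write_text("# placeholder\n")
    linked = link_script_dir(tmp)
    demo_ok = (linked / "PGRe.py").is_file()
print(demo_ok)
```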

sebc31 commented 6 months ago

Hello Florian,

Normally, P-GRe works particularly well with BRAKER output files. From what I read in the error file, one of the BioPython packages that P-GRe calls no longer exists, apparently the one used to call external applications (Bio.Application). I am currently reworking a clean new version of P-GRe that will no longer use this package, so it should stop being a problem. In the meantime, you can use an older version of BioPython with which it should work (I use BioPython 1.81). If you can't get it working, you can send me your files if you like (I see that the BLAST has finished running, so it should go quickly if you send me the BLAST results); if there is only one genome to annotate, I can take care of it.

See you,

Sébastien
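The deprecation warning quoted in the error file recommends building the command line yourself and invoking it with the `subprocess` module. A minimal sketch of that pattern follows. It is not P-GRe's code: the thread does not say which wrapper P-GRe uses, `tblastn` is chosen only because it appears in the pipeline log, and the helper names and file names are placeholders.

```python
import subprocess
import sys


def build_tblastn_cmd(query, db, out, threads=1):
    """Assemble a tblastn command line as a list of arguments (no shell involved)."""
    return [
        "tblastn",
        "-query", str(query),
        "-db", str(db),
        "-out", str(out),
        "-outfmt", "6",
        "-num_threads", str(threads),
    ]


def run_cmd(cmd):
    """Run a command, raise CalledProcessError on a non-zero exit, return stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# File names below are placeholders, not paths taken from P-GRe.
cmd = build_tblastn_cmd("proteins.faa", "masked_genome.fna", "hits.tsv", threads=4)
print(" ".join(cmd))

# The runner is exercised with the Python interpreter so the sketch works without BLAST installed.
print(run_cmd([sys.executable, "-c", "print('ok')"]).strip())
```

Passing the arguments as a list avoids shell quoting problems with long paths, and `check=True` turns a failed external call into a Python exception instead of a silent error.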

From: "Florian Blanchard" To: "sebc31/P-GRe" Cc: "Subscribed" Sent: Wednesday, 20 March 2024, 16:43:34 Subject: [sebc31/P-GRe] error in PGRe_pseudogene_construct_step (Issue #1)


BlanBlannito commented 6 months ago

Hello, I tried forcing the use of BioPython 1.81, but that did not solve the problem. Thank you for offering to run my data on your side, but I will probably have many species to run.

The error occurs at the same place, but this time I get this message:

Traceback (most recent call last):
  File "/home/fblanchard/stage_m2/Tools/P-GRe-main/script/PGRe.py", line 280, in <module>
    pseudo_prot, pseudogeneDic[pseudogene]['structure_cons'] = TAGLIA.lindleyAlign(alignement,pep_len,error_expected_gap,log)
ValueError: too many values to unpack (expected 2)

If the problem is not obvious, I can wait for the next version; I am in no hurry.

Thank you for your reply. Have a nice day, Florian
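The second trace no longer contains the `BiopythonDeprecationWarning`, which suggests the version pin took effect and that the `ValueError` is a separate problem. One way to confirm which Biopython the interpreter actually sees is sketched below. The helper names are illustrative, and the script should be run with the same Python that runs `PGRe.py`.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.81' into (1, 81); non-numeric parts are ignored."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


found = installed_version("biopython")
print("biopython:", found)
if found is not None and version_tuple(found) > (1, 81):
    print("newer than the 1.81 release the maintainer reports using")
```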

sebc31 commented 5 months ago

Hello Florian,

I am working on the next version; it should be out soon. I will fix this problem at the same time.

Have a nice day.

From: "Florian Blanchard" To: "sebc31/P-GRe" Cc: "Sébastien Cabanac", "Comment" Sent: Monday, 25 March 2024, 15:53:24 Subject: Re: [sebc31/P-GRe] error in PGRe_pseudogene_construct_step (Issue #1)
