sanger-bentley-group / GBS-Typer-sanger-nf

GBS Typer Pipeline in Nextflow
GNU General Public License v3.0

problems with running PBPtyper #75

Closed gaworj closed 4 weeks ago

gaworj commented 2 years ago

Hello,

I have successfully installed the pipeline but ran into problems when analysing my own samples. Fortunately, I was able to run the analysis on the sample data located at /GBS-Typer-sanger-nf/tree/main/tests/regression_test_data/input_data/

Are there any specific recommendations regarding sample input such as file extension or multifasta format?

I have tried changing the FASTA extension to .fas or .fa. My S. pneumoniae genomes were assembled using SPAdes.

Here is the terminal output for one of my samples:

(nextflow) jang@jang-MS-7B18:~/data_SSD2/genome_analysis/GBS-Typer-sanger-nf/GBS-Typer-sanger-nf$ nextflow run main.nf --output 'NIL12' --run_sero_res false --run_pbptyper --contigs 'good_final/NIL12.fasta'
N E X T F L O W ~ version 22.04.3
Launching main.nf [suspicious_fourier] DSL2 - revision: 90e74631b4
executor > local (1)
[51/92c5c7] process > get_pbp_genes (1) [ 0%] 0 of 1
[- ] process > PBP1A:get_pbp_alleles -
[- ] process > PBP1A:finalise_pbp_existing_allele_results -
[- ] process > PBP2B:get_pbp_alleles -
[- ] process > PBP2B:finalise_pbp_existing_allele_results -
[- ] process > PBP2X:get_pbp_alleles -
[- ] process > PBP2X:finalise_pbp_existing_allele_results -
Error executing process > 'get_pbp_genes (1)'

Caused by: Process get_pbp_genes (1) terminated with an error exit status (1)

Command executed:

# Build a blast reference database from the assmeblies

makeblastdb -in NIL12.fasta -dbtype nucl -out NIL12_contig_blast_db

# Blast the blactam database against the blast reference database

blastn -db NIL12_contig_blast_db -query GBS_bLactam_Ref.fasta -outfmt 6 -word_size 7 -out NIL12_blast_blactam.out

# Get BED file of PBP fragments

executor > local (1)
[51/92c5c7] process > get_pbp_genes (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > PBP1A:get_pbp_alleles -
[- ] process > PBP1A:finalise_pbp_existing_allele_results -
[- ] process > PBP2B:get_pbp_alleles -
[- ] process > PBP2B:finalise_pbp_existing_allele_results -
[- ] process > PBP2X:get_pbp_alleles -
[- ] process > PBP2X:finalise_pbp_existing_allele_results -
Error executing process > 'get_pbp_genes (1)'

Caused by: Process get_pbp_genes (1) terminated with an error exit status (1)

Command executed:

# Build a blast reference database from the assmeblies

makeblastdb -in NIL12.fasta -dbtype nucl -out NIL12_contig_blast_db

# Blast the blactam database against the blast reference database

blastn -db NIL12_contig_blast_db -query GBS_bLactam_Ref.fasta -outfmt 6 -word_size 7 -out NIL12_blast_blactam.out

# Get BED file of PBP fragments

get_pbp_genes_from_contigs.py --blast_out_file NIL12_blast_blactam.out --query_fasta GBS_bLactam_Ref.fasta --frac_align_len_threshold 0.5 --frac_identity_threshold 0.5 --outputprefix NIL12

# Clean directory

mkdir output
mv NIL12_*bed output
mv NIL12.fasta output
find . -maxdepth 1 -type f -delete
unlink GBS_bLactam_Ref.fasta
mv output/NIL12_*bed .
mv output/NIL12.fasta .
rm -d output

Command exit status: 1

Command output:

Building a new DB, current time: 08/23/2022 13:36:40
New DB name: NIL12_contig_blast_db
New DB title: NIL12.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 105 sequences in 0.018683 seconds.

Command error: mv: cannot stat 'NIL12_*bed': No such file or directory

Work dir: /mnt/SSD2/genome_analysis/GBS-Typer-sanger-nf/GBS-Typer-sanger-nf/work/51/92c5c7826698a69fe6d7ecdf252785

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

It looks like the bed file is not created. Does it mean that my sample does not contain PBP and the pipeline crashes?

Any hints?

Bests, Jan

blue-moon22 commented 2 years ago

Hi Jan!

Hard to tell from looking at this. It could be an error with the script get_pbp_genes_from_contigs.py not generating the bed file in the first place, which Nextflow tends to ignore.

Do you still have the work directory? If you do, could you print the content of the file work/51/92c5c7*/.command.log?
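For reference, every Nextflow task directory holds a few hidden files that usually narrow a failure down: the script that was run, its combined log, its stderr, and its exit code. A minimal sketch for printing them in one go, assuming the work directory has not been cleaned; the helper name `show_task_logs` is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the standard Nextflow task files from one work dir.
show_task_logs() {
    local task_dir="$1"
    local f
    for f in .command.sh .command.log .command.err .exitcode; do
        if [ -f "$task_dir/$f" ]; then
            printf '==> %s <==\n' "$f"
            cat "$task_dir/$f"
            printf '\n'
        else
            # A missing file is reported rather than treated as an error.
            printf '==> %s (missing) <==\n' "$f"
        fi
    done
}

# Usage (directory prefix taken from the error message above):
# show_task_logs work/51/92c5c7*
```

The glob at the end expands to the full hashed directory name, so the short prefix from the Nextflow progress line is enough.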

Best, Vicky

gaworj commented 2 years ago

Hi,

Here is the output from another problematic file (exactly the same issue - no bed file created):

(nextflow) jang@jang-MS-7B18:~/data_SSD2/genome_analysis/Weronika/GBS-Typer-sanger-nf/GBS-Typer-sanger-nf$ cat work/d1/15055fa3b53d92d5aaf81422a894e4/.command.log

Building a new DB, current time: 08/24/2022 15:59:26
New DB name: /mnt/SSD2/genome_analysis/Weronika/GBS-Typer-sanger-nf/GBS-Typer-sanger-nf/work/d1/15055fa3b53d92d5aaf81422a894e4/NIL12_contig_blast_db
New DB title: NIL12.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 105 sequences in 0.0262592 seconds.
mv: cannot stat 'NIL12_*bed': No such file or directory

Bests, Jan

blue-moon22 commented 2 years ago

It's likely that no PBP genes were detected in your contigs. The expected behaviour in that case is for the pipeline to produce no PBP output, without an error, and to skip the next stage, get_pbp_alleles.
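One way to check by hand whether any PBP genes were detected is to count the hits in the tabular BLAST output, if NIL12_blast_blactam.out is still present in the task's work directory. A sketch, assuming the standard `-outfmt 6` column order (query ID in column 1, percent identity in column 3). The function name `count_blast_hits` is illustrative, the default 50% cut-off only loosely mirrors `--frac_identity_threshold 0.5`, and the alignment-length filter that get_pbp_genes_from_contigs.py also applies is not reproduced here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count BLAST hits per query sequence in an outfmt 6 file.
count_blast_hits() {
    local blast_out="$1"
    local min_ident="${2:-50}"   # assumed percent-identity cut-off
    if [ ! -s "$blast_out" ]; then
        echo "no hits: $blast_out is missing or empty"
        return 0
    fi
    # Column 1 = query ID, column 3 = percent identity.
    awk -F'\t' -v min="$min_ident" '$3 + 0 >= min { n[$1]++ }
        END { for (q in n) printf "%s\t%d\n", q, n[q] }' "$blast_out" | sort
}

# Usage, from inside the failed task's work directory:
# count_blast_hits NIL12_blast_blactam.out
```

An empty result, or the "no hits" message, would be consistent with no BED file being written.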

I added a clean-up stage that removes some intermediate files in the last release, but didn't fully test it in get_pbp_genes. It will be a quick fix.
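The failure mode is easy to reproduce: `mv` exits non-zero when the glob `NIL12_*bed` matches nothing, and that aborts the task. Below is a sketch of a guarded clean-up step. It is not the fix that was committed to the repository; the function name `keep_bed_files` and the message wording are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical guarded version of the "mv <prefix>_*bed output" clean-up step.
keep_bed_files() {
    local prefix="$1"
    local dest="$2"
    local f moved=0
    mkdir -p "$dest"
    for f in "${prefix}"_*bed; do
        # With no match the pattern stays literal, so check the file exists.
        [ -e "$f" ] || continue
        mv "$f" "$dest"/
        moved=$((moved + 1))
    done
    if [ "$moved" -eq 0 ]; then
        # Report the empty result instead of failing the whole task.
        echo "No PBP BED files found for ${prefix}" >&2
    fi
    return 0
}

# Usage: keep_bed_files NIL12 output
```

Returning 0 in the empty case lets downstream stages be skipped without marking the process as failed, while the message on stderr still records that nothing was found.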

blue-moon22 commented 2 years ago

This should be fixed now. Please do a git fetch and then a git pull. If you get another error, let me know. I will keep this issue open until you run it successfully.

gaworj commented 2 years ago

It looks like the problem is partially solved. Please add a message to the pipeline output reporting that no genes were found. I still have to check my dataset for PBPs.

blue-moon22 commented 2 years ago

Sure. I will look into this when I'm back from leave. I guess with Strep pneumo that might be more typical than with Group B Strep. (Note that the PBP genes used are from a GBS reference database, but probably the same as what you're looking for anyway https://github.com/BenJamesMetcalf/GBS_Scripts_Reference/tree/master/GBS_Reference_DB)

gaworj commented 2 years ago

Hi,

I recently ran another test with a sample that has already been published and that we are sure contains PBP genes:

nextflow run main.nf --reads 'data/*_{trim_R1,trim_R2}.fastq.gz' --output 'ERR4991741_pbp' --run_pbptyper --contigs 'data/ERR4991741_assembly.fa'
N E X T F L O W ~ version 21.04.1
Launching main.nf [magical_noyce] - revision: 90e74631b4
executor > local (5)
[70/a2f270] process > serotyping (1) [ 0%] 0 of 1
executor > local (5)
[- ] process > serotyping (1) -
[57/3cf914] process > GBS_RES:split_target_RES_sequences [100%] 1 of 1 ✔
[b3/5ae14e] process > GBS_RES:srst2_for_res_typing (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > GBS_RES:split_target_RES_seq_from_sam_file -
[- ] process > GBS_RES:freebayes -
[- ] process > OTHER_RES:srst2_for_res_typing (1) -
[- ] process > res_typer -
[- ] process > finalise_sero_res_results -
[3d/ec134e] process > get_pbp_genes (1) [100%] 1 of 1 ✔
[- ] process > PBP1A:get_pbp_alleles -
[- ] process > PBP1A:finalise_pbp_existing_allele_results -
[- ] process > PBP2B:get_pbp_alleles -
[- ] process > PBP2B:finalise_pbp_existing_allele_results -
[- ] process > PBP2X:get_pbp_alleles -
[- ] process > PBP2X:finalise_pbp_existing_allele_results -
Error executing process > 'GBS_RES:srst2_for_res_typing (1)'

Caused by: Process GBS_RES:srst2_for_res_typing (1) terminated with an error exit status (1)

Command executed:

srst2 --samtools_args '-A' --input_pe ERR4991741_trim_R1.fastq.gz ERR4991741_trim_R2.fastq.gz --output ERR4991741 --log --save_scores --min_coverage 99.9 --max_divergence 5 --gene_db GBS_Res_Gene-DB_Final.fasta

touch ERR4991741__fullgenes__GBS_Res_Gene-DB_Final__results.txt

# Clean directory

mkdir output
mv ERR4991741*.bam output
mv ERR4991741__fullgenes__GBS_Res_Gene-DB_Final__results.txt output
find . -maxdepth 1 -type f -delete
unlink ERR4991741_trim_R1.fastq.gz
unlink ERR4991741_trim_R2.fastq.gz
unlink GBS_Res_Gene-DB_Final.fasta
mv output/ERR4991741*.bam .
mv output/ERR4991741__fullgenes__GBS_Res_Gene-DB_Final__results.txt .
rm -d output

Command exit status: 1

Command output: (empty)

Command error: mv: cannot stat 'ERR4991741*.bam': No such file or directory

Work dir: /mnt/SSD2/genome_analysis/Weronika/GBS-Typer-sanger-nf/work/b3/5ae14e6223c19868b8cc94e49a2703

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Bests, Jan

blue-moon22 commented 2 years ago

Actually the get_pbp_genes part of the pipeline succeeded, and this time srst2_for_res_typing failed (this is unrelated to the PBP-specific workflow).

Could you share the output of /mnt/SSD2/genome_analysis/Weronika/GBS-Typer-sanger-nf/work/b3/5ae14e6223c19868b8cc94e49a2703/.command.out please?