idolawoye / BAGEP

A pipeline for bacterial whole-genome sequence data analysis
MIT License

Question #8

Open sekhwal opened 2 years ago

sekhwal commented 2 years ago

I am trying to install BAGEP with the following command, but it hangs at "Solving environment" for a very long time.

conda env create -f environment.yml
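A commonly suggested workaround for a long "Solving environment" phase (not from this thread, just a general suggestion) is to build the environment with the faster mamba solver instead; the environment name `bagep` is assumed from later comments:

```shell
# mamba is a faster drop-in replacement for conda's dependency solver
conda install -n base -c conda-forge mamba
mamba env create -f environment.yml
conda activate bagep   # environment name assumed from later comments
```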

idolawoye commented 2 years ago

Hi, can you clone the repo and try to install it again? I have just updated some dependencies.

sekhwal commented 2 years ago

I tried, but it was not working, so I installed all the dependencies manually, one by one. However, it does not allow me to install snippy and centrifuge. I am trying these on Anaconda. Can you suggest how to install the pipeline?

sekhwal commented 2 years ago

It shows the following error after running for a while.

Touching output file fastq/SRR1210481.snippy.
[Wed Jul 27 14:59:49 2022]
Finished job 755.
2 of 1221 steps (0.16%) done

[Wed Jul 27 14:59:49 2022]
Job 512: Taxonomic classification of processed reads using centrifuge

/usr/bin/bash: CENTRIFUGE_DEFAULT_DB: unbound variable
[Wed Jul 27 14:59:49 2022]
Error in rule centrifuge:
    jobid: 512
    output: taxonomy/fastq/SRR1210481-report.txt, taxonomy/fastq/SRR1210481-result.txt
    shell:
        centrifuge -p 4 -x $CENTRIFUGE_DEFAULT_DB -1 fastp/fastq/SRR1210481_R1.fastq.gz.fastp -2 fastp/fastq/SRR1210481_R2.fastq.gz.fastp --report-file taxonomy/fastq/SRR1210481-report.txt -S taxonomy/fastq/SRR1210481-result.txt
        (exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/mmk6053/Data/Manoj_data/Entrococcus_project/BAGEP/.snakemake/log/2022-07-27T145131.709376.snakemake.log

sekhwal commented 2 years ago

I reinstalled snippy, but it is still showing the following error.


/usr/bin/bash: CENTRIFUGE_DEFAULT_DB: unbound variable
[Wed Jul 27 15:53:49 2022]
Error in rule centrifuge:
    jobid: 512
    output: taxonomy/fastq/SRR1210481-report.txt, taxonomy/fastq/SRR1210481-result.txt
    shell:
        centrifuge -p 4 -x $CENTRIFUGE_DEFAULT_DB -1 fastp/fastq/SRR1210481_R1.fastq.gz.fastp -2 fastp/fastq/SRR1210481_R2.fastq.gz.fastp --report-file taxonomy/fastq/SRR1210481-report.txt -S taxonomy/fastq/SRR1210481-result.txt
        (exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/mmk6053/Data/Manoj_data/Entrococcus_project/BAGEP/.snakemake/log/2022-07-27T155254.684652.snakemake.log

idolawoye commented 2 years ago

The error message is with Centrifuge, not snippy.

Before running the pipeline, you need to download the centrifuge database, then set it up as shown in the README.md file and also set up Krona taxonomy. Can you confirm that you have completed these steps?
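A quick pre-flight sanity check can confirm both steps before launching the pipeline. This is a sketch: the paths follow the README's default locations from this thread, and the `taxonomy.tab` filename is an assumption about what `ktUpdateTaxonomy.sh` produces.

```shell
# Pre-flight sanity check (sketch; paths and the taxonomy.tab filename
# are assumptions based on the README and this thread, not verified).
if [ -z "${CENTRIFUGE_DEFAULT_DB:-}" ]; then
  echo "CENTRIFUGE_DEFAULT_DB is not set" >&2
elif ! ls "${CENTRIFUGE_DEFAULT_DB}".*.cf >/dev/null 2>&1; then
  echo "no Centrifuge index files found at ${CENTRIFUGE_DEFAULT_DB}" >&2
else
  echo "Centrifuge index OK: ${CENTRIFUGE_DEFAULT_DB}"
fi
if [ -e "$HOME/krona/taxonomy/taxonomy.tab" ]; then
  echo "Krona taxonomy OK"
else
  echo "Krona taxonomy missing" >&2
fi
```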

sekhwal commented 2 years ago

I performed all these steps; the Centrifuge database is installed. I downloaded and installed the Centrifuge database (approximately 8 GB) with the following steps:

wget -c ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/p_compressed+h+v.tar.gz
mkdir $HOME/centrifuge-db
tar -C $HOME/centrifuge-db -zxvf p_compressed+h+v.tar.gz
export CENTRIFUGE_DEFAULT_DB=$HOME/centrifuge-db/p_compressed+h+v

sekhwal commented 2 years ago

I also set up Krona with the following steps:

rm -rf ~/anaconda3/envs/bagep/opt/krona/taxonomy
mkdir -p ~/krona/taxonomy
ln -s ~/krona/taxonomy/ ~/miniconda3/envs/bagep/opt/krona/taxonomy
ktUpdateTaxonomy.sh ~/krona/taxonomy

snakemake --config ref=enterococcus_genome.fasta

However, it is showing an error:

Error in rule snippy:
    jobid: 755
    output: fastq/SRR1210481/, fastq/SRR1210481.snippy
    shell:
        snippy --force --cleanup --outdir fastq/SRR1210481/ --ref enterococcus_genome.fasta --R1 fastp/fastq/SRR1210481_R1.fastq.gz.fastp --R2 fastp/fastq/SRR1210481_R2.fastq.gz.fastp
        (exited with non-zero exit code)

Removing output files of failed job snippy since they might be corrupted:
fastq/SRR1210481/
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/mmk6053/Data/Manoj_data/Entrococcus_project/BAGEP/.snakemake/log/2022-07-28T122529.877538.snakemake.log

idolawoye commented 2 years ago

What version of snippy do you have installed, and can you share the snippy log file?

Idowu Olawoye about.me/idowu


sekhwal commented 2 years ago

I am using snippy 4.4.3. Where can I find a snippy log file, since the pipeline hasn't started snippy?

idolawoye commented 2 years ago

Did the other steps run without issues? Also, is your reference genome in the same directory as the Snakefile?

sekhwal commented 2 years ago

The pipeline starts with the message "Filtering fastQ files by trimming low quality reads using fastp". It generates a "fastp" folder with the R1 and R2 files, after which it stops. I have a working directory called BAGEP, where I have put the reference genome and the extracted BAGEP files.

sekhwal commented 2 years ago

Here is the complete run.

(bagep) mmk53@A8-VT-MMK53-U1:/media/Data/Manoj_data/Entrococcus_project/BAGEP$ snakemake --config ref=enterococcus_genome.fasta
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       abricate
    1       all
    243     centrifuge
    243     fastp
    243     krona_plot
    1       move_files
    243     prep_centrifuge_results
    243     snippy
    1       snippy_core
    1       tree
    1       vcf_viewer
    1221

[Thu Jul 28 14:33:11 2022] Job 998: Filtering fastQ files by trimming low quality reads using fastp

Read1 before filtering:
total reads: 15802641
total bases: 1544896866
Q20 bases: 1535446430 (99.3883%)
Q30 bases: 1456450112 (94.2749%)

Read2 before filtering:
total reads: 15802641
total bases: 1530557284
Q20 bases: 1519107818 (99.2519%)
Q30 bases: 1434036932 (93.6938%)

Read1 after filtering:
total reads: 15802641
total bases: 1543261701
Q20 bases: 1533825402 (99.3885%)
Q30 bases: 1455009585 (94.2815%)

Read2 after filtering:
total reads: 15802641
total bases: 1528737202
Q20 bases: 1517311178 (99.2526%)
Q30 bases: 1432508948 (93.7054%)

Filtering result:
reads passed filter: 31605282
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 233652
bases trimmed due to adapters: 3455247

Duplication rate: 0.754108%

Insert size peak (evaluated by paired-end reads): 144

JSON report: fastp.json
HTML report: fastp.html

fastp -i fastq/SRR1210481_R1.fastq.gz -I fastq/SRR1210481_R2.fastq.gz -o fastp/fastq/SRR1210481_R1.fastq.gz.fastp -O fastp/fastq/SRR1210481_R2.fastq.gz.fastp
fastp v0.23.2, time used: 54 seconds
[Thu Jul 28 14:34:05 2022] Finished job 998.
1 of 1221 steps (0.08%) done

[Thu Jul 28 14:34:05 2022] Job 512: Taxonomic classification of processed reads using centrifuge

/usr/bin/bash: CENTRIFUGE_DEFAULT_DB: unbound variable
[Thu Jul 28 14:34:05 2022]
Error in rule centrifuge:
    jobid: 512
    output: taxonomy/fastq/SRR1210481-report.txt, taxonomy/fastq/SRR1210481-result.txt
    shell:
        centrifuge -p 4 -x $CENTRIFUGE_DEFAULT_DB -1 fastp/fastq/SRR1210481_R1.fastq.gz.fastp -2 fastp/fastq/SRR1210481_R2.fastq.gz.fastp --report-file taxonomy/fastq/SRR1210481-report.txt -S taxonomy/fastq/SRR1210481-result.txt
        (exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/Data/Manoj_data/Entrococcus_project/BAGEP/.snakemake/log/2022-07-28T143310.429117.snakemake.log
(bagep) mmk63@A8-VT-MMK63-U1:/media/Data/Manoj_data/Entrococcus_project/BAGEP$

sekhwal commented 2 years ago

It seems to be showing an error while filtering low-quality reads. However, I have used these data separately with Snippy and it ran successfully.

idolawoye commented 2 years ago

It appears you are running the workflow with 1 core. You can split the job across multiple threads, depending on how many you have available. Try:

snakemake --cores 4 --config ref=enterococcus_genome.fasta

This will use 4 threads and make it faster.

Also, the log message shows that $CENTRIFUGE_DEFAULT_DB has not been bound to the centrifuge-db/p_compressed+h+v database you downloaded. That is why it failed at the centrifuge step.
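One likely cause, sketched here as an assumption: `export` only affects the shell session it was run in, so a variable exported during setup is gone in a new terminal, and Snakemake's strict bash then aborts on the unset variable. Setting the variable in the current session and persisting it (path taken from the README's download steps) would look like:

```shell
# Set the variable for the current session:
export CENTRIFUGE_DEFAULT_DB=$HOME/centrifuge-db/p_compressed+h+v
# ...and persist it for future sessions:
echo 'export CENTRIFUGE_DEFAULT_DB=$HOME/centrifuge-db/p_compressed+h+v' >> ~/.bashrc
# Verify before launching the pipeline; ${VAR:?} aborts with a clear
# message if the variable is still unset:
echo "${CENTRIFUGE_DEFAULT_DB:?CENTRIFUGE_DEFAULT_DB is not set}"
```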

If a stage in the pipeline fails, it halts the entire process.

idolawoye commented 2 years ago

Hi, Any progress with the analysis?

sekhwal commented 2 years ago

Hi, sorry for the slow response. I will come back to my BAGEP analysis; I got stuck on some other tasks.

sekhwal commented 2 years ago

Hi, when I run the pipeline with --cores 40, it occupies all of my system's memory (~124 GB). Eventually, the system freezes.

snakemake --cores 40 --config ref=enterococcus_genome.fasta

idolawoye commented 2 years ago

Hi,

I am guessing that's because of the huge number of samples you're running and the intermediate files that are generated. You might want to free up some space or run it on an external drive with extra room. The pipeline cleans up the majority of the files at the end of the analysis, and you can also delete the files generated by fastp after the run.
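Another possibility, stated as an assumption rather than something confirmed in this thread: each concurrent centrifuge job may load its own copy of the ~8 GB index, so 40 parallel jobs can approach the machine's 124 GB of RAM. A back-of-envelope bound on concurrency:

```shell
# Back-of-envelope concurrency bound (illustrative numbers, plus the
# assumption that each centrifuge job loads its own ~8 GB index copy):
INDEX_GB=8
RAM_GB=124
MAX_JOBS=$((RAM_GB / INDEX_GB))
echo "cap concurrent centrifuge jobs at roughly $MAX_JOBS"
```

In practice that would mean rerunning with a lower core count, e.g. `snakemake --cores 8 --config ref=enterococcus_genome.fasta` (the core count here is illustrative, not prescriptive).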


sekhwal commented 2 years ago

Hi, I got an error at the end of the pipeline run. It generates fastp, krona, and taxonomy folders, but they are empty; only fastp contains data. While running, it shows results like:

Filtering result:
reads passed filter: 16430064
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 33160
bases trimmed due to adapters: 311269


fastp -i fastq/ERR4230412_R1.fastq.gz -I fastq/ERR4230412_R2.fastq.gz -o fastp/fastq/ERR4230412_R1.fastq.gz.fastp -O fastp/fastq/ERR4230412_R2.fastq.gz.fastp
fastp v0.23.2, time used: 392 seconds
[Fri Aug 5 16:56:59 2022] Finished job 1091.
40 of 1221 steps (3%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/Entrococcus_project/BAGEP/.snakemake/log/2022-08-05T165026.563922.snakemake.log

idolawoye commented 2 years ago

What does the log file look like? fastp completed successfully, but the rule of thumb is to pinpoint exactly why the pipeline failed.

sekhwal commented 2 years ago

I could not find the log file. I suspect the pipeline was not installed properly. Let me go through the installation process again.

idolawoye commented 2 years ago

Hi,

This error is because you haven't pointed that variable at your Centrifuge database. Have you downloaded the database to your computer?
