nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

NFCORE_AMPLISEQ:AMPLISEQ:DADA2_QUALITY #368

Closed avinash-ngc closed 2 years ago

avinash-ngc commented 2 years ago

Hi there,

I keep running into an issue in the ampliseq pipeline at the 'NFCORE_AMPLISEQ:AMPLISEQ:DADA2_QUALITY (FW)' stage. The error is pasted below.

Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:DADA2_QUALITY (FW)'

Caused by:
  Process NFCORE_AMPLISEQ:AMPLISEQ:DADA2_QUALITY (FW) terminated with an error exit status (1)

Command executed:

  dada_quality.r "FW_qual_stats" 5e+06
  echo 'plotQualityProfile 5e+06' > "plotQualityProfile.args.txt"

Command exit status:
  1

Command output:
  [1] "FW_qual_stats"
  [1] 5000000

Thanks a lot

d4straub commented 2 years ago

Hi, could you please provide some more details: the command you used to start the pipeline, the Nextflow version, the pipeline version, and roughly how many samples you have? I ask about the latter because I think I have seen a similar problem once with very long reads, but I cannot find that issue right now. The error output is not really helpful; could you attach the .nextflow.log file (it should already contain the information above)?

avinash-ngc commented 2 years ago

Thanks for the prompt reply. The reads are around 251 bp long and were sequenced paired-end. There are 60 samples, and the Google Drive link to the log file is below. The link - https://drive.google.com/file/d/1WCmR4IlP6H-4c_eK6JqE1lH7Mfqr3rLv/view?usp=sharing Thanks once again

d4straub commented 2 years ago

Thanks for the log file. Above you left out the most important part of the error message:

Command error:
  Error in data.frame(sequence = names(freqtbl$top), count = as.integer(freqtbl$top),  : 
    arguments imply differing number of rows: 0, 1
  Calls: plotQualityProfile ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
  Execution halted

That means the input seems to be an empty table, so you lost all your reads before that step. The inputs to that step are the outputs of cutadapt, which removes the primer sequences. That means your primer sequences did not match the sequencing reads. Several possibilities:

  1. the primer sequences were already trimmed from the sequencing data
  2. the sequencing data never contained primer sequences (depends on the library prep protocol, but unusual)
  3. the primer sequence is wrong

I assume the latter, because a quick Google search for the forward primer sequence you used shows that it is an Illumina adapter. However, this step expects the primers that were used in the amplicon PCR.

Essentially: use the correct primer sequences.
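To see which of these possibilities applies, one can check how many raw reads actually start with the primer. Below is a minimal Python sketch, assuming exact matches at the very start of the read; cutadapt tolerates some mismatches, and some library designs put spacer bases before the primer, so the result is only a rough indication. The helper names are hypothetical and not part of ampliseq.

```python
import gzip
import re
from itertools import islice

# IUPAC nucleotide codes mapped to the bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_regex(primer: str):
    """Compile a regex matching a (possibly degenerate) primer at the read start."""
    parts = []
    for base in primer.upper():
        bases = IUPAC[base]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))


def fraction_with_primer(reads, primer: str) -> float:
    """Fraction of reads whose first bases match the primer exactly."""
    pattern = primer_regex(primer)
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if pattern.match(read.upper()))
    return hits / len(reads)


def first_sequences(path: str, limit: int = 1000):
    """Yield the sequences of the first `limit` records of a (gzipped) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(islice(handle, limit * 4)):
            if i % 4 == 1:  # each FASTQ record has 4 lines; the 2nd is the sequence
                yield line.strip()
```

For example, fraction_with_primer(first_sequences("sample_R1.fastq.gz"), "GTGYCAGCMGCCGCGGTAA") reports the share of the first 1,000 forward reads starting with the 16S primer 515F; the file name is a placeholder and 515F is only an illustrative primer. A share close to zero for the primer given to the pipeline would be consistent with the empty-table error above.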

By the way, this problem (ambiguous and difficult to interpret error message) should be solved in the dev branch and will be released eventually.

Edit: Let me know if that solves your problem!

Edit2: I just checked whether it's in the documentation, and it is, see https://nf-co.re/ampliseq/2.1.1/parameters#fw_primer

In amplicon sequencing methods, PCR with specific primers produces the amplicon of interest. These primer sequences need to be trimmed from the reads before further processing and are also required for producing an appropriate classifier. Do not use here any technical sequence such as adapter sequences but only the primer sequence that matches the biological amplicon.

avinash-ngc commented 2 years ago

Thanks a lot for the response @d4straub. You are a savior. I am still facing some issues while running the ampliseq pipeline. I am sure they are due to some dumb mistakes of mine, but as I am new to this field, I am struggling here. The pipeline encountered an error, most probably in the QIIME2 module, and was unable to generate the alpha and beta diversity plots. I am attaching my log file along with the metadata sheet. I hope you can help me out with it.

The link - https://drive.google.com/drive/folders/1M2ZIl0nwD6kKSwu3GQx562cUys7OB9qO?usp=sharing

Thanks Again

d4straub commented 2 years ago

Hi, the relevant error message is:

Jan-22 02:23:16.998 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:QIIME2_FILTERASV (body.site,year,month,day)'

Caused by:
  Process `NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:QIIME2_FILTERASV (body.site,year,month,day)` terminated with an error exit status (1)

Command executed:

  export XDG_CONFIG_HOME="${PWD}/HOME"

  IFS=',' read -r -a metacategory <<< "body.site,year,month,day"

  #remove samples that do not have any value
  for j in "${metacategory[@]}"
  do
      qiime feature-table filter-samples \
          --i-table filtered-table.qza \
          --m-metadata-file metadata.tsv \
          --p-where "$j<>''" \
          --o-filtered-table $j.qza
  done

  echo $(qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g") > qiime2.version.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
  Plugin error from feature-table:

    Selection of IDs failed with query:
     SELECT "sample-id" FROM metadata WHERE body.site<>'' GROUP BY "sample-id" ORDER BY "sample-id";

    If one of the metadata column names specified in the `where` statement is on this list of reserved keywords (http://www.sqlite.org/lang_keywords.html), please ensure it is quoted appropriately in the `where` statement.

  Debug info has been saved to /tmp/qiime2-q2cli-err-7v_sgnfi.log

The error message is rather enigmatic and I am not certain what the problem is. However, I suspect the metadata file. First of all, the category body-site has only soil values, so it should not appear in that list at all. Second, the second line in metadata.tsv does not seem like a good choice, because it does not relate to any sample.

So my idea: remove the second line of the metadata sheet, which is #q2:types categorical numeric numeric numeric; I assume that solves it. If it doesn't work after that change, also rename the first column (currently sample-id) to something simpler, e.g. ID. My reasoning here is that the - might be converted to something else erroneously, but I doubt it. Do not forget to append -resume to your pipeline run command, so that you do not start over from the beginning but only re-do the steps affected by the metadata change.
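Removing that line is easy in any text editor; for several files, a small Python sketch like the one below would do it. It drops every line starting with #q2: (the QIIME 2 directive prefix) and leaves the header and sample rows untouched. The function names and the file names in the usage note are hypothetical.

```python
def drop_q2_directives(tsv_text: str) -> str:
    """Return metadata text without QIIME 2 directive lines such as '#q2:types ...'."""
    kept = [
        line
        for line in tsv_text.splitlines(keepends=True)
        if not line.startswith("#q2:")
    ]
    return "".join(kept)


def clean_metadata_file(src: str, dst: str) -> None:
    """Write a copy of the metadata sheet `src` to `dst` without directive lines."""
    # newline="" keeps the original line endings (e.g. \r\n) unchanged
    with open(src, encoding="utf-8", newline="") as fh:
        text = fh.read()
    with open(dst, "w", encoding="utf-8", newline="") as fh:
        fh.write(drop_q2_directives(text))
```

For example, clean_metadata_file("metadata.tsv", "metadata_clean.tsv") writes a cleaned copy, which can then be given to the pipeline instead of the original before re-running with -resume.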

EDIT: Forgot to mention that column headers should always be as simple as possible, i.e. letters only if possible; numbers usually work as well (but not a header starting with a number), because whenever steps with R are used, non-alphanumeric characters are frequently converted to . or similar, and then the header no longer matches the source file. So I suggest simplifying the column names, in this case sample-id and body-site.
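A common source of such conversions is R's make.names(), which data.frame() and read.table() apply to column names by default (check.names = TRUE): any character other than letters, digits, . and _ becomes a dot, and a name that does not start with a letter (or with a dot not followed by a digit) gets an X prepended. The Python sketch below is an ASCII-only approximation of that rule (it ignores R's reserved-word handling and locale-dependent letters) that flags headers R would rewrite and proposes letters-and-digits-only replacements. All function names are hypothetical helpers.

```python
import re


def r_make_names_approx(name: str) -> str:
    """ASCII-only approximation of R's make.names() for a single header.

    Reserved words (e.g. 'if' -> 'if.') are not handled here.
    """
    # letters, digits, '.' and '_' are kept; everything else becomes '.'
    fixed = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # a valid name starts with a letter, or with a dot not followed by a digit
    if not re.match(r"[A-Za-z]|\.(?![0-9])", fixed):
        fixed = "X" + fixed
    return fixed


def simple_header(name: str) -> str:
    """Suggest a letters-and-digits-only header that does not start with a digit."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", name)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "X" + cleaned
    return cleaned


def risky_headers(headers):
    """Map each header that R would rewrite to a simpler replacement."""
    return {h: simple_header(h) for h in headers if r_make_names_approx(h) != h}
```

For the metadata in this thread, risky_headers(["sample-id", "body-site", "year", "month", "day"]) returns {"sample-id": "sampleid", "body-site": "bodysite"}. The proposed names are only one option; ID for the first column, as suggested above, is another.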

avinash-ngc commented 2 years ago

Dear Daniel,

Thanks again for all the help. With the suggested changes, I was able to execute most of the steps, but the pipeline failed at the end; the error is recorded in the attached file. Kindly have a look at it.

Thanks again for all the help. Regards


Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION (1)'

Caused by:
  Process exceeded running time limit (6h)

Command executed:

  export XDG_CONFIG_HOME="${PWD}/HOME"

  maxdepth=$(count_table_minmax_reads.py filtered-table.tsv maximum 2>&1)

  #check values
  if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
  if [ "$maxdepth" -gt "5000" ]; then maxsteps="250"; else maxsteps=$((maxdepth/20)); fi
  qiime diversity alpha-rarefaction \
      --i-table filtered-table.qza \
      --i-phylogeny rooted-tree.qza \
      --p-max-depth $maxdepth \
      --m-metadata-file metadata_2.tsv \
      --p-steps $maxsteps \
      --p-iterations 10 \
      --o-visualization alpha-rarefaction.qzv
  qiime tools export --input-path alpha-rarefaction.qzv --output-path alpha-rarefaction

  echo $(qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g") > qiime2.version.txt

Command exit status:

Command output:
  (empty)

Command error:
  QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.

Work dir:
  /data/NGC_Data/Analysis/External_Projects/2022/crida/work/39/e49886ac4935ff85d435a348bf33eb

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

d4straub commented 2 years ago

I cannot see any attachment.

avinash-ngc commented 2 years ago

Dear Daniel,

The link has the error as well as the nextflow log attached. Thanks again for all the help.

https://drive.google.com/drive/folders/1M2ZIl0nwD6kKSwu3GQx562cUys7OB9qO?usp=sharing

d4straub commented 2 years ago

As above:

Forgot to mention that column headers should always be as simple as possible, i.e. letters only if possible; numbers usually work as well (but not a header starting with a number), because whenever steps with R are used, non-alphanumeric characters are frequently converted to . or similar, and then the header no longer matches the source file. So I suggest simplifying the column names, in this case sample-id and body-site.

body-site seems to be converted to body.site and that offends QIIME2, I think. Solution: rename the column body-site to bodysite and re-run (-resume) the pipeline.
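The QIIME 2 error also hints at why the dot matters: the --p-where expression is run as SQL against SQLite (the query is printed in the error message), and in SQL an unquoted body.site is read as column site of a table named body, not as a column called body.site. A small Python sqlite3 sketch with a made-up toy table (not the data from this thread) illustrates this; it assumes QIIME 2 passes the expression through essentially unchanged, as the printed query suggests. The function names are hypothetical.

```python
import sqlite3


def sample_ids(where: str):
    """Run a WHERE clause like the one in the QIIME 2 error against a toy table."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE metadata ("sample-id" TEXT, "body.site" TEXT, bodysite TEXT)')
        con.executemany(
            "INSERT INTO metadata VALUES (?, ?, ?)",
            [("S1", "soil", "soil"), ("S2", "", "")],
        )
        query = f'SELECT "sample-id" FROM metadata WHERE {where} ORDER BY "sample-id"'
        return [row[0] for row in con.execute(query)]
    finally:
        con.close()


def query_fails(where: str) -> bool:
    """True if SQLite rejects the WHERE clause (e.g. 'no such column')."""
    try:
        sample_ids(where)
    except sqlite3.OperationalError:
        return True
    return False
```

Here the unquoted body.site<>'' query fails even though a column named body.site exists, whereas bodysite<>'' returns ["S1"] without any quoting, which is why renaming the column is the simplest fix.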

avinash-ngc commented 2 years ago

Dear Daniel,

Thanks a lot for the continuous support. It has really helped a lot. I have been running into a problem at the alpha rarefaction stage, and the error looks like this:

Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION (1)'

Caused by:
  Process exceeded running time limit (6h)

Command executed:

  export XDG_CONFIG_HOME="${PWD}/HOME"

  maxdepth=$(count_table_minmax_reads.py filtered-table.tsv maximum 2>&1)

  #check values
  if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
  if [ "$maxdepth" -gt "5000" ]; then maxsteps="250"; else maxsteps=$((maxdepth/20)); fi
  qiime diversity alpha-rarefaction \
      --i-table filtered-table.qza \
      --i-phylogeny rooted-tree.qza \
      --p-max-depth $maxdepth \
      --m-metadata-file metadata_sunflower.txt \
      --p-steps $maxsteps \
      --p-iterations 10 \
      --o-visualization alpha-rarefaction.qzv
  qiime tools export --input-path alpha-rarefaction.qzv --output-path alpha-rarefaction

  echo $(qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g") > qiime2.version.txt

Command exit status:

Command output:
  (empty)

Command error:
  QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.

Work dir:
  /data/NGC_Data/Analysis/External_Projects/2022/crida/sunflower/work/e8/171e7561a73b506e39c6b91dfa92f3

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

Thanks in advance

Avinash Dhar


d4straub commented 2 years ago

Hi there, the error message reports

Caused by: Process exceeded running time limit (6h)

The process needs longer than anticipated for your samples. You can solve this by allowing the process more runtime. To do this, create a file named QIIME2_ALPHARAREFACTION.config that contains:

process {
    withName:QIIME2_ALPHARAREFACTION {
        time   = 24.h
    }
}

and then resume the previous pipeline run by adding -resume -c QIIME2_ALPHARAREFACTION.config to the command. The process now has 24 hours instead of 6 hours to complete, which should be enough. You can give the process up to 596 hours of runtime that way.
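To get a feeling for why 6 hours can be too short, the depth and step logic of the alpha-rarefaction command in the error log can be mirrored in a few lines of Python. This is only a sketch of that shell snippet; steps times iterations is used as a rough measure of how many rarefied tables QIIME 2 has to build, and the actual runtime also depends on the number of samples, the tree size and the metrics computed.

```python
def rarefaction_plan(max_sample_reads: int, iterations: int = 10) -> tuple:
    """Mirror the max-depth/steps logic of the alpha-rarefaction shell snippet.

    Returns (max_depth, steps, rarefactions), where rarefactions = steps * iterations
    is a rough proxy for the amount of work, not a runtime estimate.
    """
    # if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
    max_depth = min(max_sample_reads, 75000)
    # if [ "$maxdepth" -gt "5000" ]; then maxsteps="250"; else maxsteps=$((maxdepth/20)); fi
    steps = 250 if max_depth > 5000 else max_depth // 20
    return max_depth, steps, steps * iterations
```

With the 10 iterations used in the command, any maximum sample depth above 5,000 reads yields 250 steps, i.e. 2,500 rarefactions, each of which gets slower as the number of samples and the depth grow.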

avinash-ngc commented 2 years ago

Hi Daniel, thanks a lot for all the help. I really appreciate it. Your suggestions have sorted out almost all my queries. However, there is one final query I wanted some clarification on: how does one provide replicates to the pipeline? I mean, if I need mean values for a group, how am I supposed to go about it?

Thanks again for all the help

Thanks and Regards Avinash Dhar


d4straub commented 2 years ago

Information about replicates is used frequently in the pipeline (originating from the metadata file), for example in alpha and beta diversity analysis, ANCOM, etc., i.e. wherever significant differences are tested for. However, there is no table with mean relative abundance values for replicates, if that is what you are asking for. That needs to be done manually, i.e. outside of the pipeline.

I agree that mean relative abundance values per group are sometimes interesting. However, it is hard to implement an intuitive and user-friendly, but also flexible and useful, way to use metadata in a pipeline (I have solved this partially for the aforementioned statistics; probably that approach would work here too...). I will think about it, but I cannot promise that this will be in the pipeline soon.
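Until something like this exists in the pipeline, the manual step could look like the minimal stdlib Python sketch below. It assumes the ASV count table has already been loaded into a nested dict {sample: {ASV: count}} (e.g. from an exported feature table; the exact file and layout are not specified here) and that a sample-to-group mapping has been taken from one metadata column. It averages per-sample relative abundances, which is one common choice; pooling the counts of a group first would give different numbers. The function names are hypothetical.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Convert {sample: {asv: count}} to {sample: {asv: fraction of that sample's reads}}."""
    rel = {}
    for sample, asvs in counts.items():
        total = sum(asvs.values())
        rel[sample] = {asv: (n / total if total else 0.0) for asv, n in asvs.items()}
    return rel


def group_means(counts, groups):
    """Mean relative abundance per ASV for each group of replicate samples.

    `groups` maps sample ID -> group label. Samples missing from `groups`
    are skipped; an ASV absent from a sample counts as 0 for that sample.
    """
    rel = relative_abundance(counts)
    members = defaultdict(list)
    for sample in rel:
        if sample in groups:
            members[groups[sample]].append(sample)
    means = {}
    for group, samples in members.items():
        asvs = {asv for s in samples for asv in rel[s]}
        means[group] = {
            asv: sum(rel[s].get(asv, 0.0) for s in samples) / len(samples)
            for asv in sorted(asvs)
        }
    return means
```

For instance, with samples S1 and S2 in group A and S3 in group B, group_means returns one dict per group whose values sum to 1 (up to rounding) for groups without empty samples.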

d4straub commented 2 years ago

I have opened https://github.com/nf-core/ampliseq/issues/393 as a reminder about mean values per group, because that seems to be a relatively common way of looking at data.

I am closing this issue now; please open another one in case you have other issues.

avinash-ngc commented 2 years ago

Hi Daniel,

Here I am with another query regarding the pipeline. I am attaching the error message that I encountered while running the pipeline at the denoising and chimera removal step. The error message isn't clear enough, and my FASTQ files seem to be OK as well. I would be really obliged if you could have a look at this issue.

Thanks and Regards Avinash Dhar


Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:DADA2_RMCHIMERA (1)'

Caused by:
  Process NFCORE_AMPLISEQ:AMPLISEQ:DADA2_RMCHIMERA (1) terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env Rscript
  suppressPackageStartupMessages(library(dada2))

  seqtab = readRDS("1.seqtab.rds")

  #remove chimera
  seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", minSampleFraction = 0.9, ignoreNNegatives = 1, minFoldParentOverAbundance = 2, minParentAbundance = 8, allowOneOff = FALSE, minOneOffParentDistance = 4, maxShift = 16, multithread=6, verbose=TRUE)
  if ( 1 == 1 ) { rownames(seqtab.nochim) <- "AIG93" }
  saveRDS(seqtab.nochim,"1.ASVtable.rds")

  write.table('removeBimeraDenovo method="consensus", minSampleFraction = 0.9, ignoreNNegatives = 1, minFoldParentOverAbundance = 2, minParentAbundance = 8, allowOneOff = FALSE, minOneOffParentDistance = 4, maxShift = 16', file = "removeBimeraDenovo.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
  writeLines(c("\"NFCORE_AMPLISEQ:AMPLISEQ:DADA2_RMCHIMERA\":", paste0(" R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0(" dada2: ", packageVersion("dada2")) ), "versions.yml")

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) :
    Input must be a valid sequence table.
  Calls: removeBimeraDenovo -> isBimeraDenovoTable
  Execution halted

Work dir:
  /data/NGC_Data/Analysis/External_Projects/2022/AIG-93-94/93/93/work/84/824a1fd58e467ddfe13c0d6a34f070

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

d4straub commented 2 years ago

@avinash-ngc please open new issues for new problems or join the nf-core Slack. I hadn't seen your most recent post here until now. The problem seems to be that no sequences make it to that step. This is most likely a problem with your data, or possibly with the pipeline settings, but most likely not a problem caused by the pipeline itself.
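To find out at which stage the reads disappear, one can count the reads per sample in the FASTQ files at each stage, e.g. the raw input and the primer-trimmed cutadapt output. Below is a minimal Python sketch, assuming gzipped FASTQ files matching *.fastq.gz in a single directory; which directory holds the trimmed reads depends on the ampliseq version and its output settings, so the path is left open. The helper names are hypothetical.

```python
import gzip
from pathlib import Path


def count_fastq_records(lines) -> int:
    """Count FASTQ records in an iterable of lines (each record spans 4 lines)."""
    n_lines = sum(1 for _ in lines)
    return n_lines // 4


def reads_per_file(directory: str, pattern: str = "*.fastq.gz") -> dict:
    """Map each matching FASTQ file name in `directory` to its number of reads."""
    counts = {}
    for path in sorted(Path(directory).glob(pattern)):
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt") as handle:
            counts[path.name] = count_fastq_records(handle)
    return counts


def empty_files(counts: dict) -> list:
    """Names of files that contain no reads at all."""
    return [name for name, n in counts.items() if n == 0]
```

For example, empty_files(reads_per_file("path/to/trimmed_reads")) lists files with no reads left. Very low counts are worth a look too, since the later filtering and denoising steps remove further reads.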