Closed avinash-ngc closed 2 years ago
Hi, could you please provide some more details: the command you used to start the pipeline, the Nextflow version, the pipeline version, and roughly how many samples you have? I ask about the latter because I think I have seen a similar problem once with very long reads, but I cannot find that issue right now.
The error output is not really helpful; could you attach the .nextflow.log file? (That file should already contain the information above.)
Thanks for the prompt reply. The read length is around 251bp and the sequencing chemistry is paired end. There are 60 samples and the google drive link for the log file has been provided below. The link - https://drive.google.com/file/d/1WCmR4IlP6H-4c_eK6JqE1lH7Mfqr3rLv/view?usp=sharing Thanks once again
Thanks for the log file. Above you left out the most important part of the error message:
Command error:
Error in data.frame(sequence = names(freqtbl$top), count = as.integer(freqtbl$top), :
arguments imply differing number of rows: 0, 1
Calls: plotQualityProfile ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted
That means that the input seems to be an empty table, so you lost all your reads before that step. The input to that step is the output of cutadapt, which removes the primer sequences. That means that your primer sequences did not match the sequencing reads. Several possibilities:
I assume the latter because a quick search for the forward primer sequence that you used shows that it is an Illumina adapter. However, this step expects the primers that were used in the amplicon PCR.
Essentially: use the correct primer sequences.
By the way, this problem (ambiguous and difficult to interpret error message) should be solved in the dev branch and will be released eventually.
Edit: Let me know if that solves your problem!
Edit2: Just checked whether it is in the documentation, and it is, see https://nf-co.re/ampliseq/2.1.1/parameters#fw_primer
In amplicon sequencing methods, PCR with specific primers produces the amplicon of interest. These primer sequences need to be trimmed from the reads before further processing and are also required for producing an appropriate classifier. Do not use here any technical sequence such as adapter sequences but only the primer sequence that matches the biological amplicon.
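A quick sanity check before re-running can show whether the primers are present in the reads at all. The following is a minimal sketch, not part of the pipeline: the function names, the short primer, and the reads are made up for illustration. It only counts exact matches at the read start, whereas cutadapt tolerates some mismatches, so the reported fraction is a conservative estimate.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes,
# so that degenerate primers (e.g. containing Y or M) can be matched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a (possibly degenerate) primer into a regex anchored at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def fraction_starting_with(reads, primer):
    """Return the fraction of reads that begin with the primer (exact match only)."""
    pattern = primer_to_regex(primer)
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if pattern.match(read))
    return hits / len(reads)


# Hypothetical primer and reads; in practice, use the sequence lines of a FASTQ file.
example_primer = "GTGYCAGC"
example_reads = ["GTGCCAGCAAAT", "GTGTCAGCGGGT", "AATGATACGGCG"]
print(fraction_starting_with(example_reads, example_primer))
```

A fraction close to zero suggests that the sequence passed as primer is not what the reads start with, which is the situation described above.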
Thanks a lot for the response @d4straub. You are a savior. I am facing some issues while running the ampliseq pipeline. I am sure they are due to some dumb mistakes of mine, but I am new to this field. The pipeline encountered an error, most probably in the qiime2 module, and was unable to generate alpha and beta diversity plots. I am attaching my log file along with the metadata sheet. Hope you can help me out with it.
The link - https://drive.google.com/drive/folders/1M2ZIl0nwD6kKSwu3GQx562cUys7OB9qO?usp=sharing
Thanks Again
Hi, the respective error message is
Jan-22 02:23:16.998 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:QIIME2_FILTERASV (body.site,year,month,day)'
Caused by:
Process `NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_ANCOM:QIIME2_FILTERASV (body.site,year,month,day)` terminated with an error exit status (1)
Command executed:
export XDG_CONFIG_HOME="${PWD}/HOME"
IFS=',' read -r -a metacategory <<< "body.site,year,month,day"
#remove samples that do not have any value
for j in "${metacategory[@]}"
do
qiime feature-table filter-samples --i-table filtered-table.qza --m-metadata-file metadata.tsv --p-where "$j<>''" --o-filtered-table $j.qza
done
echo $(qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g") > qiime2.version.txt
Command exit status:
1
Command output:
(empty)
Command error:
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Plugin error from feature-table:
Selection of IDs failed with query:
SELECT "sample-id" FROM metadata WHERE body.site<>'' GROUP BY "sample-id" ORDER BY "sample-id";
If one of the metadata column names specified in the `where` statement is on this list of reserved keywords (http://www.sqlite.org/lang_keywords.html), please ensure it is quoted appropriately in the `where` statement.
Debug info has been saved to /tmp/qiime2-q2cli-err-7v_sgnfi.log
The error message actually is enigmatic and I am not certain what the problem is.
However, I guess the metadata file is the problem. First of all, the category body-site has only .soil values, therefore it should not appear in that list at all. Second, the second line in metadata.tsv does not seem a good choice, as it does not relate to any sample.
So my idea:
Remove the second line of the metadata sheet, which is #q2:types categorical numeric numeric numeric; I assume that solves it.
If it doesn't work after the change above, also rename the first column (currently sample-id) to something simpler, e.g. ID. My reasoning here is that the - might be converted to something else erroneously, but I doubt it.
Do not forget to append -resume to your pipeline run command so that you do not start over from the beginning but only redo the steps that are affected by the metadata change.
EDIT: Forgot to mention that column headers should always be as simple as possible, i.e. letters only if possible; numbers usually work as well (but not a header starting with a number). Whenever steps with R are used, non-alphanumeric characters are frequently converted to . or similar, and then the header no longer matches the source file. So I suggest simplifying the column names, in this case sample-id & body-site.
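The header simplification suggested above could be sketched as follows. This is only an illustration under assumptions: the helper names and the X prefix for headers starting with a digit are my own choices, not pipeline behaviour, and the sketch does not check whether two headers collapse to the same name.

```python
import re


def simplify_header(name):
    """Keep letters and digits only; prefix the result with X if it starts with a digit."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", name)
    if cleaned and cleaned[0].isdigit():
        cleaned = "X" + cleaned
    return cleaned


def simplify_header_line(line, renames=None):
    """Simplify every column name in a tab-separated header line.

    `renames` maps original column names to explicit replacements
    and takes priority over the automatic clean-up.
    """
    renames = renames or {}
    columns = line.rstrip("\n").split("\t")
    return "\t".join(renames.get(col, simplify_header(col)) for col in columns)


# Header taken from the metadata columns mentioned in this thread.
header = "sample-id\tbody-site\tyear\tmonth\tday"
print(simplify_header_line(header, renames={"sample-id": "ID"}))
```

Only the header line is changed here; the sample rows stay as they are.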
Dear Daniel,
Thanks again for all the help. With the suggested changes, i was able to execute most of the steps but the pipeline failed in the end and was recorded in the attached file. Kindly have a look at it.
Thanks again for all the help. Regards
Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION (1)'
Caused by: Process exceeded running time limit (6h)
Command executed:
export XDG_CONFIG_HOME="${PWD}/HOME"
maxdepth=$(count_table_minmax_reads.py filtered-table.tsv maximum 2>&1)
if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
if [ "$maxdepth" -gt "5000" ]; then maxsteps="250"; else maxsteps=$((maxdepth/20)); fi
qiime diversity alpha-rarefaction --i-table filtered-table.qza --i-phylogeny rooted-tree.qza --p-max-depth $maxdepth --m-metadata-file metadata_2.tsv --p-steps $maxsteps --p-iterations 10 --o-visualization alpha-rarefaction.qzv
qiime tools export --input-path alpha-rarefaction.qzv --output-path alpha-rarefaction
echo $(qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g") > qiime2.version.txt
Command output: (empty)
Command error: QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Work dir: /data/NGC_Data/Analysis/External_Projects/2022/crida/work/39/e49886ac4935ff85d435a348bf33eb
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
I cannot see any attachment.
Dear Daniel,
The link has the error as well as the nextflow log attached. Thanks again for all the help.
https://drive.google.com/drive/folders/1M2ZIl0nwD6kKSwu3GQx562cUys7OB9qO?usp=sharing
As above:
Forgot to mention that column headers should always be as simple as possible, i.e. letters only if possible; numbers usually work as well (but not a header starting with a number). Whenever steps with R are used, non-alphanumeric characters are frequently converted to . or similar, and then the header no longer matches the source file. So I suggest simplifying the column names, in this case sample-id & body-site.
body-site seems to be converted to body.site, and that offends QIIME2, I think. Solution: rename the column body-site to bodysite and re-run the pipeline (with -resume).
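The failing query in the error message can be reproduced in isolation with Python's built-in sqlite3 module. This is a toy sketch with a made-up one-row table, not QIIME2's actual metadata handling; the helper name is hypothetical. It shows that SQLite reads an unquoted body.site as "column site of table body", which does not exist, while a plain name such as bodysite works.

```python
import sqlite3


def ids_where(column_in_table, where_clause):
    """Build a one-row metadata table and run a query shaped like the one in the error message."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            f'CREATE TABLE metadata ("sample-id" TEXT, "{column_in_table}" TEXT)'
        )
        conn.execute("INSERT INTO metadata VALUES ('s1', 'soil')")
        query = (
            'SELECT "sample-id" FROM metadata '
            f'WHERE {where_clause} GROUP BY "sample-id" ORDER BY "sample-id"'
        )
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


# Unquoted body.site is parsed as table "body", column "site" and fails.
try:
    ids_where("body-site", "body.site<>''")
except sqlite3.OperationalError as exc:
    print("query failed:", exc)

# After renaming the column to bodysite, the same kind of query succeeds.
print(ids_where("bodysite", "bodysite<>''"))
```

This is consistent with the suggestion above: a column name without special characters needs no quoting and survives any conversion by R unchanged.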
Dear Daniel,
Thanks a lot for the continuous support. It has really helped a lot. I have run into a problem at the alpha rarefaction stage, and the error looks like this:
Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_ALPHARAREFACTION (1)'
Caused by: Process exceeded running time limit (6h)
Command executed:
export XDG_CONFIG_HOME="${PWD}/HOME"
maxdepth=$(count_table_minmax_reads.py filtered-table.tsv maximum 2>&1)
if [ "$maxdepth" -gt "75000" ]; then maxdepth="75000"; fi
if [ "$maxdepth" -gt "5000" ]; then maxsteps="250"; else maxsteps=$((maxdepth/20)); fi
qiime diversity alpha-rarefaction --i-table filtered-table.qza --i-phylogeny rooted-tree.qza --p-max-depth $maxdepth --m-metadata-file metadata_sunflower.txt --p-steps $maxsteps --p-iterations 10 --o-visualization alpha-rarefaction.qzv
qiime tools export --input-path alpha-rarefaction.qzv --output-path alpha-rarefaction
echo $(qiime --version | sed -e "s/q2cli version //g" | tr -d '`' | sed -e "s/Run qiime info for more version details.//g") > qiime2.version.txt
Command output: (empty)
Command error: QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Work dir:
/data/NGC_Data/Analysis/External_Projects/2022/crida/sunflower/work/e8/171e7561a73b506e39c6b91dfa92f3
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
Thanks in advance
Avinash Dhar
Hi there, the error message reports
Caused by: Process exceeded running time limit (6h)
The process needs longer than anticipated for your samples. You can solve this by allowing the process more runtime.
To do this, create a file named QIIME2_ALPHARAREFACTION.config that contains:
process {
    withName:QIIME2_ALPHARAREFACTION {
        time = 24.h
    }
}
and then resume the previous pipeline run by adding to the command: -resume -c QIIME2_ALPHARAREFACTION.config
The process now has 24 hours instead of 6 hours to complete, which should be enough. You can give the process up to 596 hours of runtime that way.
Hi Daniel, thanks a lot for all the help. Really appreciate it. Your suggestions have sorted out most of my queries. However, there is one final point that I would like some clarification on: how does one provide replicates to the pipeline? That is, if I need mean values for each group, how should I go about it?
Thanks again for all the help
Thanks and Regards Avinash Dhar
Information about replicates is used frequently in the pipeline (it originates from the metadata file), for example in alpha and beta diversity analysis, ANCOM, etc., i.e. wherever significant differences are tested for. However, there is no table with mean values of relative abundance for replicates, if that is what you are asking for. That needs to be done manually, i.e. outside of the pipeline.
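The manual step could look roughly like the sketch below. It is an illustration only: the function names, the in-memory table layout, and the sample and group labels are made up, and reading the actual abundance table and metadata file produced by the pipeline is left out.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Convert one sample's raw counts {taxon: count} into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def mean_by_group(table, groups):
    """Average relative abundances over the replicates of each group.

    table:  {sample: {taxon: count}}
    groups: {sample: group label}, e.g. taken from a metadata column
    A taxon missing from a sample counts as abundance 0 for that sample.
    """
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for sample, counts in table.items():
        group = groups[sample]
        sizes[group] += 1
        for taxon, fraction in relative_abundance(counts).items():
            sums[group][taxon] += fraction
    return {
        group: {taxon: total / sizes[group] for taxon, total in taxa.items()}
        for group, taxa in sums.items()
    }


# Made-up counts for three samples in two groups.
example_table = {
    "s1": {"A": 30, "B": 70},
    "s2": {"A": 50, "B": 50},
    "s3": {"A": 10, "B": 90},
}
example_groups = {"s1": "soil", "s2": "soil", "s3": "water"}
example_means = mean_by_group(example_table, example_groups)
print(example_means)
```

Relative abundances are computed per sample first and averaged afterwards, so that deeply sequenced replicates do not dominate the group mean.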
I agree that relative abundance mean values per group are interesting sometimes. However, it is hard to implement an intuitive, user-friendly but also flexible and useful way to use metadata in a pipeline (I have solved it partially for the aforementioned statistics, so probably the same approach would work here). I will think about it, but I cannot promise that this will be in the pipeline soon.
I have opened https://github.com/nf-core/ampliseq/issues/393 as a reminder about the mean values for the group, because that seems a relatively common way of looking at data.
I am closing this issue now; please open another one in case you have other issues.
Hi Daniel,
Here I am with another query regarding the pipeline. I am attaching the error message that I encountered while running the pipeline at the denoising and chimera removal step. The error message isn't clear enough. My fastq files seem to be OK as well. I would be really obliged if you could have a look into this issue.
Thanks and Regards Avinash Dhar
Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:DADA2_RMCHIMERA (1)'
Caused by:
Process NFCORE_AMPLISEQ:AMPLISEQ:DADA2_RMCHIMERA (1)
terminated with an error exit status (1)
Command executed:
suppressPackageStartupMessages(library(dada2))
seqtab = readRDS("1.seqtab.rds")
seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", minSampleFraction = 0.9, ignoreNNegatives = 1, minFoldParentOverAbundance = 2, minParentAbundance = 8, allowOneOff = FALSE, minOneOffParentDistance = 4, maxShift = 16, multithread=6, verbose=TRUE)
if ( 1 == 1 ) { rownames(seqtab.nochim) <- "AIG93" }
saveRDS(seqtab.nochim,"1.ASVtable.rds")
write.table('removeBimeraDenovo method="consensus", minSampleFraction = 0.9, ignoreNNegatives = 1, minFoldParentOverAbundance = 2, minParentAbundance = 8, allowOneOff = FALSE, minOneOffParentDistance = 4, maxShift = 16', file = "removeBimeraDenovo.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\"NFCORE_AMPLISEQ:AMPLISEQ:DADA2_RMCHIMERA\":", paste0(" R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0(" dada2: ", packageVersion("dada2")) ), "versions.yml")
Command exit status: 1
Command output: (empty)
Command error:
Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) :
Input must be a valid sequence table.
Calls: removeBimeraDenovo -> isBimeraDenovoTable
Execution halted
Work dir: /data/NGC_Data/Analysis/External_Projects/2022/AIG-93-94/93/93/work/84/824a1fd58e467ddfe13c0d6a34f070
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
@avinash-ngc please open new issues for new problems, or join the nf-core Slack. I had not seen your most recent post here until now. The problem seems to be that no sequences make it to that step. This is most likely a problem with your data or possibly with the pipeline settings, and most likely not a problem caused by the pipeline itself.
Hi there,
I am continuously facing an issue in the ampliseq pipeline at the 'NFCORE_AMPLISEQ:AMPLISEQ:DADA2_QUALITY (FW)' stage. The error is pasted below.
Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:DADA2_QUALITY (FW)'
Caused by:
Process NFCORE_AMPLISEQ:AMPLISEQ:DADA2_QUALITY (FW) terminated with an error exit status (1)
Command executed:
dada_quality.r "FW_qual_stats" 5e+06
echo 'plotQualityProfile 5e+06' > "plotQualityProfile.args.txt"
Command exit status: 1
Command output:
[1] "FW_qual_stats"
[1] 5000000
Thanks a lot