Closed savvas-paragkamian closed 11 months ago
Hi @savvas-paragkamian and thanks for sharing.
I am a bit confused though: have you tried using `clusteringAlgo algo_Swarm` as suggested?
If yes, do you still get an error?
Yes, I wrote `clusteringAlgo algo_swarm`, but the suggested spelling is `clusteringAlgo algo_Swarm`.
The initialize script also contains:
} else if ( params{'clusteringAlgo'} == 'algo_Swarm' ) {
string algo = 'Swarm'
algo.mkdir()
}
Maybe that's why the folder wasn't created?
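To illustrate the suspicion, here is a minimal Python sketch (not PEMA's BDS code). It assumes the comparison is an exact, case-sensitive string match, as the `== 'algo_Swarm'` test above suggests. The accepted values are taken from the hint in the parameters file ("Swarm", "vsearch" or "CROP" after `algo_`); the function name `check_clustering_algo` is hypothetical.

```python
# Illustration only, not PEMA code. Accepted spellings follow the hint in the
# parameters file: "Swarm", "vsearch" or "CROP" after "algo_".
ACCEPTED = ("algo_Swarm", "algo_vsearch", "algo_CROP")


def check_clustering_algo(value):
    """Return the value if it is spelled exactly right, otherwise raise."""
    if value in ACCEPTED:
        return value
    # Look for an accepted value that differs only in letter case.
    for candidate in ACCEPTED:
        if candidate.lower() == value.lower():
            raise ValueError(
                f"clusteringAlgo '{value}' not recognised; did you mean '{candidate}'?"
            )
    raise ValueError(f"clusteringAlgo '{value}' not recognised")


if __name__ == "__main__":
    print(check_clustering_algo("algo_Swarm"))
    try:
        check_clustering_algo("algo_swarm")
    except ValueError as err:
        print(err)
```

With a check like this, the lowercase `algo_swarm` would fail loudly at start-up instead of silently matching no branch and leaving the folder uncreated.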
Currently, I am running the analysis from the dereplication step checkpoint. I have manually created both folders (777 permissions), i.e. `Swarm` and `swarm`, in the directory `7.mainOutput/gene_16`.
In addition, I changed the parameters file to `algo_Swarm`.
My question is: if I change the parameters file and continue the analysis from the checkpoint, does PEMA read the new parameter values?
Let's break this down into steps! :wink:
So, first, could you please give it a shot with just a few samples (2-3), setting `clusteringAlgo` to `algo_Swarm`?
If you have already tried that, did you get an error?
> if I change the parameters file and continue the analysis from the checkpoint, PEMA reads the new changes of the parameters?

At the beginning of each checkpoint, PEMA reads the parameters file through the `readParameterFile()` function.
The new job with 4 samples and the correct `algo_Swarm` parameter works!
The job from the checkpoint failed with the following error:
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/home/scripts/dereplicateSwarm.sh: line 99: 29689 Killed awk 'BEGIN {FS = "[>_]"}
# Parse the sample files
/^>/ {contingency[$2][FILENAME] = $3
amplicons[$2] += $3
if (FNR == 1) {
samples[++i] = FILENAME
}
}
END {# Create table header
printf "amplicon"
s = length(samples)
for (i = 1; i <= s; i++) {
printf "\t%s", samples[i]
}
printf "\t%s\n", "total"
# Sort amplicons by decreasing total abundance (use a coprocess)
command = "LC_ALL=C sort -k1,1nr -k2,2d"
for (amplicon in amplicons) {
printf "%d\t%s\n", amplicons[amplicon], amplicon |& command
}
close(command, "to")
FS = "\t"
while ((command |& getline) > 0) {
amplicons_sorted[++j] = $2
}
close(command)
# Print the amplicon occurrences in the different samples
n = length(amplicons_sorted)
for (i = 1; i <= n; i++) {
amplicon = amplicons_sorted[i]
printf "%s", amplicon
for (j = 1; j <= s; j++) {
printf "\t%d", contingency[amplicon][samples[j]]
}
printf "\t%d\n", amplicons[amplicon]
}}' linearized.dereplicate* > ../amplicon_contingency_table.tsv
Fatal error: /home/modules/preprocess.bds, line 449, pos 5. Exec failed.
Exit value : 137
Command : bash /home/scripts/dereplicateSwarm.sh
pema_latest.bds, line 103 : if ( paramsDereplication{'clusteringAlgo'} == 'algo_Swarm' ) {
pema_latest.bds, line 106 : swarmDereplicate(paramsDereplication, globalVars)
preprocess.bds, line 445 : string swarmDereplicate(string{} params, string{} globalVars){
preprocess.bds, line 449 : sys bash $globalVars{'path'}/scripts/dereplicateSwarm.sh
Thanks for sharing, @savvas-paragkamian. The issue is with the global parameters, which are set during initialization and do not change after each checkpoint.
I'll fix that as part of PEMA v.2.1.5 and get back to you as soon as it's released.
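A toy Python model of that behaviour, for readers following along. This is a sketch under stated assumptions, not PEMA's implementation: `read_parameter_file` and `initialize` are hypothetical stand-ins, and it assumes that values derived at initialization are restored from the checkpoint while the parameters file itself is re-read.

```python
def read_parameter_file(text):
    """Parse 'key value' lines into a dict (toy stand-in for a parameters file)."""
    params = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            params[parts[0]] = parts[1]
    return params


def initialize(params):
    """Derive global values once, from the parameters seen at start-up."""
    return {"clusteringAlgo": params["clusteringAlgo"]}


# First run: the parameters file still has the lowercase spelling.
old_text = "clusteringAlgo algo_swarm"
global_vars = initialize(read_parameter_file(old_text))  # kept with the checkpoint

# Resume from the checkpoint after editing the file: the parameters are re-read,
# but the globals (assumed here to come from the checkpoint) are not rebuilt.
new_text = "clusteringAlgo algo_Swarm"
params = read_parameter_file(new_text)

stale = global_vars["clusteringAlgo"] != params["clusteringAlgo"]
print(params["clusteringAlgo"], global_vars["clusteringAlgo"], stale)
```

In this model the re-read parameters carry the corrected value while the globals still hold the old one. That would explain why a fresh run works while a run resumed from the checkpoint does not.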
Error with the PEMA ASV inference. Possibly due to a spelling error.
The line that was possibly skipped because of a spelling error in the parameters file.
In the parameters file I wrote `clusteringAlgo algo_swarm`, while the suggested format is: (write "Swarm" or "vsearch" or "CROP" after algo_).
In the `initialize.bds` script there is a line that creates the folder `Swarm`.
The error
The parameters file: parameters0f.isd_crete_2016_20230823.txt