Closed stas-malavin closed 8 months ago
Here are my parameters:
outputFolderName Ayalon1_18S_pr2
EnaData Yes
sequencerPrefix V
maxInfo Yes
targetLength 100
strictness 0.8
adapters TruSeq2-PE.fa
seedMismatches 0
palindromeClipThreshold 20
simpleClipThreshold 30
leading 20
trailing 2
minlen 100
threadsTrimmomatic 10
pandaseqAlgorithm simple_bayesian
pandaseqThreads 10
pandaseqMinlen
minoverlap 1
threshold 0.6
elimination
vsearchThreads 20
vsearchId 0.97
gene gene_18S
taxonomyAssignmentMethod alignment
numberOfCoresForPapara 20
referenceDb pr2
taxonomyFolderName my_taxon_assign
forwardITSPrimer GATGAAGAACGYAGYRAA
reverseITSPrimer CTBTTVCCKCTTCACTCG
clusteringAlgo algo_Swarm
d 15
boundary 3
swarmThreads 20
removeSingletons Yes
omp_num_threads 20
abskew 2
midori_version midori_1
custom_ref_db No
name_of_custom_db partialCustomdb
getNCBITaxId Yes
phyloseq No
tree Yes
raxmlThreads 20
parsTrees 10
bootstrapTrees 100
emptyRawDataFile Yes
emptyCheckpoints Yes
classifierAlgo CREST
Hi @stas-malavin. Was this sample among the ones you used in your
Sorry, I don't understand your question
Apologies.
My question was whether the filtered_max_ERR0000001 sample, which seems to trigger the error, was also among the samples in your small test that ran fine.
My guess is that something may be off with the BGI format, but I do not have experience with it. If you'd like, you could send me the two paired read files of that sample so I can have a look.
How can I run pandaseq-checkid "V350194505L1C001R00100000782/1 BH:ok" using a container?
I am not sure I understand your question here, but if you would like to run pema after first starting a container interactively, you can always run
singularity exec -B <>:/mnt/analysis pema_v2.1.5.sif bash
cd /home
However, in the Singularity world you cannot edit a script, as it's a read-only environment; this would be more straightforward in a Docker environment.
However, pandaseq is already using the -B flag, if that is what you were thinking.
Hope this helps, and I'd be happy to help more if I can.
Hi, filtered_max_ERR0000001 specifically comes from the adjustingSequences step of the pipeline (from hammerBayes, I suppose). ERR0000001, the initial sample, was not among the samples I tested on. It wasn't really a test, just another sample, and it went smoothly. It was only one sample (in this set I have several), but that probably doesn't matter.
Here are the links to the files (forward and reverse): https://drive.google.com/file/d/185ttx3WXYqlmm7UAJZ1BO_PaKRtkVZD6/view?usp=sharing https://drive.google.com/file/d/15i3habgwlyazpZ3zs2r0NYMnSlY3bVe7/view?usp=sharing
Thanks!
Hi @stas-malavin
I was able to run your sample using pema v.2.1.5, which is about to be released, with the fastp option.
It seems that your BGI reads (I guess they come from a non-Illumina platform) come in a format that our initial pipeline cannot deal with for now.
I would suggest you go with the latest version of pema and the fastp option in the preprocess step.
If you also have Docker on your system, you can pull it by running
docker pull hariszaf/pema:v.2.1.5
If not, please let me know how I could send you the .sif file of v.2.1.5.
Thanks again and thanks for bringing up the BGI inconsistency.
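For anyone wanting to check quickly which naming style their reads use, here is a minimal sketch. It is not part of pema; the function name `classify_header` and the labels it prints are made up for illustration, and the rule is only a heuristic: Illumina-style read names carry several colon-separated fields, while the header quoted in this issue (`V350194505L1C001R00100000782/1 BH:ok`) has none and ends in `/1`.

```shell
# Classify a FASTQ read header by its naming style.
# Heuristic sketch only; the labels are illustrative, not pema or pandaseq terms.
classify_header() {
  header="${1#@}"        # drop the leading '@' of a FASTQ header, if present
  name="${header%% *}"   # keep only the read name (text before the first space)
  case "$name" in
    *:*:*:*) echo "illumina-like" ;;    # several colon-separated fields
    */1|*/2) echo "slash-suffixed" ;;   # pair suffix and no colon fields
    *)       echo "unknown" ;;
  esac
}

# Header taken from this issue:
classify_header "V350194505L1C001R00100000782/1 BH:ok"

# On a real file (forward.fastq is a placeholder name):
#   classify_header "$(head -n 1 forward.fastq)"
```

A "slash-suffixed" result does not prove the reads are from BGI; it only shows that the names lack the colon-separated layout.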
Hi @hariszaf ,
I've pulled the Docker container.
Could you please specify how to run it?
docker run --rm -it -v .:/mnt/analysis hariszaf/pema:v.2.1.5
from the manual just opens a shell inside the container. I also tried without -i, which stands for 'interactive', with the same result.
You may run
cd /home
./pema_latest.bds
This is still a beta version. I am checking some last combinations, but I ran your sample with fastp preprocess, Swarm and CREST and it went smoothly. :rocket:
root@a7faceac8cd9:~# cd /home
root@a7faceac8cd9:/home# ./pema_latest.bds
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Fatal error: modules/initialize.bds, line 34, pos 16. Map 'my_case' does not have key 'preprocess'.
Apologies, I should have mentioned that this version will use an updated parameters file.
Attached is the parameters.tsv I used; GitHub does not support tsv attachments, so just unzip this and get the file from there.
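Before launching a run, it can help to check whether an older parameters file already has the keys the newer version expects. The sketch below is not a pema tool: `missing_keys` is a hypothetical helper, and it assumes the file holds one key per line as the first whitespace-separated field, as in the parameter listing at the top of this issue. The only key known from this thread to be required is `preprocess`, taken from the error message above.

```shell
# Print every expected key that is absent from a parameters file.
# Assumption: one "key value" pair per line, key in the first field.
missing_keys() {
  params_file="$1"; shift
  for key in "$@"; do
    # awk exits 0 only if some line has the key as its whole first field
    if ! awk -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$params_file"; then
      echo "$key"
    fi
  done
}

# Example call (parameters.tsv is the file in your analysis folder):
#   missing_keys parameters.tsv preprocess
```

If the function prints nothing, all listed keys are present. It does not validate the values.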
One extra tip: I saw in your parameters that you ask for the NCBI ID. As you will have a vast number of ASVs, I would leave this part out and find another way to get the NCBI IDs once the abundance table is ready.
In an old parameters.tsv you had Swarm's d=15, while the current value is 3. Why such a big difference?
Also, I would appreciate it if you shared your experience. I've tried Swarm, and how to select d always seemed a bit arbitrary to me. Did you run any tests on that?
Ah, I see, you're now removing oligotons from the occurrence 5
This value has a lot to do with the taxonomic group you are targeting. When using, for example, COI as your marker gene, it's quite common to go with high values (~12). If you are working with bacteria, you'd go with lower values, even 1.
Did you run any tests on that?
If you had a mock community, it would be very helpful for setting your parameters.
In the pema publication, I remember we did some validations with different d values, so maybe you could also have a look there, but the best thing is always to have a mock community to drive your parameters. I know this is hard and maybe not always an option, but to my knowledge it is the best way to go :)
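If you do have a mock community, one way to organise such a comparison is to prepare one parameters file per candidate d and run pema once for each. The sketch below only generates the files; `make_d_variants` is a hypothetical helper, not part of pema, and it assumes parameters.tsv is tab-separated with a row whose first field is exactly `d`, as in the listing at the top of this issue.

```shell
# Write one copy of a parameters file per candidate Swarm d value.
# Assumption: tab-separated "key<TAB>value" rows, with a row keyed "d".
make_d_variants() {
  params_file="$1"; out_dir="$2"; shift 2
  mkdir -p "$out_dir"
  for d in "$@"; do
    # replace the value only on the row whose key is "d"; copy all other rows
    awk -v d="$d" 'BEGIN { FS = OFS = "\t" } $1 == "d" { $2 = d } { print }' \
      "$params_file" > "$out_dir/parameters_d${d}.tsv"
  done
}

# Example call, producing d_sweep/parameters_d1.tsv, parameters_d3.tsv, ...:
#   make_d_variants parameters.tsv d_sweep 1 3 12
```

Each generated file would then be copied into its own analysis folder as parameters.tsv. Comparing the resulting tables against the known composition of the mock community is left out here.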
good point, thanks
how should I set the boundary parameter? It's empty in the current file
Hi @hariszaf , the development version works, thank you.
Hi, I'm getting an error from pandaseq when using BGI reads. Before that, I ran a small dataset from the same BGI run (different barcode) successfully. I'm using singularity v. 3.8.6 and pema_v.2.1.4.sif. How can I run pandaseq-checkid "V350194505L1C001R00100000782/1 BH:ok" using a container? Here's the output: