Sorry for the redundancy; this is a shorter report related to https://github.com/hildebra/lotus2/issues/31
I can analyse the full V1-V9 amplicons with lotus2, but when I try to make lotus2 look only at the central part (V3-V4) of the same reads, it fails.
I was hoping that specifying the primer sequences for V3 and V4 (rev-comp) would trim the first 300 bp (up to the forward primer in V3) and the long tail (after the V4 primer), so that the resulting 444 bp fragment of the original 1500 bp amplicon would be analysed.
The sequences all end up in the (Mid qual) class, whereas in the full-amplicon run they were all in (High qual). The dereplicated output is then empty and lotus2 aborts (see the log below).
Is it possible that lotus2 cannot trim such long ends the way I tried?
If it is not possible, that will also answer ticket #31, and I will need to trim my reads externally before running lotus2.
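To make clear what I mean by external trimming, here is a minimal Python sketch of the behaviour I was hoping for. It is not lotus2 or sdm code: the names `primer_regex` and `extract_region` are hypothetical, and it assumes that the reads are all in forward orientation, that the primers match exactly (no mismatches or indels), and that the reverse primer is given as it appears on the read (already reverse-complemented, as in my mapping file).

```python
import re

# IUPAC nucleotide codes expanded to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate (IUPAC) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def extract_region(read, fwd, rev_on_read, keep_primers=False):
    """Return the part of `read` between the two primers, or None.

    Assumption: `rev_on_read` is the reverse primer written as it occurs
    on the read strand, and only exact IUPAC matches are accepted.
    """
    m_fwd = primer_regex(fwd).search(read)
    if m_fwd is None:
        return None
    # Look for the reverse primer only downstream of the forward primer
    m_rev = primer_regex(rev_on_read).search(read, m_fwd.end())
    if m_rev is None:
        return None
    if keep_primers:
        return read[m_fwd.start():m_rev.end()]
    return read[m_fwd.end():m_rev.start()]


# Toy read: 4 bp head, forward primer, 8 bp insert, reverse primer, 4 bp tail
FWD = "CCTACGGGNGGCWGCAG"
REV = "GGATTAGATACCCBDGTAGTC"
toy_read = "AAAA" + "CCTACGGGAGGCAGCAG" + "TTTTGGGG" + "GGATTAGATACCCTGGTAGTC" + "CCCC"
trimmed = extract_region(toy_read, FWD, REV)
```

For real PacBio reads this would also need to check the reverse-complement orientation and tolerate a few mismatches in the primers, so a dedicated trimming tool is probably the more robust choice.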
Best regards,
Stephane
$ mapping_file_V3V4.tsv
#SampleID fastqFile ForwardPrimer ReversePrimer
4170_bc1005--bc1096 4170_bc1005--bc1096.fastq.gz CCTACGGGNGGCWGCAG GGATTAGATACCCBDGTAGTC
4356_bc1005--bc1112 4356_bc1005--bc1112.fastq.gz CCTACGGGNGGCWGCAG GGATTAGATACCCBDGTAGTC
4285_bc1022--bc1107 4285_bc1022--bc1107.fastq.gz CCTACGGGNGGCWGCAG GGATTAGATACCCBDGTAGTC
4296_bc1022--bc1060 4296_bc1022--bc1060.fastq.gz CCTACGGGNGGCWGCAG GGATTAGATACCCBDGTAGTC
4356_bc1012--bc1098 4356_bc1012--bc1098.fastq.gz CCTACGGGNGGCWGCAG GGATTAGATACCCBDGTAGTC
4112_bc1008--bc1075 4112_bc1008--bc1075.fastq.gz CCTACGGGNGGCWGCAG GGATTAGATACCCBDGTAGTC
4128_bc1005--bc1107 4128_bc1005--bc1107.fastq.gz CCTACGGGNGGCWGCAG GGATTAGATACCCBDGTAGTC
Using Silva SSU ref seq database.
--------------------------------------------------------------------------------
00:00:00 LotuS 2.23
COMMAND
perl /opt/miniconda3/envs/lotus2.23/bin/lotus2 -i /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
-m mapping_file_V3V4.tsv -o lotus2_pacbio_V3V4 -tmp /data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp
-s sdm_PacBio_LSSU_V3V4.txt -p PacBio -t 80 -amplicon_type
SSU -CL cdhit -refDB SLV -taxAligner lambda -useVsearch 1
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
00:00:00 Reading mapping file
Sequence files are indicated in mapping file.
--------------------------------------------------------------------------------
------------ I/O configuration --------------
Input /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
Output lotus2_pacbio_V3V4
SDM options sdm_PacBio_LSSU_V3V4.txt
TempDir /data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp
------------ Configuration LotuS --------------
de novo sequence clustering with CD-HIT into OTU's
Sequencing platform pacbio
Amplicon target bacteria, SSU
Dereplication filter 0
Clustering algorithm CD-HIT into OTU's
Read mapping (non tax) minimap2
OTU nt id 0.97
Precluster read merging No
Ref Chimera checking Yes (DB=/opt/miniconda3/envs/lotus2.23/share/lotus2-2.23-0//DB//rdp_gold.fa, -chim_skew 2)
deNovo Chimera check Yes
Tax assignment Lambda (-LCA_frac 0.8, -LCA_cover 0.5, -LCA_idthresh 97,95,93,91,88,78,0)
ReferenceDatabase SILVA
RefDB location /opt/miniconda3/envs/lotus2.23/share/lotus2-2.23-0//DB//SLV_138.1_SSU.fasta
OTU phylogeny Yes (mafft, fasttree2)
Unclassified OTU's Kept in matrix
--------------------------------------------
--------------------------------------------------------------------------------
00:00:00 Demultiplexing, filtering, dereplicating input files, this
might take some time..
check progress at lotus2_pacbio_V3V4/LotuSLogS/LotuS_progout.log
00:00:12 Finished primary read processing with sdm:
Reads processed: 255,918
Accepted (High qual): 0 (4,953 end-trimmed)
Accepted (Mid qual): 252,175
Rejected: 3,743
Dereplication block 0: 0 unique sequences (avg size -nan; 0 counts)
For an extensive report see lotus2_pacbio_V3V4/LotuSLogS//demulti.log
--------------------------------------------------------------------------------
The sdm dereplicated output file was either empty or not existing, aborting lotus.
/data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp/derep.fas
%@#%@#%@#%@%@#@%#@%#@#%@#%@#%@#@%#@%#@%#@#%@#%@#%@##
LotuS2 encounterend an error:
The sdm dereplicated output file was either empty or not existing, aborting lotus.
/data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp/derep.fas
First check if the last error occurred in a program called by LotuS2
"tail lotus2_pacbio_V3V4/LotuSLogS/LotuS_progout.log"
, if there is an obvious solution (e.g. external program breaking, this we can't fix). To see (and execute) the last commands by the pipeline, run
"tail lotus2_pacbio_V3V4/LotuSLogS/LotuS_cmds.log".
In case you decide to contact us on "https://github.com/hildebra/lotus2/", please try to include information from these approaches in your message, this will increase our response time. Thank you.
%@#%@#%@#%@%@#@%#@%#@#%@#%@#%@#@%#@%#@%#@#%@#%@#%@##