hildebra / lotus2

Amplicon sequencing pipelines suitable for SSU (16S, 18S), LSU (23S, 28S) and ITS.
http://lotus2.earlham.ac.uk/
GNU General Public License v3.0

analysing only a part of the amplicons #33

Closed splaisan closed 1 year ago

splaisan commented 1 year ago

Sorry for the redundancy; this is a shorter report related to https://github.com/hildebra/lotus2/issues/31. I can analyse the full V1-V9 amplicons with lotus2, but when I try to make lotus2 look only at the central part (V3-V4) of the same reads, it fails.

I was hoping that specifying the primer sequences for V3 and V4 (rev-comp) would take care of trimming the first 300 bp (up to the forward primer in V3) and the long tail (after the V4 primer), so that only the resulting 444 bp fragment of the original 1500 bp amplicon would be analysed.

The sequences all end up in the (Mid qual) class, while for the full amplicon run they were all in (High qual).

Is it possible that trimming such long ends cannot be done with lotus2 the way I tried?

If it is not possible, this also answers ticket #31, and I will need to trim my reads externally before running lotus2.
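For reference, external trimming of this kind could look roughly like the sketch below. It is only an illustration under several assumptions: the function names (`primer_to_regex`, `extract_region`) are hypothetical, primers are matched exactly (no sequencing errors tolerated), and reads are assumed to be in forward orientation, which is not guaranteed for PacBio data. It is not how lotus2 or sdm handles primers internally.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Turn a degenerate primer into a regex pattern (exact matching only)."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(read, fwd_primer, rev_primer_rc, keep_primers=False):
    """Return the part of `read` between the two primers.

    `rev_primer_rc` is the reverse primer as it appears on the forward
    strand (i.e. already reverse-complemented). Returns None if either
    primer is not found.
    """
    fwd = re.search(primer_to_regex(fwd_primer), read)
    if fwd is None:
        return None
    # Search for the reverse primer only downstream of the forward primer.
    rev = re.search(primer_to_regex(rev_primer_rc), read[fwd.end():])
    if rev is None:
        return None
    start = fwd.start() if keep_primers else fwd.end()
    end = fwd.end() + (rev.end() if keep_primers else rev.start())
    return read[start:end]


# Toy read: flank + forward primer + insert + reverse primer (rev-comp) + flank.
toy_read = "TTTT" + "CCTACGGGAGGCAGCAG" + "ACGTACGT" + "GGATTAGATACCCTAGTAGTC" + "AAAA"
region = extract_region(toy_read, "CCTACGGGNGGCWGCAG", "GGATTAGATACCCBDGTAGTC")
print(region)
```

On real HiFi reads one would also have to try the reverse complement of each read and allow a few mismatches, which dedicated primer-trimming tools handle better than this sketch.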

Best regards, Stephane

$ mapping_file_V3V4.tsv
#SampleID       fastqFile       ForwardPrimer   ReversePrimer
4170_bc1005--bc1096     4170_bc1005--bc1096.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC
4356_bc1005--bc1112     4356_bc1005--bc1112.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC
4285_bc1022--bc1107     4285_bc1022--bc1107.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC   
4296_bc1022--bc1060     4296_bc1022--bc1060.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC   
4356_bc1012--bc1098     4356_bc1012--bc1098.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC   
4112_bc1008--bc1075     4112_bc1008--bc1075.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC   
4128_bc1005--bc1107     4128_bc1005--bc1107.fastq.gz    CCTACGGGNGGCWGCAG       GGATTAGATACCCBDGTAGTC 
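Since the ReversePrimer column above holds the reverse-complemented primer, a small sketch of how a degenerate (IUPAC) primer can be reverse-complemented may help when checking orientation. The helper name `reverse_complement` is hypothetical, and which orientation lotus2 expects in this column is not confirmed here.

```python
# Complement table covering the IUPAC ambiguity codes
# (e.g. B = C/G/T pairs with V = A/C/G, D = A/G/T pairs with H = A/C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a nucleotide sequence, ambiguity codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# The ReversePrimer value from the mapping file, converted back.
print(reverse_complement("GGATTAGATACCCBDGTAGTC"))  # GACTACHVGGGTATCTAATCC
```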

Using Silva SSU ref seq database.
--------------------------------------------------------------------------------
 00:00:00 LotuS 2.23
          COMMAND
          perl /opt/miniconda3/envs/lotus2.23/bin/lotus2 -i /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
          -m mapping_file_V3V4.tsv -o lotus2_pacbio_V3V4 -tmp /data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp
          -s sdm_PacBio_LSSU_V3V4.txt -p PacBio -t 80 -amplicon_type
          SSU -CL cdhit -refDB SLV -taxAligner lambda -useVsearch 1
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
 00:00:00 Reading mapping file
          Sequence files are indicated in mapping file.
--------------------------------------------------------------------------------
------------ I/O configuration --------------
Input       /data/analyses/Zymo-SequelIIe-Hifi-V3V4/reads
Output      lotus2_pacbio_V3V4
SDM options sdm_PacBio_LSSU_V3V4.txt
TempDir     /data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp
------------ Configuration LotuS --------------
de novo sequence clustering with CD-HIT into OTU's
Sequencing platform     pacbio
Amplicon target         bacteria, SSU
Dereplication filter    0
Clustering algorithm    CD-HIT into OTU's
Read mapping (non tax)  minimap2
OTU nt id      0.97
Precluster read merging No
Ref Chimera checking    Yes (DB=/opt/miniconda3/envs/lotus2.23/share/lotus2-2.23-0//DB//rdp_gold.fa, -chim_skew 2)
deNovo Chimera check    Yes
Tax assignment          Lambda (-LCA_frac 0.8, -LCA_cover 0.5, -LCA_idthresh 97,95,93,91,88,78,0)
ReferenceDatabase       SILVA
RefDB location          /opt/miniconda3/envs/lotus2.23/share/lotus2-2.23-0//DB//SLV_138.1_SSU.fasta
OTU phylogeny           Yes (mafft, fasttree2)
Unclassified OTU's      Kept in matrix
--------------------------------------------
--------------------------------------------------------------------------------
 00:00:00 Demultiplexing, filtering, dereplicating input files, this
          might take some time..
          check progress at lotus2_pacbio_V3V4/LotuSLogS/LotuS_progout.log
 00:00:12 Finished primary read processing with sdm:
          Reads processed: 255,918
          Accepted (High qual): 0 (4,953 end-trimmed)
          Accepted (Mid qual): 252,175
          Rejected: 3,743
          Dereplication block 0: 0 unique sequences (avg size -nan; 0 counts)
          For an extensive report see lotus2_pacbio_V3V4/LotuSLogS//demulti.log
--------------------------------------------------------------------------------
The sdm dereplicated output file was either empty or not existing, aborting lotus.
/data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp/derep.fas

%@#%@#%@#%@%@#@%#@%#@#%@#%@#%@#@%#@%#@%#@#%@#%@#%@##
      LotuS2 encounterend an error:
The sdm dereplicated output file was either empty or not existing, aborting lotus.
/data/analyses/Zymo-SequelIIe-Hifi-V3V4/tmp/derep.fas

First check if the last error occurred  in a program called by LotuS2 
"tail lotus2_pacbio_V3V4/LotuSLogS/LotuS_progout.log"
, if there is an obvious solution (e.g. external program breaking, this we can't fix). To see (and execute) the last commands by the pipeline, run 
"tail lotus2_pacbio_V3V4/LotuSLogS/LotuS_cmds.log".
In case you decide to contact us on "https://github.com/hildebra/lotus2/", please try to include information from these approaches in your message, this will increase our response time. Thank you.
%@#%@#%@#%@%@#@%#@%#@#%@#%@#%@#@%#@%#@%#@#%@#%@#%@##