merenlab / oligotyping

Exploring microbial patterns through subtle nucleotide variation within 16S rRNA gene tag sequences of closely related taxa
GNU General Public License v2.0

Length error #27

Closed by jpearl01 6 years ago

jpearl01 commented 6 years ago

Hello,

When running this command, I get this error:

$ decompose BEI_OTU_12_eefilt_subset.fasta 
Config Error: Not all reads have the same length.
$ decompose --version
Oligotyping Pipeline Version .....................................: 2.1

I was under the impression that the MED pipeline allows sequences of varying length, as long as the length variation is biologically meaningful. Is that not the case?

If I use the flag --skip-check-input-file, I end up with this error:

$ decompose --skip-check-input-file BEI_OTU_12_eefilt_subset.fasta 
Project ..........................................................: BEI_OTU_12_eefilt_subset
Run date .........................................................: 05 Jun 18 17:33:06
Library version ..................................................: 2.1
Command line .....................................................: /home/testing/.local/bin/decompose --skip-check-input-file BEI_OTU_12_eefilt_subset.fasta
Multi-threaded ...................................................: True
Extraction info output file ......................................: /home/testing/BEI_OTU_12_eefilt_subset-m0.10-A0-M0-d4/RUNINFO
Log file path ....................................................: /home/testing/BEI_OTU_12_eefilt_subset-m0.10-A0-M0-d4/RUNINFO.log
Input file .......................................................: BEI_OTU_12_eefilt_subset.fasta
Mapping file .....................................................: None
Quick (and dirty) analysis requested .............................: False
Merge homopolymer splits .........................................: False
Skip removing outliers ...........................................: False
Try to relocate outliers .........................................: False
Store topology dict ..............................................: False
Skip generating figures post analysis ............................: False
Min entropy for a component to be picked for decomposition .......: 0.0965
Perform entropy normalization heuristics .........................: True
Max number of discriminants to use for decomposition .............: 4
Min total abundance of oligotype in all samples ..................: 0
[05 Jun 18 17:33:06 Initializing topology] May take a while depending on the number of reads...
Traceback (most recent call last):
  File "/home/testing/.local/bin/decompose", line 42, in <module>
    decomposer.decompose()
  File "/home/testing/.local/lib/python2.7/site-packages/Oligotyping/lib/decomposer.py", line 304, in decompose
    self._init_topology()
  File "/home/testing/.local/lib/python2.7/site-packages/Oligotyping/lib/decomposer.py", line 220, in _init_topology
    self.root = self.topology.add_new_node('root', reads, root = True)
  File "/home/testing/.local/lib/python2.7/site-packages/Oligotyping/lib/topology.py", line 71, in add_new_node
    node.refresh()
  File "/home/testing/.local/lib/python2.7/site-packages/Oligotyping/lib/topology.py", line 375, in refresh
    self.do_entropy()
  File "/home/testing/.local/lib/python2.7/site-packages/Oligotyping/lib/topology.py", line 334, in do_entropy
    column = ''.join([read.seq[position] * read.frequency for read in self.reads])
IndexError: string index out of range

Any ideas on how to fix that?

Thanks, ~josh

meren commented 6 years ago

You could use o-pad-with-gaps if the variation is meaningful across your sequences.
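
To illustrate why padding helps: the traceback above fails in `do_entropy` because it indexes every read at the same position, so any read shorter than the longest one raises `IndexError`. The sketch below is not the `o-pad-with-gaps` implementation; it is a minimal stand-in that assumes padding means appending trailing `-` characters until all reads share the longest length. The helper names (`read_fasta`, `pad_with_gaps`, `column`) and the two example reads are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pad_with_gaps(records, gap="-"):
    """Right-pad every sequence with gap characters to the longest length.

    Assumption: trailing gaps are appropriate, i.e. the reads already
    start at the same biological position.
    """
    if not records:
        return []
    longest = max(len(seq) for _, seq in records)
    return [(header, seq.ljust(longest, gap)) for header, seq in records]


def column(records, position):
    """Collect the character at one position from every read."""
    return "".join(seq[position] for _, seq in records)


# Hypothetical input: two reads of different length.
fasta_text = """>read_1
ACGTACGT
>read_2
ACGTAC
""".splitlines()

records = read_fasta(fasta_text)
padded = pad_with_gaps(records)

for header, seq in padded:
    print(">" + header)
    print(seq)
```

With the unpadded `records`, `column(records, 7)` raises the same kind of `IndexError` as in the traceback, because `read_2` has only six characters. After padding, every position up to the longest read length can be indexed in every read. Trailing gaps only make sense if the reads are aligned at their start, which is why the padding is suggested only when the length variation is meaningful.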