Error: IndexError: list index out of range

complexgenome commented 7 years ago

Hi BCLA team,

I'm using this tool on 16S rRNA reads for the V3-V4 region. Below are the steps I followed and the error from the tool:

Step 1: python 1.subset_db_acc.py

Step 2: python 2.blca_main.py -i otus.fna -c 0.95 -b 0.95

After the BLAST process, I get this error:

blastdbcmd is located in your PATH!
muscle is located in your PATH!
blastn is located in your PATH!
>> Running blast!!
>> Blastn Finished!!
> 1 > Read in blast output!
> 3 > Read in taxonomy information!
Traceback (most recent call last):
  File "2.blca_main.py", line 294, in <module>
    seq=fsaln[k+1].rstrip()
IndexError: list index out of range

Tool versions:
Python 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Nucleotide-Nucleotide BLAST 2.2.31+
MUSCLE v3.8.31

Let me know if any other details are needed.

yingeddi2008 commented 7 years ago

Please update your blastn version to 2.5.0 and retry.

Huaiying (Eddi) Lin


complexgenome commented 7 years ago

Thank you. I updated blastn to BLAST 2.6.0+ from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ . v2.5+ is no longer available. After the update, I still get the same error. :(

yingeddi2008 commented 7 years ago

Hi Sariya,

That is a read ID length limitation from muscle. The read ID should be no more than 28 characters long so that the sequence ID matches between the blastn and muscle output. For example, suppose your fasta file has a read like the following:

>IUMNG_UYEBDV_092764_YSTRSMFG_001
AGCTAGCTAGCTAGCCCGAGCCAAATTCAGCAG...

Please convert your read IDs to shorter unique IDs, such as:

>seq0003
AGCTAGCTAGCTAGCCCGAGCCAAATTCAGCAG...

We are thinking about implementing a new module in the next release to address this issue. Thanks for using our software and providing feedback.
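The renaming suggested above could be scripted along the following lines. This is only a minimal sketch in Python 3, not part of BLCA: the function name `shorten_fasta_ids`, the `seq` prefix, and the four-digit padding are all assumptions chosen to mirror the `>seq0003` example. It also returns a mapping so the original IDs can be restored after classification.

```python
import io


def shorten_fasta_ids(in_handle, out_handle, prefix="seq", width=4):
    """Rewrite every FASTA header as a short unique ID.

    Hypothetical helper (not part of BLCA). Returns a dict that maps
    each new ID back to the original header text.
    """
    mapping = {}
    count = 0
    for line in in_handle:
        if line.startswith(">"):
            count += 1
            # e.g. seq0001, seq0002, ... well under the 28-character limit
            new_id = prefix + str(count).zfill(width)
            mapping[new_id] = line[1:].strip()
            out_handle.write(">" + new_id + "\n")
        else:
            # Sequence lines are copied through unchanged.
            out_handle.write(line)
    return mapping


# Small in-memory demonstration; real use would pass open file handles.
demo_in = io.StringIO(
    ">IUMNG_UYEBDV_092764_YSTRSMFG_001\nAGCTAGCT\n"
    ">IUMNG_UYEBDV_092764_YSTRSMFG_002\nTTGACC\n"
)
demo_out = io.StringIO()
id_map = shorten_fasta_ids(demo_in, demo_out)
print(demo_out.getvalue())
print(id_map)
```

With real files, the same function would be called with handles from `open("otus.fna")` and `open("otus.short.fna", "w")` (file names here are placeholders), and `id_map` could be saved alongside the output to translate the results back.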

Eddi

complexgenome commented 7 years ago

Hi Eddi, I'm using sequences after chimera checking and clustering (VSEARCH tool). The sequence file has IDs of the form OTU_XXX, with an average ID length of 8.5, a median of 9, a minimum of 5, and a maximum of 9.
I don't think I can shorten them any further. I have ~3900 sequences.
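ID length figures like the ones above can be reproduced with a short script. The following is a hedged sketch in Python 3, not the method actually used in this thread: the function name `id_length_stats` is hypothetical, and it assumes the ID is the first whitespace-delimited token after `>`.

```python
import io
import statistics


def id_length_stats(handle):
    """Summarize the lengths of FASTA sequence IDs.

    Hypothetical helper. Assumes the ID is the first whitespace-delimited
    token on each header line.
    """
    lengths = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            lengths.append(len(fields[0]) if fields else 0)
    if not lengths:
        raise ValueError("no FASTA headers found")
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }


# Made-up records for illustration only.
demo = io.StringIO(">OTU_1\nACGT\n>OTU_25\nACGT\n>OTU_3900 size=4\nACGT\n")
stats = id_length_stats(demo)
print(stats)
```

If `max` stays well under 28, as reported here, the ID length limit is unlikely to be the cause.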

yingeddi2008 commented 7 years ago

Hi Sanjeev,

If the ID length is not the problem, we need to dig deeper into the process. Let's first make sure we have eliminated the obvious causes.

I have updated the 2.BLCA_main.py script. Please download it, and run it again to see if you can reproduce the error message. Also please make sure you are using blastn 2.5.0+, since that's the blastn version we have tested it on. You can download it from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.5.0/.
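To confirm which blastn is actually picked up from PATH, the version can be checked programmatically. The sketch below (Python 3) is an illustration under assumptions: the helper names are hypothetical, and the parser simply takes the first `X.Y.Z` pattern in the text, which should cover both the `blastn -version` output and the "Nucleotide-Nucleotide BLAST 2.2.31+" banner quoted earlier.

```python
import re
import subprocess


def parse_blast_version(text):
    """Extract the first X.Y.Z version number found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def installed_blastn_version(executable="blastn"):
    """Ask the blastn found on PATH for its version (BLAST+ must be installed)."""
    output = subprocess.check_output([executable, "-version"])
    return parse_blast_version(output.decode("utf-8", "replace"))


# Compare the version reported at the top of this thread with the tested one.
reported = parse_blast_version("Nucleotide-Nucleotide BLAST 2.2.31+")
tested = (2, 5, 0)
print(reported, "is older than", tested, ":", reported < tested)
```

On a machine with BLAST+ installed, `installed_blastn_version()` can then be compared against `(2, 5, 0)` in the same way.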

If the error message persists, could you please provide a small subset of the reads for me? Please make sure it reproduces the same error message, and email the example subset to ying.eddi2008@gmail.com, my personal email, since it can accept larger data files.

Eddi


qunfengdong / BLCA
