qunfengdong / BLCA


max() arg is an empty sequence error when using Greengenes reference #10

Closed aboffin closed 5 years ago

aboffin commented 6 years ago

Hi, when I run BLCA as follows:

python 2.blca_main.py -i ungap_Ano.fasta -r gg/gg_13_5_taxonomy.taxonomy \
  -o gg_ungap_Ano.blca.out

I get the following error:

blastdbcmd is located in your PATH!
muscle is located in your PATH!
>> Fasta file read in!!
>> Reading in taxonomy information! ....
blastn is located in your PATH!
>> Running blast!!
>> Blastn Finished!!
>> Read in blast output!
Traceback (most recent call last):
  File "2.blca_main.py", line 355, in <module>
    outout.write(le+":"+max(lexsum,key=lexsum.get)+";"+str(max(lexsum.values()))+";")
ValueError: max() arg is an empty sequence

However, when I use the NCBI 16S rRNA database, there is no error and the output file is generated. Any pointers are appreciated.

Thanks in advance,

yingeddi2008 commented 6 years ago

Hi Senthil,

Thanks for using our software!

I just did a test run with the Greengenes database using test.fasta, and the job finished successfully with no error message.

I also noticed that your command does not include the -q/--db argument, which may be where the issue comes from. The default database is NCBI's 16S, while your taxonomy file is from Greengenes, so the program cannot find matching taxonomy information for the BLAST hits. Can you try adding the -q/--db argument and running the program again?
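To illustrate why a database/taxonomy mismatch ends in this particular traceback, here is a minimal Python sketch. It is not BLCA's actual code: `best_taxon` is a hypothetical helper, and the assumption is that `lexsum` is a dict of taxon name to score that stays empty when no hit can be matched to a taxonomy entry.

```python
def best_taxon(lexsum):
    """Return (name, score) for the top-scoring taxon, or None if empty.

    Hypothetical helper, not part of BLCA. `lexsum` is assumed to map
    taxon names to scores.
    """
    if not lexsum:
        # No hit was matched to a taxonomy entry, so there is nothing to rank.
        return None
    name = max(lexsum, key=lexsum.get)
    return name, lexsum[name]


# Calling max() directly on an empty dict raises ValueError,
# which is the failure reported in the traceback.
raised = False
try:
    max({}, key={}.get)
except ValueError:
    raised = True

print(raised)
print(best_taxon({}))
print(best_taxon({"Bacillus": 0.2, "Listeria": 0.8}))
```

A guard like this only makes the failure easier to diagnose. The underlying fix is still to pass a database that matches the taxonomy file.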

If the error message persists, can you send me the input file? I will try to reproduce the error.

Thanks,

Eddi


aboffin commented 5 years ago

Hi Eddi,

Explicitly specifying the "-q" option resolved the issue!

Thank you for your help and sorry for missing that in the README file.

Here is the command line that worked:

python 2.blca_main.py -i out -r gg/gg_13_5_taxonomy.taxonomy -o blca.out -q gg/gg_13_5
blastdbcmd is located in your PATH!
muscle is located in your PATH!
>> Fasta file read in!!
>> Reading in taxonomy information! ....
blastn is located in your PATH!
>> Running blast!!
>> Blastn Finished!!
>> Read in blast output!
>> Taxonomy file generated!!
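
For anyone hitting the same error, a quick sanity check is to confirm that the IDs returned by BLAST actually appear in the taxonomy file. The sketch below is hypothetical and not part of BLCA. It assumes the taxonomy file is tab-separated with the sequence ID in the first column, and that you have already collected the subject IDs from the BLAST output yourself. The IDs and lineages in the example are made up.

```python
def missing_ids(hit_ids, taxonomy_lines):
    """Return the hit IDs that have no entry in the taxonomy mapping.

    Assumption: each taxonomy line looks like "<id>\t<lineage>".
    """
    known = set()
    for line in taxonomy_lines:
        line = line.strip()
        if line:
            # Keep only the first tab-separated column (the sequence ID).
            known.add(line.split("\t", 1)[0])
    return sorted(set(hit_ids) - known)


# Made-up example: hits come from one database, taxonomy from another.
taxonomy = ["1001\tk__Bacteria; p__Firmicutes", "1002\tk__Bacteria; p__Proteobacteria"]
print(missing_ids(["NR_0001", "NR_0002"], taxonomy))  # nothing matches
print(missing_ids(["1001", "1002"], taxonomy))        # everything matches
```

If every hit ID comes back as missing, the database and taxonomy file most likely do not belong together.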