qunfengdong / BLCA


Error with alignment #37

Open MarineBio-LKRod opened 1 year ago

MarineBio-LKRod commented 1 year ago

Hello!

I am getting the error "FileNotFoundError: [Errno 2] No such file or directory: 'ESV_007833.aln'" after running BLCA with this command (which uses a custom database):

python 2.blca_main.py -i 2022.fasta -r TAXONOMY.txt -q CBBI_12S_renamed.fas

I tried the fix from #26, adding "-p 1" to the command, but that leads to the common "ValueError: max() arg is an empty sequence" issue. After that, I checked all of my files for formatting problems (tabs/spaces, correct placement of : and ;, etc.) and cannot find any problem in the source files, even though the test.fasta file runs properly with the same script. I also reinstalled Clustalo, because I thought it might be the issue after I got this error at one point: "FATAL: Cannot change number of threads to 2. Clustal Omega was build without OpenMP support."

I'm not sure what else I can do at this point. The problem at its core seemingly stems from the creation of .aln files but I haven't seen anyone else post about this particular issue.

Thank you!

qunfengdong commented 1 year ago

Hmm, could you please test whether your installation of clustalo is working? For example, use the following example sequences (you can also get them from https://www.ebi.ac.uk/Tools/msa/clustalo/ by clicking "Use a [example sequence]" on that page):

>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P01942|HBA_MOUSE Hemoglobin subunit alpha OS=Mus musculus GN=Hba PE=1 SV=2
MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG
KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P13786|HBAZ_CAPHI Hemoglobin subunit zeta OS=Capra hircus GN=HBZ1 PE=3 SV=2
MSLTRTERTIILSLWSKISTQADVIGTETLERLFSCYPQAKTYFPHFDLHSGSAQLRAHG
SKVVAAVGDAVKSIDNVTSALSKLSELHAYVLRVDPVNFKFLSHCLLVTLASHFPADFTA
DAHAAWDKFLSIVSGVLTEKYR
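One way to run that check outside BLCA is a small Python script. The sketch below is an illustration, not part of BLCA: the names `TEST_FASTA`, `clustalo_command` and `smoke_test_clustalo` are hypothetical, and it assumes `clustalo` is on your PATH. It writes the example sequences to a temporary file and asks clustalo for Clustal-format output, the format BLCA's `get_dic_from_aln` reads with `AlignIO.read(aln, "clustal")`. It uses a single thread by default, on the assumption (based on the FATAL message in this thread) that a build without OpenMP only rejects thread counts other than 1.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# The example protein sequences above, in FASTA format (each header starts with ">").
TEST_FASTA = """>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P01942|HBA_MOUSE Hemoglobin subunit alpha OS=Mus musculus GN=Hba PE=1 SV=2
MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG
KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P13786|HBAZ_CAPHI Hemoglobin subunit zeta OS=Capra hircus GN=HBZ1 PE=3 SV=2
MSLTRTERTIILSLWSKISTQADVIGTETLERLFSCYPQAKTYFPHFDLHSGSAQLRAHG
SKVVAAVGDAVKSIDNVTSALSKLSELHAYVLRVDPVNFKFLSHCLLVTLASHFPADFTA
DAHAAWDKFLSIVSGVLTEKYR
"""


def clustalo_command(infile, outfile, threads=1):
    """Build a clustalo command line that writes a Clustal-format (.aln-style) alignment."""
    return ["clustalo", "-i", str(infile), "-o", str(outfile),
            "--outfmt=clustal", f"--threads={threads}", "--force"]


def smoke_test_clustalo(threads=1):
    """Align TEST_FASTA; return True if clustalo wrote a non-empty alignment file."""
    if shutil.which("clustalo") is None:
        raise FileNotFoundError("clustalo was not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        infile = Path(tmp) / "test.fasta"
        outfile = Path(tmp) / "test.aln"
        infile.write_text(TEST_FASTA)
        result = subprocess.run(clustalo_command(infile, outfile, threads),
                                capture_output=True, text=True)
        if result.returncode != 0:
            # Show clustalo's own error message (e.g. a FATAL line) to the user.
            print(result.stdout, result.stderr)
            return False
        return outfile.exists() and outfile.stat().st_size > 0
```

Calling `smoke_test_clustalo()` from a Python session should return `True` on a working installation; calling `smoke_test_clustalo(threads=2)` on a build without OpenMP support would be expected to print the FATAL message and return `False`.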

qunfengdong commented 1 year ago

Put the following test sequences in a file, and use it as input to test your clustalo program to see if it can successfully produce a multiple-sequence-alignment.

>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P01942|HBA_MOUSE Hemoglobin subunit alpha OS=Mus musculus GN=Hba PE=1 SV=2
MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG
KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P13786|HBAZ_CAPHI Hemoglobin subunit zeta OS=Capra hircus GN=HBZ1 PE=3 SV=2
MSLTRTERTIILSLWSKISTQADVIGTETLERLFSCYPQAKTYFPHFDLHSGSAQLRAHG
SKVVAAVGDAVKSIDNVTSALSKLSELHAYVLRVDPVNFKFLSHCLLVTLASHFPADFTA
DAHAAWDKFLSIVSGVLTEKYR

qunfengdong commented 1 year ago

Make sure that ">" starts each header line (that is, make sure those sequences are in FASTA format).
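A quick way to confirm that a file really is in FASTA format before giving it to clustalo or BLCA is a header check along these lines. This is a minimal sketch, not a BLCA feature; `check_fasta_headers` is a hypothetical helper that only verifies the basics: sequence data must follow a ">" header, every header needs an ID, and every record needs at least one sequence line.

```python
def check_fasta_headers(text):
    """Return a list of FASTA formatting problems found in `text` (empty if none)."""
    problems = []
    records = 0
    seq_lines = 0          # sequence lines seen for the current record
    seen_header = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header and seq_lines == 0:
                problems.append(f"line {lineno}: previous record has no sequence")
            if not line[1:].strip():
                problems.append(f"line {lineno}: header has no ID")
            seen_header = True
            records += 1
            seq_lines = 0
        elif not seen_header:
            # Typical symptom of headers pasted without their leading ">".
            problems.append(f"line {lineno}: sequence data before the first '>' header")
        else:
            seq_lines += 1
    if seen_header and seq_lines == 0:
        problems.append("last record has no sequence")
    if records == 0:
        problems.append("no '>' headers found")
    return problems
```

For example, `print(check_fasta_headers(open("testtest.tex").read()))` should print an empty list for a correctly formatted test file.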

MarineBio-LKRod commented 1 year ago

Hello again, thanks for the quick reply.

I ran the sequences you provided above (with the default database, not my custom one):

python 2.blca_main.py -i testtest.tex

I then received this error:

Warning: [blastn] Query_2 sp|P01942|HBA_MOU.. : Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options

In this case the .blastn file was created, but it is empty, and the .blca.out file lists every sequence as "Unclassified". This page (https://github.com/vivekkrish/markerminer-webapp/issues/1) indicates that the "ungapped Karlin-Altschul parameters" warning is not fatal. Is the default database not appropriate for these sequences?
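One likely explanation for that warning: blastn compares nucleotide sequences, while the hemoglobin examples are protein sequences, so blastn sees mostly invalid letters. The sketch below is a hypothetical pre-check (`read_fasta` and `non_nucleotide_records` are illustrative names, not BLCA functions) that flags records whose letters are mostly not A/C/G/T/U/N. The 90% threshold is an arbitrary assumption; ambiguity codes such as R or Y are deliberately not counted, because many amino-acid letters overlap the IUPAC nucleotide ambiguity codes.

```python
# Unambiguous nucleotide letters plus N; a protein sequence typically has
# only a minority of its residues among these.
NUCLEOTIDE_CHARS = set("ACGTUN")


def read_fasta(text):
    """Parse FASTA text into {id: sequence}; the ID is the first word of the header."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {seq_id: "".join(chunks).upper() for seq_id, chunks in records.items()}


def non_nucleotide_records(text, threshold=0.9):
    """Return IDs whose sequences are mostly non-nucleotide letters (likely protein)."""
    flagged = []
    for seq_id, seq in read_fasta(text).items():
        if not seq:
            continue
        fraction = sum(ch in NUCLEOTIDE_CHARS for ch in seq) / len(seq)
        if fraction < threshold:
            flagged.append(seq_id)
    return flagged
```

Run on the three hemoglobin records, this should flag all of them, which is consistent with their being intended as a clustalo test rather than as BLCA queries.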

To test out Clustalo further, I went back to the source code example. After running the original test.fasta file with the default database, I get the error I received earlier:

FATAL: Cannot change number of threads to 2. Clustal Omega was build without OpenMP support.
Traceback (most recent call last):
  File "2.blca_main.py", line 350, in <module>
    alndic = get_dic_from_aln(k1 + ".aln")
  File "2.blca_main.py", line 82, in get_dic_from_aln
    alignment = AlignIO.read(aln, "clustal")
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/AlignIO/__init__.py", line 383, in read
    alignment = next(iterator)
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/AlignIO/__init__.py", line 322, in parse
    with as_handle(handle) as fp:
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/File.py", line 72, in as_handle
    with open(handleish, mode, **kwargs) as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'seq1.aln'

However, everything then worked perfectly when running "python 2.blca_main.py -i test.fasta -p 1"! Seeing this, I reran my own FASTA, taxonomy, and database files with "-p 1" at the end of the command. Again, I got:

>  > Start aligning reads...
FATAL: Cannot change number of threads to 2. Clustal Omega was build without OpenMP support.
Traceback (most recent call last):
  File "2.blca_main.py", line 350, in <module>
    alndic = get_dic_from_aln(k1 + ".aln")
  File "2.blca_main.py", line 82, in get_dic_from_aln
    alignment = AlignIO.read(aln, "clustal")
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/AlignIO/__init__.py", line 383, in read
    alignment = next(iterator)
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/AlignIO/__init__.py", line 322, in parse
    with as_handle(handle) as fp:
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/laurenrodriguez/miniconda3/lib/python3.8/site-packages/Bio/File.py", line 72, in as_handle
    with open(handleish, mode, **kwargs) as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'ESV_000419.aln'

Because the program runs with the default files (plus -p 1) but not with my own, I think there is a problem in my files, but I have checked them many times and cannot find anything. Could I send you the files to double-check? Or do you suspect that something is actually wrong with clustalo?

Thank you again for your help.

MarineBio-LKRod commented 1 year ago

I found a page describing how to enable OpenMP multi-threading, in case that is the core issue. The commands it provides ran successfully, but I still get the error "Clustal Omega was build without OpenMP support".
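The tracebacks fit a simple reading: clustalo aborts with the FATAL message before writing anything, and BLCA then fails when `get_dic_from_aln` tries to open the missing `.aln` file. Since `-p 1` made test.fasta work, it appears (an assumption, not checked against BLCA's source) that `-p` is passed on to clustalo as its thread count, so running with `-p 1` may be simpler than rebuilding clustalo with OpenMP. The hypothetical wrapper below sketches how an alignment step could fail with a clearer message; `run_clustalo` is not a BLCA function, and the `run` parameter exists only so the check can be exercised without clustalo installed.

```python
import subprocess
from pathlib import Path


def run_clustalo(infile, outfile, threads=1, run=subprocess.run):
    """Align `infile` with clustalo and check that an alignment was actually written."""
    cmd = ["clustalo", "-i", str(infile), "-o", str(outfile),
           "--outfmt=clustal", f"--threads={threads}", "--force"]
    result = run(cmd, capture_output=True, text=True)
    log = (result.stdout or "") + (result.stderr or "")
    if "without OpenMP support" in log:
        # As the FATAL message suggests, a build without OpenMP will not
        # change the number of threads, so only a single thread can be used.
        raise RuntimeError(
            f"clustalo cannot use {threads} threads (built without OpenMP); "
            "rerun with a single thread, e.g. BLCA's -p 1")
    if result.returncode != 0 or not Path(outfile).exists():
        # Catch the missing .aln here instead of in a later FileNotFoundError.
        raise RuntimeError(f"clustalo wrote no alignment for {infile}:\n{log}")
    return Path(outfile)
```

With a working clustalo, `run_clustalo("seq1.fasta", "seq1.aln")` returns the path of the new alignment, which could then be parsed with Biopython's `AlignIO.read(path, "clustal")` as BLCA does.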

qunfengdong commented 1 year ago

Yes, please send your file to @.***

Replying by email to the comment MarineBio-LKRod posted on Fri, Aug 25, 2023 at 4:04 PM.

qunfengdong commented 1 year ago

When I sent you the sequences for multiple sequence alignment, they were NOT for testing BLCA. Those test cases were for testing your clustalo installation. That is, run your clustalo program on those sequences to see whether it can produce a multiple sequence alignment. If it cannot, something is wrong with your clustalo installation.

qunfengdong commented 1 year ago

Using the sequences below, can you successfully produce multiple sequence alignment with the clustalo you installed?

>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P01942|HBA_MOUSE Hemoglobin subunit alpha OS=Mus musculus GN=Hba PE=1 SV=2
MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG
KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P13786|HBAZ_CAPHI Hemoglobin subunit zeta OS=Capra hircus GN=HBZ1 PE=3 SV=2
MSLTRTERTIILSLWSKISTQADVIGTETLERLFSCYPQAKTYFPHFDLHSGSAQLRAHG
SKVVAAVGDAVKSIDNVTSALSKLSELHAYVLRVDPVNFKFLSHCLLVTLASHFPADFTA
DAHAAWDKFLSIVSGVLTEKYR

MarineBio-LKRod commented 1 year ago

Oh, my bad! I used the example sequences you provided 3 days ago with a basic "clustalo -i testtest.tex -o test.fa" and got the proper output.

>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P01942|HBA_MOUSE Hemoglobin subunit alpha OS=Mus musculus GN=Hba PE=1 SV=2
MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG
KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP
AVHASLDKFLASVSTVLTSKYR
>sp|P13786|HBAZ_CAPHI Hemoglobin subunit zeta OS=Capra hircus GN=HBZ1 PE=3 SV=2
MSLTRTERTIILSLWSKISTQADVIGTETLERLFSCYPQAKTYFPHFDLHSGSAQLRAHG
SKVVAAVGDAVKSIDNVTSALSKLSELHAYVLRVDPVNFKFLSHCLLVTLASHFPADFTA
DAHAAWDKFLSIVSGVLTEKYR

qunfengdong commented 1 year ago

Thanks for sending us your input files. One thing I noticed is that the IDs in the taxonomy file and in the BLAST database file are different. According to our instructions at https://github.com/qunfengdong/BLCA, if you are using a custom database, the IDs in the BLAST database file should be the same as those in the taxonomy file. For example, your taxonomy file contains the ID "SERCFISH1257", so there should be a record in your BLAST database file with the ID "SERCFISH1257". Instead, you have the ID "Acantharchus-pomotis_SERCFISH1257". Please reformat accordingly and try again.
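A mismatch like this can be found, and in this case fixed, with a short script. The sketch below rests on stated assumptions: the taxonomy file is tab-separated with the ID in the first column, the database FASTA uses the first word of each header as its ID, and (for the rename) the real ID is whatever follows the last underscore, as in "Acantharchus-pomotis_SERCFISH1257". All function names are hypothetical. Check the `only_in_*` lists after renaming, and rebuild the BLAST database from the renamed FASTA before rerunning BLCA.

```python
def taxonomy_ids(taxonomy_text):
    """IDs from a taxonomy file, assumed to be the first tab-separated column."""
    return {line.split("\t", 1)[0].strip()
            for line in taxonomy_text.splitlines() if line.strip()}


def fasta_ids(fasta_text):
    """IDs from FASTA headers: the first whitespace-delimited word after '>'."""
    return {line[1:].split()[0] for line in fasta_text.splitlines()
            if line.startswith(">") and line[1:].split()}


def id_mismatches(taxonomy_text, fasta_text):
    """Report IDs present in only one of the two files."""
    tax, db = taxonomy_ids(taxonomy_text), fasta_ids(fasta_text)
    return {"only_in_taxonomy": sorted(tax - db),
            "only_in_database": sorted(db - tax)}


def strip_species_prefix(header_id):
    """Hypothetical fix: keep only the text after the last underscore."""
    return header_id.rsplit("_", 1)[-1]


def rename_fasta_headers(fasta_text, rename=strip_species_prefix):
    """Rewrite each header's ID with `rename`, keeping any description text."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            if parts:
                parts[0] = rename(parts[0])
                line = ">" + " ".join(parts)
        out.append(line)
    return "\n".join(out) + "\n"
```

If the assumption about underscores does not hold for every record (for example, if some IDs themselves contain underscores), a different `rename` function can be passed to `rename_fasta_headers`.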