ERROR - 'Seq' object has no attribute 'alphabet'

chregu1971 commented 4 years ago

Hi, I am trying to run Pacasus on some PacBio HiFi reads, but I can't get it to work. I get: ERROR - 'Seq' object has no attribute 'alphabet'. The full output is below. Regards, Christian

$ python ../../pacasus/pacasus.py -o ps_307_allcells_cleaned.fasta --loglevel=DEBUG --minimum_read_length=500 --device_type=CPU --platform_name='Portable Computing Language' --framework=opencl ../rawreads/ps_307_allcells.fasta
INFO - Initializing application...
DEBUG - Initializing Score...
DEBUG - Initializing score finished.
DEBUG - Initializing DnaRnaScore...
DEBUG - Creating matrix with parameters: match_score: 3, mismatch_score: -4, gap_score: -3.0, other_score: -1, any_score: 0
DEBUG - Initializing DnaRnaScore finished.
INFO - Application initialized.
INFO - Setting program...
DEBUG - Initializing aligner...
DEBUG - Initializing hitlist...
DEBUG - Initializing hitlist OK.
DEBUG - Setting SW...
DEBUG - Using OpenCL CPU implementation
DEBUG - Initializing SmithWaterman.
INFO - No gap extension penalty detected: using original PaSWAS scoring algorithm
DEBUG - Found platform <pyopencl.Platform 'Portable Computing Language' at 0x7f70625b1008>
DEBUG - Initializing device 0
DEBUG - Aligner initialized.
WARNING - Forcing output to FASTA
WARNING - Forcing query step to 1
WARNING - Forcing sequence step to 1
WARNING - Forcing Matrix to PALINDROME
DEBUG - Initializing Score...
DEBUG - Initializing score finished.
DEBUG - Initializing DnaRnaScore...
DEBUG - Creating matrix with parameters: match_score: 3, mismatch_score: -4, gap_score: -3.0, other_score: -1, any_score: 0
DEBUG - Initializing DnaRnaScore finished.
INFO - Program set.
DEBUG - Initializing hitlist...
DEBUG - Initializing hitlist OK.
INFO - Reading query sequences 0 1...
DEBUG - Initializing reader path = /rawdata/christian/ps_307/rawreads/ps_307_allcells.fasta limitlength = 100000...
DEBUG - Initializing reader finished.
DEBUG - Reading from fasta file...
ERROR - 'Seq' object has no attribute 'alphabet'
Traceback (most recent call last):
  File "../../pacasus/pacasus.py", line 13, in <module>
    ppw.run()
  File "/rawdata/christian/pacasus/pacasus/pacasusall.py", line 127, in run
    query_sequences = self._get_query_sequences(self.arguments[0], start=query_start, end=query_end)
  File "/rawdata/christian/pacasus/pypaswas/pyPaSWAS/pypaswasall.py", line 104, in _get_query_sequences
    reader.read_records(start,end)
  File "/rawdata/christian/pacasus/pypaswas/pyPaSWAS/Core/Readers.py", line 105, in read_records
    self.records = [SWSeqRecord(Seq(str(record.seq), record.seq.alphabet),
  File "/rawdata/christian/pacasus/pypaswas/pyPaSWAS/Core/Readers.py", line 105, in <listcomp>
    self.records = [SWSeqRecord(Seq(str(record.seq), record.seq.alphabet),
AttributeError: 'Seq' object has no attribute 'alphabet'

swarris commented 4 years ago

The biopython module seems to have difficulty reading your fasta file. Could you give a couple of reads so I can have a look?

chregu1971 commented 4 years ago

Here are a few reads. Thank you, Christian

CTR_pacasus_test.txt

swarris commented 4 years ago

These reads are processed by Pacasus without problems. The issue seems to be in how Biopython handles your FASTA file. Could you run this script on your data to find out which read causes the problem?

python3 checkFasta.py CTR_pacasus_test.txt

from Bio import SeqIO
import sys

for s in SeqIO.parse(open(sys.argv[1],"r"), "fasta"):
    print(s.id)

chregu1971 commented 4 years ago

When I run your script, I get a list of the reads in the file, with no error messages. But Pacasus does not work for me even on the 10-read file (I tried on Ubuntu 14.04 and 18.04, with Python 3 conda environments).

(pacasus) ionadmin@Proton06:/rawdata/christian/ps_307/pacasus_20201021$ more checkFasta.py
from Bio import SeqIO
import sys

for s in SeqIO.parse(open(sys.argv[1],"r"), "fasta"):
    print(s.id)

(pacasus) ionadmin@Proton06:/rawdata/christian/ps_307/pacasus_20201021$ python checkFasta.py CTR_pacasus_test.fasta
m54259_200909_032022/4194375/ccs
m54259_200909_032022/4194377/ccs
m54259_200909_032022/4194381/ccs
m54259_200909_032022/4194384/ccs
m54259_200909_032022/4194387/ccs
m54259_200909_032022/4194388/ccs
m54259_200909_032022/4194391/ccs
m54259_200909_032022/4194399/ccs
m54259_200909_032022/4194400/ccs
m54259_200909_032022/4194401/ccs
(pacasus) ionadmin@Proton06:/rawdata/christian/ps_307/pacasus_20201021$

swarris commented 4 years ago

Very strange... I ran this:

python3 ~/git/Pacasus/pacasus.py ~/Downloads/CTR_pacasus_test.txt -L /tmp/log.txt

System: Ubuntu 20.04 LTS, no conda env. The log file shows me this:

2020-10-30 10:27:10,625 - DEBUG - Initialized FASTA formatter
2020-10-30 10:27:10,625 - INFO - Formatting OK.
2020-10-30 10:27:10,625 - INFO - Writing output...
2020-10-30 10:27:10,625 - INFO - formatting results...
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194375/ccs_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194375/ccs_b2_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194377/ccs
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194381/ccs_b1_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194381/ccs_b1_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194381/ccs_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194384/ccs_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194384/ccs_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194387/ccs_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194387/ccs_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194388/ccs_b2_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194388/ccs_b2_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194391/ccs_b2_b2_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194391/ccs_b2_b2_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194399/ccs_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194399/ccs_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194400/ccs_b2_b2_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194400/ccs_b2_b2_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194401/ccs_b1_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194401/ccs_b1_b2
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194401/ccs_b2_b2_b2_b1
2020-10-30 10:27:10,625 - DEBUG - Formatting hit m54259_200909_032022/4194401/ccs_b2_b2_b2_b2
2020-10-30 10:27:10,625 - DEBUG - printing results...
2020-10-30 10:27:10,625 - DEBUG - finished printing results
2020-10-30 10:27:10,625 - INFO - Writing OK.
2020-10-30 10:27:10,625 - INFO - Finished

I'll have a closer look. Maybe the text conversion removed some special (UTF-8) characters. Could you send me the file by e-mail? s.warris_at_gmail.com
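The idea that special characters were lost in conversion can be tested locally before mailing files around. The following is a minimal sketch using only the Python standard library; the function name `find_non_ascii_lines` is hypothetical and not part of Pacasus or Biopython. It reports every line of a file that contains a byte outside the ASCII range.

```python
import sys


def find_non_ascii_lines(path):
    """Return (line_number, byte_offsets) pairs for lines containing bytes > 127.

    Hypothetical helper for illustration; reads the file in binary mode so
    that no decoding step can hide or alter unusual characters.
    """
    findings = []
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            # Iterating over bytes yields integers, so compare numerically.
            offsets = [i for i, byte in enumerate(raw) if byte > 127]
            if offsets:
                findings.append((line_number, offsets))
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    for line_number, offsets in find_non_ascii_lines(sys.argv[1]):
        print(f"line {line_number}: non-ASCII bytes at offsets {offsets}")
```

An empty result means the file is plain ASCII, which would make a character-encoding explanation unlikely.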

itaraju commented 3 years ago

I'm facing the same issue. I've also run the test script and got the list of sequence IDs with no error. I also checked that all IDs present in the FASTA file appear in the output. Has this issue already been solved in some other way?

tbrown91 commented 3 years ago

@itaraju In case it's useful: I just ran into the same issue. It looks like Biopython got a bit of an overhaul in v1.78, which removed the Bio.Alphabet module and the alphabet attribute of Seq objects:

https://biopython.org/wiki/Alphabet

I was able to get around this by downgrading to v1.77
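To see which side of that change an environment is on, the installed Biopython version can be compared against 1.78. This is a sketch under stated assumptions: the helper names (`release_tuple`, `has_seq_alphabet`, `installed_version`) are hypothetical, and the 1.78 cut-off is taken from the comment above. It relies on `importlib.metadata`, which requires Python 3.8 or later.

```python
import re
from importlib import metadata

# Assumption taken from the thread: Seq.alphabet is gone from 1.78 onwards.
FIRST_VERSION_WITHOUT_ALPHABET = (1, 78)


def release_tuple(version):
    """Turn a version string such as '1.77' or '1.81.dev0' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_seq_alphabet(version):
    """Return True if this Biopython version predates the removal of Seq.alphabet."""
    return release_tuple(version) < FIRST_VERSION_WITHOUT_ALPHABET


def installed_version(distribution="biopython"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("Biopython is not installed in this environment")
else:
    status = "still has" if has_seq_alphabet(version) else "no longer has"
    print(f"Biopython {version} {status} Seq.alphabet")
```

If the check reports a version of 1.78 or later, the downgrade described above can be done with, for example, `pip install "biopython==1.77"` inside the environment that runs Pacasus.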

swarris commented 3 years ago

Great, thanks for the information. I will fix this.
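Until a fix lands, one possible shape for a version-tolerant change to the failing line in Readers.py is sketched below. This is not the maintainer's fix, only an illustration: the helper `seq_constructor_args` and the two stand-in classes are hypothetical, and the stand-ins exist only so the sketch runs without Biopython installed. The idea is to pass the alphabet along when the sequence object still carries one and to omit it otherwise.

```python
def seq_constructor_args(seq):
    """Return the positional arguments for rebuilding `seq` as a new Seq.

    Hypothetical helper: older Biopython Seq objects expose `.alphabet`,
    newer ones do not, so the attribute is looked up defensively.
    """
    args = (str(seq),)
    alphabet = getattr(seq, "alphabet", None)
    if alphabet is not None:
        args += (alphabet,)
    return args


class OldStyleSeq:
    """Stand-in mimicking a pre-1.78 Seq, which carried an alphabet."""

    def __init__(self, data, alphabet="generic_dna"):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        return self.data


class NewStyleSeq:
    """Stand-in mimicking a Seq from 1.78 onwards, which has no alphabet."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


print(seq_constructor_args(OldStyleSeq("ACGT")))  # includes the alphabet
print(seq_constructor_args(NewStyleSeq("ACGT")))  # sequence string only
```

With a helper like this, the expression from the traceback, `Seq(str(record.seq), record.seq.alphabet)`, could be written as `Seq(*seq_constructor_args(record.seq))`. A simpler alternative is `Seq(str(record.seq))`, which drops the alphabet on all versions.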

swarris / Pacasus

ERROR - 'Seq' object has no attribute 'alphabet' #19