mortazavilab / TranscriptClean

Correct mismatches, microindels, and noncanonical splice junctions in long reads that have been mapped to the genome
MIT License

AttributeError: 'Fasta' object has no attribute 'sequence' #36

Status: Closed (catsargent closed this issue 1 year ago)

catsargent commented 1 year ago

Whilst running the accessory script get_SJs_from_gtf.py I got the following error:

Traceback (most recent call last):
  File "/projects/b1177/software/TranscriptClean/accessory_scripts/get_SJs_from_gtf.py", line 114, in <module>
    spliceJn = formatSJOutput(info, prev_exonEnd, genome, minIntron)
  File "/projects/b1177/software/TranscriptClean/accessory_scripts/get_SJs_from_gtf.py", line 28, in formatSJOutput
    intronMotif = getIntronMotif(chromosome, intron_start, intron_end, genome)
  File "/projects/b1177/software/TranscriptClean/accessory_scripts/get_SJs_from_gtf.py", line 45, in getIntronMotif
    startBases = genome.sequence({'chr': chrom, 'start': start, 'stop': start + 1}, one_based=True)
AttributeError: 'Fasta' object has no attribute 'sequence'

I also see this warning when running TranscriptClean.py without providing a file of reference splice junctions:

'Fasta' object has no attribute 'sequence'

The script still runs to completion, though. Do you know what might be the issue?

Many thanks, Catherine

catsargent commented 1 year ago

Looking into this further, the errors stem from the move from the pyfasta package to pyfaidx. The two packages access the sequence of a FASTA record differently, and TranscriptClean's code has not been updated to match.
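
To illustrate the difference, here is a minimal sketch (not TranscriptClean's actual fix). The helper name fetch_bases is hypothetical. It assumes the pyfasta-style call shown in the traceback above, and it assumes that a pyfaidx Fasta object is indexed by record name and then sliced with 0-based, end-exclusive coordinates, with str() giving the bases.

```python
def fetch_bases(genome, chrom, start, stop):
    """Return bases start..stop (1-based, inclusive) as a string.

    Hypothetical helper: works with either a pyfasta-style object
    (which has a .sequence() method) or a pyfaidx-style object
    (indexed by record name, then sliced).
    """
    if hasattr(genome, "sequence"):
        # pyfasta style, as in the traceback: a dict of coordinates.
        return str(genome.sequence(
            {"chr": chrom, "start": start, "stop": stop}, one_based=True))
    # pyfaidx style (assumption): genome[name][a:b] uses 0-based,
    # end-exclusive slicing, so shift the 1-based start down by one.
    return str(genome[chrom][start - 1:stop])


# A plain dict of strings stands in for an indexed genome here,
# because it supports the same genome[name][a:b] access pattern.
toy_genome = {"chr1": "ACGTGTAAGT"}
print(fetch_bases(toy_genome, "chr1", 5, 6))  # prints GT
```

With pyfaidx installed, the same call should work on Fasta("genome.fa") in place of the toy dict, but I have only checked the logic against the stand-in above.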

fairliereese commented 1 year ago

Hey! Can you try again using the latest commits?

rugilemat commented 1 year ago

Hi, I am having a similar issue:

/users/k19022845/TranscriptClean-master/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'cc678876-49e2-448e-8d12-00dcccbd703a'
  warnings.warn("Problem parsing transcript with ID '" +
'Fasta' object has no attribute 'sequence'
/users/k19022845/TranscriptClean-master/TranscriptClean.py:448: UserWarning: Problem encountered while correcting transcript with ID 7590085e-5bb7-4332-b33e-c073287e27ac. Will output original version.

I have tried the latest commit from GitHub but keep getting the same issue. I also tried an earlier version with a pyfasta install instead of pyfaidx, but then the issue is:

line 19, in <module>
    from pyfasta import Fasta
ModuleNotFoundError: No module named 'pyfasta'

Any help on this would be really great!

catsargent commented 1 year ago

Sorry for not getting back to you about this. I did use the latest commit and the issue is that it still uses pyfasta and not pyfaidx. I did not have time to update my conda environment and change the line where pyfasta is imported but I guess that would hopefully fix it.

fairliereese commented 1 year ago

🤦 My bad, will fix that now.

rugilemat commented 1 year ago

I'm not sure that's the issue: the script in the latest GitHub commit already uses pyfaidx, but I was still getting the warnings above (I used the scripts in the repository, not the latest release version).

fairliereese commented 1 year ago

Should be fixed now!

fairliereese commented 1 year ago

Hi, for your issue, can you try installing the latest commits and seeing if it now runs?

rugilemat commented 1 year ago

I have just tried it with the latest commit and am still getting the same issue. This is what I've been running, in case that's helpful:

python "/users/k19022845/TranscriptClean-master/TranscriptClean.py" --sam "BC54_aligned.sam" --genome "/scratch/users/k19022845/refgenome/GRCh38.p13.genome.fa" --threads 16 --outprefix "BC54_Transcript_Clean_aligned"

fairliereese commented 1 year ago

Can you please copy your error here just so we're on the same page?

rugilemat commented 1 year ago

Yes, I'm uploading the output I got: transcript_clean_run.txt

Here's the gist of it:

/users/k19022845/TranscriptClean-master/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'cc678876-49e2-448e-8d12-00dcccbd703a'
  warnings.warn("Problem parsing transcript with ID '" +
'Fasta' object has no attribute 'sequence'
/users/k19022845/TranscriptClean-master/TranscriptClean.py:448: UserWarning: Problem encountered while correcting transcript with ID 7590085e-5bb7-4332-b33e-c073287e27ac. Will output original version.

fairliereese commented 1 year ago

Ok, sorry about this! The try / except block in the code made it more difficult to pinpoint the issue than it needed to be. But I believe I've fixed all the pyfaidx incompatibilities, and I was able to run it myself. So please try again and thanks for bearing with me!
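
For anyone hitting something similar, here is a rough sketch of why a broad try / except makes this kind of bug hard to locate, and one way to keep the details. This is not TranscriptClean's actual code: the function name correct_or_keep and its arguments are made up for illustration. The only change from a plain "warn and move on" handler is that the full traceback is appended to the warning.

```python
import traceback
import warnings


def correct_or_keep(transcript_id, correct):
    """Run correct(transcript_id); on any failure, warn and return None.

    Hypothetical helper. Appending traceback.format_exc() means the
    warning shows which line raised, instead of only the message.
    """
    try:
        return correct(transcript_id)
    except Exception:
        warnings.warn(
            "Problem encountered while correcting transcript with ID "
            + transcript_id
            + ". Will output original version.\n"
            + traceback.format_exc()
        )
        return None
```

Without the traceback, all the user sees is the exception text (here, 'Fasta' object has no attribute 'sequence'), with no hint of where in the code the old call still lives.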

rugilemat commented 1 year ago

Thank you! I have tried it with one sample and it seems to be working fine. I'm running a full batch now.