harvardinformatics / TranscriptomeAssemblyTools

A collection of scripts for processing fastq files in ways to improve de novo transcriptome assemblies, and for evaluating those assemblies.

Error running decode in FilterUncorrectabledPEfastq.py #10

Closed: 000generic closed this issue 10 months ago

000generic commented 10 months ago

Hi!

I'm working on a pipeline for transcriptome assembly and would like to include some of your TranscriptomeAssemblyTools for processing reads prior to Trinity. After running Rcorrector, I believe the correct utility tool would be

FilterUncorrectabledPEfastq.py

but I'm getting an error from the use of decode in the Python script. Please see below for details. I'm wondering whether I need to process the Rcorrector .fq output somehow before running FilterUncorrectabledPEfastq.py, or something like this.

Any ideas or guidance would be greatly appreciated!

Thank you :) Eric

(base) GO Eric :) python --version
Python 3.10.12

(base) GO Eric :) python 000-harvard_informatics-FilterUncorrectabledPEfastq.py -1 output/10-rcorrector/SRR9606759_1.cor.fq -2 output/10-rcorrector/SRR9606759_2.cor.fq -s SRR9606759

Traceback (most recent call last):
  File "/work/eric-edsinger/code/builds/build-assembleRNA/000-harvard_informatics-FilterUncorrectabledPEfastq.py", line 87, in <module>
    head1,seq1,placeholder1,qual1=[i.decode('ASCII').strip() for i in entry]
  File "/work/eric-edsinger/code/builds/build-assembleRNA/000-harvard_informatics-FilterUncorrectabledPEfastq.py", line 87, in <listcomp>
    head1,seq1,placeholder1,qual1=[i.decode('ASCII').strip() for i in entry]
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?

(base) GO Eric :) head -n 4 output/10-rcorrector/SRR9606759_1.cor.fq
@SRR9606759.1 1 length=150 l:27 m:37 h:53 cor
CCTTAAATGACTATCTTCATGATCTTCTTCGTCCAATGAACGCAACGAATATTTTCTTTGAGAAAACGTCCTCTTATGCTGTCTTTGAGCCTTTGGAATTAAAGAAGATATTCTCCGGGGATTAAATGCTTCAGATTCCATACTTAAATC
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEAEEAEEEEEEEAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEE

(base) GO Eric :) head -n 4 output/10-rcorrector/SRR9606759_2.cor.fq
@SRR9606759.1 1 length=150 l:35 m:51 h:74
ATTTTGTACTACCGAATATTGCGGAATTCTGGATATCAATGCTCTTTCAAATCGCTCTGCAAGTATTGATTTAAGTATGGAATCTGAAGCATTTAATCCCCGGAGAATATCTTCTTTAATTCCAAAGGCTCAAAGACAGCATAAGAGGAC
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEAE/AAAEEEEEEAEE
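For anyone hitting the same traceback: in Python 3, only bytes objects have a decode method, while str objects (what you get from a file opened in text mode) do not. In Python 2.7, str is a byte string and does have decode, which is why the same line runs there. The minimal sketch below reproduces the difference with in-memory handles; io.BytesIO and io.StringIO stand in for files opened in binary and text mode, and the four-line record and the read_entry helper are hypothetical. The Rcorrector records shown above look like ordinary four-line FASTQ, which is consistent with the problem lying in how the script reads its input rather than in the files themselves.

```python
import io


def read_entry(handle):
    """Read one four-line FASTQ entry from an open handle."""
    return [handle.readline() for _ in range(4)]


# Hypothetical toy record, shortened for illustration.
record = "@read1 cor\nACGT\n+\nAAAA\n"

# Binary-mode handle (like gzip.open(path, 'rb')): each line is bytes.
binary_entry = read_entry(io.BytesIO(record.encode("ascii")))
print(type(binary_entry[0]).__name__)  # bytes
print([i.decode("ASCII").strip() for i in binary_entry])  # ['@read1 cor', 'ACGT', '+', 'AAAA']

# Text-mode handle (like open(path)): each line is already str.
text_entry = read_entry(io.StringIO(record))
print(type(text_entry[0]).__name__)  # str
try:
    [i.decode("ASCII").strip() for i in text_entry]
except AttributeError as err:
    # str has no decode() in Python 3, so this branch is taken.
    print("AttributeError:", err)
```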

Lilneo786 commented 10 months ago

Option 1: Confirm that you are running the script with Python 2.x and not Python 3.x.

Option 2: Update the script. If available, check for an updated version of the FilterUncorrectabledPEfastq.py script that is compatible with Python 3.

Option 3: Modify the script (temporary solution). If you cannot find a Python 3-compatible version of the script, you can try modifying it to work with Python 3 by removing the decode calls. Here's how you can modify the relevant line (line 87 in your error message):

head1, seq1, placeholder1, qual1 = [i.strip() for i in entry]

This change removes the decode call, which is unnecessary when the lines are already str (for example, when the file is opened in text mode), and should work in Python 3.
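A slightly more defensive variant of option 3 is to decode only when a line is actually bytes, so the same line works whether the input was opened in text mode or in binary mode (as gzip.open does by default). This is a sketch, not the repository's fix: to_text and parse_entry are hypothetical helper names, and whether your copy of the script opens compressed input in binary mode is an assumption worth checking before relying on it.

```python
def to_text(line):
    """Return line as str, decoding only if it is bytes (hypothetical helper).

    bytes come from files opened in binary mode; str comes from text mode.
    """
    if isinstance(line, bytes):
        return line.decode("ascii")
    return line


def parse_entry(entry):
    """Split a four-line FASTQ entry into stripped header, sequence, '+' line, qualities."""
    head, seq, placeholder, qual = [to_text(i).strip() for i in entry]
    return head, seq, placeholder, qual


# Works for lines read in either mode (hypothetical toy record).
print(parse_entry([b"@r1\n", b"ACGT\n", b"+\n", b"AAAA\n"]))  # ('@r1', 'ACGT', '+', 'AAAA')
print(parse_entry(["@r1\n", "ACGT\n", "+\n", "AAAA\n"]))  # ('@r1', 'ACGT', '+', 'AAAA')
```

Unlike simply deleting .decode, this keeps working if some inputs still arrive as bytes.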

000generic commented 10 months ago

Thank you Lilneo786!

Switching to Python 2.7 did the trick. I'll look around for a Python 3 version; it would be interesting to see whether I can update the script to be Python 3 compatible.
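If the script is updated for Python 3, another approach is to open every input in text mode, so no decode call is needed at all; gzip.open accepts mode "rt" for this. A minimal sketch under that assumption, with hypothetical helper names open_fastq and fastq_records and placeholder file names in the commented usage:

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file in text mode, handling .gz files too (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "r", encoding="ascii")


def fastq_records(handle):
    """Yield (header, sequence, plus, quality) tuples, four lines at a time.

    handle must be an iterator, such as an open file object.
    """
    while True:
        entry = list(islice(handle, 4))
        if not entry:
            return
        if len(entry) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.strip() for line in entry)


# Example (file names are placeholders):
# with open_fastq("sample_1.cor.fq") as r1, open_fastq("sample_2.cor.fq") as r2:
#     for rec1, rec2 in zip(fastq_records(r1), fastq_records(r2), strict=True):
#         ...
```

Passing strict=True to zip (Python 3.10 and later) raises a ValueError if the two files contain different numbers of records, instead of silently stopping at the shorter one.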

Thanks again!

adamfreedman commented 10 months ago

I recently refactored the scripts to work in Python 3, so I will need to look into what is going on.

Lilneo786 commented 10 months ago

I'm sure you don't need any assistance, but if you do, let me know :)