Error running decode in FilterUncorrectabledPEfastq.py

000generic commented 10 months ago

Hi!

I'm working on a pipeline for transcriptome assembly and would like to include some of your TranscriptomeAssemblyTools in processing reads prior to Trinity. After running Rcorrector, I believe the correct utility script would be

FilterUncorrectabledPEfastq.py

but I'm getting an error around the use of decode in the Python script. Please see below for details. I'm wondering if I need to process the Rcorrector fq output somehow prior to running FilterUncorrectabledPEfastq.py, or something like that.

Any ideas or guidance would be greatly appreciated!

Thank you :) Eric

(base) GO Eric :) python --version
Python 3.10.12

(base) GO Eric :) python 000-harvard_informatics-FilterUncorrectabledPEfastq.py -1 output/10-rcorrector/SRR9606759_1.cor.fq -2 output/10-rcorrector/SRR9606759_2.cor.fq -s SRR9606759

Traceback (most recent call last):
  File "/work/eric-edsinger/code/builds/build-assembleRNA/000-harvard_informatics-FilterUncorrectabledPEfastq.py", line 87, in <module>
    head1,seq1,placeholder1,qual1=[i.decode('ASCII').strip() for i in entry]
  File "/work/eric-edsinger/code/builds/build-assembleRNA/000-harvard_informatics-FilterUncorrectabledPEfastq.py", line 87, in <listcomp>
    head1,seq1,placeholder1,qual1=[i.decode('ASCII').strip() for i in entry]
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?

(base) GO Eric :) head -n 4 output/10-rcorrector/SRR9606759_1.cor.fq
@SRR9606759.1 1 length=150 l:27 m:37 h:53 cor
CCTTAAATGACTATCTTCATGATCTTCTTCGTCCAATGAACGCAACGAATATTTTCTTTGAGAAAACGTCCTCTTATGCTGTCTTTGAGCCTTTGGAATTAAAGAAGATATTCTCCGGGGATTAAATGCTTCAGATTCCATACTTAAATC
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEAEEAEEEEEEEAEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEE

(base) GO Eric :) head -n 4 output/10-rcorrector/SRR9606759_2.cor.fq
@SRR9606759.1 1 length=150 l:35 m:51 h:74
ATTTTGTACTACCGAATATTGCGGAATTCTGGATATCAATGCTCTTTCAAATCGCTCTGCAAGTATTGATTTAAGTATGGAATCTGAAGCATTTAATCCCCGGAGAATATCTTCTTTAATTCCAAAGGCTCAAAGACAGCATAAGAGGAC
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEAE/AAAEEEEEEAEE

Lilneo786 commented 10 months ago

Option 1: Run the script with Python 2.x rather than Python 3.x. Your traceback comes from Python 3.10, where a str object has no decode method.

Option 2: Update the script. Check whether there is an updated version of FilterUncorrectabledPEfastq.py that is compatible with Python 3.

Option 3: Modify the script (temporary solution). If you cannot find a Python 3-compatible version, you can try making the script work with Python 3 by removing the decode calls. Here is how the relevant line (line 87 in your error message) would look:

    head1, seq1, placeholder1, qual1 = [i.strip() for i in entry]

This change removes the decode call, which is unnecessary when the lines in entry are already str, and should work in Python 3 as long as the input files are opened in text mode.
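A minimal sketch of a variant of option 3 that tolerates both cases, in case it helps. It assumes that entry is a list of the four lines of one FASTQ record, and that those lines may be either bytes (file opened in binary mode) or str (file opened in text mode). The helper name to_text and the sample record are hypothetical and are not part of FilterUncorrectabledPEfastq.py.

```python
def to_text(field):
    """Return a FASTQ line as a stripped str, whether it arrived as bytes or str.

    Hypothetical helper for illustration; not taken from the original script.
    """
    if isinstance(field, bytes):
        # Lines read from a file opened in binary mode are bytes and need decoding.
        field = field.decode("ascii")
    # Lines read in text mode are already str, so only the newline is removed.
    return field.strip()


# Made-up four-line record, shortened for readability.
entry_text = [
    "@SRR9606759.1 1 length=150 l:27 m:37 h:53 cor\n",
    "CCTTAAATGA\n",
    "+\n",
    "AAAAA#EEEE\n",
]
# The same record as it would look if read in binary mode.
entry_bytes = [line.encode("ascii") for line in entry_text]

# Same unpacking pattern as line 87 of the traceback, with the helper swapped in.
head1, seq1, placeholder1, qual1 = [to_text(i) for i in entry_text]
head1_b, seq1_b, placeholder1_b, qual1_b = [to_text(i) for i in entry_bytes]
```

With this approach the line no longer depends on how the file was opened, so the same code runs whether the lines come in as bytes or as str.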

000generic commented 10 months ago

Thank you Lilneo786!

Switching to Python 2.7 did the trick. I'll look around for a Python 3 version; it would be interesting to see if I can update the script to be Python 3 compatible.

Thanks again!

adamfreedman commented 10 months ago

I recently refactored the scripts to work in Python 3, so I will need to look into what is going on.

Lilneo786 commented 10 months ago

> I had recently refactored the scripts to work in python3 so I will need to look into what is going on

I'm sure you don't need any assistance, but if you do, let me know :)

harvardinformatics / TranscriptomeAssemblyTools

Error running decode in FilterUncorrectabledPEfastq.py #10