signalbash / how_are_we_stranded_here

Check strandedness of RNA-Seq fastq files
MIT License

how_are_we_stranded_here does not work with kallisto version greater than kallisto-0.44.0 #5

Open smk5g5 opened 3 years ago

smk5g5 commented 3 years ago

Using the latest version of kallisto (0.46), or version 0.45, with how_are_we_stranded_here produces this error:

stranded_test_gerald_H3MYFBBXX_1/kallisto_strand_test/pseudoalignments.bam does NOT exists.
Traceback (most recent call last):
  File "/usr/local/bin/check_strandedness", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/how_are_we_stranded_here/check_strandedness.py", line 151, in main
    result = pd.read_csv(test_folder + '/' + 'strandedness_check.txt', sep="\n", header=None)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 452, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 936, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1168, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1998, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 519, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

This is because kallisto does not produce the single pseudoalignments.bam file that RSeQC's infer_experiment.py expects. Instead, it produces multiple tmp.N.bam files, like this:

ls stranded_test_gerald_H3MYFBBXX_1/kallisto_strand_test/
abundance.h5   tmp.10.bam  tmp.16.bam  tmp.21.bam  tmp.27.bam  tmp.4.bam
abundance.tsv  tmp.11.bam  tmp.17.bam  tmp.22.bam  tmp.28.bam  tmp.5.bam
pseudoaln.bin  tmp.12.bam  tmp.18.bam  tmp.23.bam  tmp.29.bam  tmp.6.bam
run_info.json  tmp.13.bam  tmp.19.bam  tmp.24.bam  tmp.3.bam   tmp.7.bam
tmp.0.bam      tmp.14.bam  tmp.2.bam   tmp.25.bam  tmp.30.bam  tmp.8.bam
tmp.1.bam      tmp.15.bam  tmp.20.bam  tmp.26.bam  tmp.31.bam  tmp.9.bam
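To confirm that a run failed for this reason, a small check along the following lines might help. It is only a sketch: `inspect_kallisto_output` is a hypothetical helper, not part of how_are_we_stranded_here, and it only relies on the file names shown in the listing above (`pseudoalignments.bam` and `tmp.N.bam`).

```python
from pathlib import Path


def inspect_kallisto_output(out_dir):
    """Report whether the merged BAM exists and which tmp.N.bam pieces are present.

    Hypothetical helper: it only inspects file names and does not read
    or validate the BAM contents.
    """
    out = Path(out_dir)
    merged = out / "pseudoalignments.bam"
    # Collect the unmerged pieces, e.g. tmp.0.bam, tmp.1.bam, ...
    tmp_bams = sorted(p.name for p in out.glob("tmp.*.bam"))
    return {"merged_exists": merged.is_file(), "tmp_bams": tmp_bams}
```

If `merged_exists` is false while `tmp_bams` is non-empty, the output directory matches the symptom described in this issue.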

kallisto version 0.44.0 works with the tool as expected!

I was wondering if there is a way to make this tool work with the latest version of kallisto?
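Until that is resolved, one way to fail early is to check the kallisto version before running the tool. The sketch below is an assumption-laden illustration: the helper names are hypothetical, it assumes `kallisto version` prints a string containing a dotted version number such as `0.46.1`, and it treats 0.44.0 as the newest working release based only on what is reported in this thread (versions older than 0.44.0 were not tested here).

```python
import re
import subprocess

# Newest kallisto release reported to work in this thread (an assumption
# drawn from the reports above, not from the tool's documentation).
LAST_KNOWN_GOOD = (0, 44, 0)


def parse_kallisto_version(text):
    """Extract a (major, minor, patch) tuple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(version):
    """Return True if the version is not newer than the last known good one."""
    return version <= LAST_KNOWN_GOOD


def installed_kallisto_version():
    """Ask the kallisto executable on PATH for its version."""
    result = subprocess.run(
        ["kallisto", "version"], capture_output=True, text=True, check=True
    )
    return parse_kallisto_version(result.stdout)
```

A wrapper script could call `is_supported(installed_kallisto_version())` and stop with a clear message instead of hitting the pandas `EmptyDataError` later.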

signalbash commented 3 years ago

I had the same thing happen to me, and the quick fix was, as you mentioned, to use an older version of kallisto. This seems to be the culprit: https://github.com/pachterlab/kallisto/issues/105

But I can take a more in-depth look at whether we can do anything to fix it on our end.

KuechlerO commented 1 year ago

I encountered the same error and just want to share my conda virtual environment file, which made the tool work for me:

name: test_strandedness_env
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python==3.7
  - kallisto=0.44.0
  - pip
  - pip:
    - how_are_we_stranded_here==1.0.1

Additional note: I ran this on a cluster via the Snakemake framework. Sometimes calling check_strandedness failed with unspecified errors. However, simply re-running the same rules resolved the issue for me.
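For anyone who wants to automate the re-running outside of Snakemake, a retry wrapper could look roughly like this. It is a sketch under assumptions: `run_with_retries` is a hypothetical helper, the retry count and delay are arbitrary, and it assumes the failures show up as a non-zero exit code.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay_seconds=5.0, runner=subprocess.run):
    """Run a command, retrying when it exits with a non-zero status.

    `runner` is injectable so the retry logic can be exercised without
    launching a real process.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # check=True makes subprocess.run raise CalledProcessError on failure
            return runner(cmd, check=True)
        except subprocess.CalledProcessError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last error to the caller
    raise last_error
```

Here `cmd` would be the same check_strandedness argument list the rule already runs. Retrying only hides intermittent failures, so it is worth logging each failed attempt if the cause is still unknown.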