jmschrei / tfmodisco-lite

A lite implementation of tfmodisco, a motif discovery algorithm for genomics experiments.
MIT License

Errors from MEME text parser lead to pandas.errors.EmptyDataError: No columns to parse from file #37

Open anupamajha1 opened 1 year ago

anupamajha1 commented 1 year ago

I am trying to compare JASPAR motifs to motifs found by tfmodisco-lite using the following command:

modisco report -i heart_left_ventricle_random_2000_all_modisco_results.h5 -o report_all_modisco/ -s report_all_modisco/ -m JASPAR2022_CORE_vertebrates_non-redundant_pfms_meme.txt

Relevant files: report_all_modisco.tar.gz JASPAR2022_CORE_vertebrates_non-redundant_pfms_meme.txt heart_left_ventricle_random_2000_all_modisco_results.h5.gz

But I ran into the following error. I am attaching the files needed to reproduce this issue.

Errors from MEME text parser:
The PSPM of motif 1 has probabilities which don't sum to 1 on row 1.
The PSPM of motif 1 has probabilities which don't sum to 1 on row 1.
FATAL: Requested motif number 1  was not found in file '/tmp/296678579.1.noble-login.q/tmpuk0jiknv'.

Traceback (most recent call last):
  File "/net/noble/vol1/home/anupamaj/miniconda3/bin/modisco", line 147, in <module>
    modiscolite.report.report_motifs(args.h5py, args.output, img_path_suffix=args.suffix, meme_motif_db=args.meme_db, 
  File "/net/noble/vol1/home/anupamaj/miniconda3/lib/python3.8/site-packages/modiscolite/report.py", line 263, in report_motifs
    tomtom_df = generate_tomtom_dataframe(modisco_h5py, meme_motif_db, 
  File "/net/noble/vol1/home/anupamaj/miniconda3/lib/python3.8/site-packages/modiscolite/report.py", line 138, in generate_tomtom_dataframe
    r = fetch_tomtom_matches(ppm, cwm, motifs_db=meme_motif_db,
  File "/net/noble/vol1/home/anupamaj/miniconda3/lib/python3.8/site-packages/modiscolite/report.py", line 109, in fetch_tomtom_matches
    tomtom_results = pandas.read_csv(tomtom_fname, sep="\t", usecols=(1, 5))
  File "/net/noble/vol1/home/anupamaj/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/net/noble/vol1/home/anupamaj/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/net/noble/vol1/home/anupamaj/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/net/noble/vol1/home/anupamaj/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1679, in _make_engine
    return mapping[engine](f, **self.options)
  File "/net/noble/vol1/home/anupamaj/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 557, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
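
For context, `EmptyDataError` is what `pandas.read_csv` raises when the file it is given has no content, which would be consistent with tomtom stopping at the FATAL error above without writing any results. A minimal sketch of a defensive read follows; `read_tomtom_results` is a hypothetical helper for illustration and is not part of modiscolite, and only the `read_csv` arguments are taken from the traceback.

```python
import io

import pandas


def read_tomtom_results(source):
    """Hypothetical helper: read tomtom's tab-separated output.

    Returns an empty DataFrame when the output has no content,
    instead of letting pandas raise EmptyDataError.
    """
    try:
        # Same arguments as the read_csv call shown in the traceback.
        return pandas.read_csv(source, sep="\t", usecols=(1, 5))
    except pandas.errors.EmptyDataError:
        return pandas.DataFrame()


# An empty buffer stands in for the empty tomtom output file.
matches = read_tomtom_results(io.StringIO(""))
print(matches.empty)
```

This only hides the symptom; the all-zero input motif would still need to be explained.
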

anupamajha1 commented 1 year ago

I dug a little deeper, and the error stems from the following command in the code:

tomtom -no-ssc -oc . --verbosity 1 -text -min-overlap 5 -mi 1 -dist pearson -evalue -thresh 10.0 /tmp/296678579.1.noble-login.q/tmp7exmpd9d JASPAR2022_CORE_vertebrates_non-redundant_pfms_meme.txt

from here: https://github.com/jmschrei/tfmodisco-lite/blob/main/modiscolite/report.py#L111

I opened the tmp file, and it seems to be all zeros (screenshot attached: Screen Shot 2023-08-25 at 3 43 00 PM).
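
An all-zero matrix would match the MEME parser message quoted earlier ("probabilities which don't sum to 1 on row 1"), since a row of zeros sums to 0 rather than 1. A small sketch for spotting such rows before a motif is passed to tomtom is below; `invalid_ppm_rows` and its tolerance are hypothetical and are not part of modiscolite or the MEME suite.

```python
import math


def invalid_ppm_rows(ppm, tol=1e-3):
    """Hypothetical helper: return the 1-based indices of rows whose
    probabilities do not sum to 1 within the given absolute tolerance."""
    return [
        i
        for i, row in enumerate(ppm, start=1)
        if not math.isclose(sum(row), 1.0, abs_tol=tol)
    ]


# Made-up example matrices (A, C, G, T columns), not data from the issue.
all_zero = [[0.0, 0.0, 0.0, 0.0]] * 3
valid = [[0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1]]

print(invalid_ppm_rows(all_zero))  # [1, 2, 3]
print(invalid_ppm_rows(valid))     # []
```
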