jmschrei / tfmodisco-lite

A lite implementation of tfmodisco, a motif discovery algorithm for genomics experiments.
MIT License

Request of the example data #7

Closed Gavin-Lijy closed 1 year ago

Gavin-Lijy commented 1 year ago

Sorry, I couldn't find the toy data you mentioned. Could you kindly provide it to help us understand the input format?

hopdebee commented 1 year ago

Yes, I'm also in dire need of this. Should the -s and -a inputs be NumPy arrays of shape N x L x 4, just as in the old TF-MoDISco? Here is what I tried:

import numpy as np

task_to_hyp_scores = {}
task_to_scores = {}
onehot_data = {}

task_to_hyp_scoreslist = [el[0] for el in cnnResults[12]['dataset_TP']["npshap"].tolist()]
onehot_data[12] = cnnResults[12]['dataset_TP']["ohs"].tolist()

print("ohs")
display(np.shape(onehot_data[12]))
display(onehot_data[12][0])

task_to_hyp_scores[12] = [np.squeeze(scores) for scores in task_to_hyp_scoreslist]

print("shaps")
display(task_to_hyp_scores[12][0])
display(np.shape(task_to_hyp_scores[12]))

ohePath = f"{wb.resultsFolder}interpretation/nponehotdata12.npy"
np.save(ohePath, np.array(onehot_data[12]))
shapPath = f"{wb.resultsFolder}interpretation/npshapdata12.npy"
np.save(shapPath, np.array(task_to_hyp_scores[12]))
modiscoResultsPath = f"{wb.resultsFolder}interpretation/modisco_results.h5"
!modisco motifs -s {ohePath} -a {shapPath} -n 2000 -o {modiscoResultsPath}

This gives the following output and error:

ohs
(161, 300, 4)
array([[1, 0, 0, 0],
       [0, 0, 1, 0],
       [0, 1, 0, 0],
       ...,
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])
shaps
array([[ 1.53195577e-05,  3.58301720e-05,  1.82393234e-05,
         5.96044316e-06],
       [ 1.10478895e-05,  7.33038779e-05,  3.05729173e-05,
         2.67649664e-05],
       [-1.47533502e-05, -1.00547937e-04,  1.00632858e-04,
        -5.94360919e-05],
       ...,
       [-5.54023399e-05,  1.02747357e-04, -2.64519495e-07,
        -7.14052694e-05],
       [ 6.13165478e-06, -7.53152633e-06,  6.90066693e-05,
         7.19849909e-07],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00]])
(161, 300, 4)
Traceback (most recent call last):
  File "/usr/local/bin/modisco", line 108, in <module>
    pos_patterns, neg_patterns = modiscolite.tfmodisco.TFMoDISco(
  File "/usr/local/lib/python3.8/dist-packages/modiscolite/tfmodisco.py", line 281, in TFMoDISco
    seqlet_coords, threshold = extract_seqlets.extract_seqlets(
  File "/usr/local/lib/python3.8/dist-packages/modiscolite/extract_seqlets.py", line 158, in extract_seqlets
    pos_null_values, neg_null_values = _laplacian_null(track=smoothed_tracks, 
  File "/usr/local/lib/python3.8/dist-packages/modiscolite/extract_seqlets.py", line 34, in _laplacian_null
    (np.percentile(a=pos_values, q=percentiles_to_use)-mu))
  File "<__array_function__ internals>", line 5, in percentile
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 3867, in percentile
    return _quantile_unchecked(
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 3986, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 3564, in _ureduce
    r = func(a, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 4098, in _quantile_ureduce_func
    n = np.isnan(ap[-1])
IndexError: index -1 is out of bounds for axis 0 with size 0

jmschrei commented 1 year ago

If you're using the numpy array inputs, the shape should be (batch, 4, length) rather than (batch, length, 4). Sorry for the inconsistency. I'm going to be adding more documentation soon once I finish something else.
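
For anyone hitting the same traceback, here is a minimal sketch of the axis swap, assuming the arrays start out as (batch, length, 4) as in the snippet above. The helper name `to_batch_channel_length` and the random toy arrays are made up for illustration and are not part of tfmodisco-lite; only the NumPy calls are real.

```python
import numpy as np


def to_batch_channel_length(arr):
    """Reorder a (batch, length, 4) array to (batch, 4, length).

    Hypothetical helper for illustration; not part of tfmodisco-lite.
    """
    arr = np.asarray(arr)
    if arr.ndim != 3 or arr.shape[2] != 4:
        raise ValueError(f"expected shape (batch, length, 4), got {arr.shape}")
    # Swap the last two axes and return a contiguous copy for saving.
    return np.ascontiguousarray(arr.transpose(0, 2, 1))


# Toy stand-ins with the shapes from the post above (161 sequences of length 300).
rng = np.random.default_rng(0)
n, length = 161, 300
base_idx = rng.integers(0, 4, size=(n, length))
onehot_nl4 = np.eye(4, dtype=np.int8)[base_idx]                # (161, 300, 4)
shap_nl4 = rng.normal(size=(n, length, 4)).astype(np.float32)  # (161, 300, 4)

onehot = to_batch_channel_length(onehot_nl4)
shap = to_batch_channel_length(shap_nl4)
print(onehot.shape, shap.shape)  # (161, 4, 300) (161, 4, 300)

# The reordered arrays can then be written with np.save and passed to
# -s and -a exactly as in the original command.
```

The explicit shape check is there because a plain transpose would silently accept an array that is already in (batch, 4, length) order and flip it back.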

hopdebee commented 1 year ago

Hi Jacob,

Thanks for your fast response. I got it to work with the dimensions you mentioned in your answer. Thanks for your work!