kundajelab / bpnet

Toolkit to train base-resolution deep neural networks on functional genomics data and to interpret them
http://bit.ly/bpnet-colab
MIT License

running bpnet cwm-scan modisco #43

Closed kanglizhu closed 2 years ago

kanglizhu commented 2 years ago

Dear developers,

I want to discover motifs with TF-MoDISco. When I run "bpnet modisco-run contrib.scores.h5 --premade=modisco override='TfModiscoWorkflow.max_seqlets_per_metacluster=20000' modisco/", I get a warning message:

H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details. grp = h5py.File(output_path)

and an error message:

[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:  1.7min
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:  2.5min finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.9s
[Parallel(n_jobs=10)]: Done 340 tasks      | elapsed:    6.2s
[Parallel(n_jobs=10)]: Done 840 tasks      | elapsed:   14.9s
[Parallel(n_jobs=10)]: Done 892 out of 911 | elapsed:   15.9s remaining:    0.3s
[Parallel(n_jobs=10)]: Done 911 out of 911 | elapsed:   16.1s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   30.9s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   45.7s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.3s
[Parallel(n_jobs=10)]: Done 307 out of 307 | elapsed:    1.9s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   14.6s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   22.1s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.2s
[Parallel(n_jobs=10)]: Done 114 out of 114 | elapsed:    0.4s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   11.8s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   18.0s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.2s
[Parallel(n_jobs=10)]: Done  92 out of 111 | elapsed:    0.3s remaining:    0.1s
[Parallel(n_jobs=10)]: Done 111 out of 111 | elapsed:    0.4s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   11.8s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   18.0s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.1s
[Parallel(n_jobs=10)]: Done  54 out of  73 | elapsed:    0.2s remaining:    0.1s
[Parallel(n_jobs=10)]: Done  73 out of  73 | elapsed:    0.2s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   11.6s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   17.5s finished
/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py:112: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  grp = h5py.File(output_path)

Executing:   0%|          | 0/13 [00:00<?, ?cell/s]
Executing:   8%|▊         | 1/13 [00:08<01:40,  8.40s/cell]
Executing:  31%|███       | 4/13 [00:38<01:28,  9.82s/cell]
Executing:  54%|█████▍    | 7/13 [00:40<00:29,  4.88s/cell]
Executing:  69%|██████▉   | 9/13 [00:44<00:15,  3.82s/cell]
Executing:  69%|██████▉   | 9/13 [00:45<00:20,  5.00s/cell]
Traceback (most recent call last):
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/bin/bpnet", line 8, in <module>
    sys.exit(main())
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/__main__.py", line 38, in main
    argh.dispatch(parser)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 342, in bpnet_modisco_run
    null_per_pos_scores=null_per_pos_scores)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 129, in modisco_run
    modisco_dir=os.path.dirname(output_path)))
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/utils.py", line 51, in render_ipynb
    parameters=params
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/papermill/execute.py", line 122, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
    raise error
papermill.exceptions.PapermillExecutionError: 
---------------------------------------------------------------------------
Exception encountered at "In [6]":
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-fd816d05060b> in <module>
----> 1 mf.plot_all_patterns(trim_frac=0, letter_width=0.14, height=0.5, ylim=[0, 2], no_axis=True)

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/files.py in plot_all_patterns(self, kind, trim_frac, n_min_seqlets, ylim, no_axis, **kwargs)
    525                               kind=kind,
    526                               trim_frac=trim_frac,
--> 527                               **kwargs)
    528             if ylim is not None:
    529                 plt.ylim(ylim)

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/files.py in plot_pattern(self, pattern_name, kind, rc, trim_frac, letter_width, height, rotate_y, ylab)
    494                      rotate_y=0,
    495                      ylab=True):
--> 496         pattern = self.get_pattern(pattern_name)
    497         pattern = pattern.trim_seq_ic(trim_frac)
    498         ns = self.n_seqlets(pattern_name)

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/files.py in get_pattern(self, pattern_name)
    126         from bpnet.modisco.core import Pattern
    127         # TODO - add number of seqlets?
--> 128         return Pattern.from_hdf5_grp(self._get_pattern_grp(*pattern_name.split("/")), pattern_name)
    129 
    130     def metaclusters(self):

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/core.py in from_hdf5_grp(cls, grp, name)
    170                 return Pattern(name,
    171                                seq=grp['sequence']['fwd'][:],
--> 172                                contrib={t: grp[t][grp_1][contrib_name]['fwd'][:] for t in tasks},
    173                                hyp_contrib={t: grp[t][grp_1][hyp_contrib_name]['fwd'][:] for t in tasks})
    174 

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/core.py in <dictcomp>(.0)
    170                 return Pattern(name,
    171                                seq=grp['sequence']['fwd'][:],
--> 172                                contrib={t: grp[t][grp_1][contrib_name]['fwd'][:] for t in tasks},
    173                                hyp_contrib={t: grp[t][grp_1][hyp_contrib_name]['fwd'][:] for t in tasks})
    174 

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/.conda/envs/bpnet/lib/python3.6/site-packages/h5py/_hl/group.py in __getitem__(self, name)
    262                 raise ValueError("Invalid HDF5 object reference")
    263         else:
--> 264             oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    265 
    266         otype = h5i.get_type(oid)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5o.pyx in h5py.h5o.open()

KeyError: "Unable to open object (object 'profile' doesn't exist)"

  In call to configurable 'modisco_run' (<function modisco_run at 0x149578342f28>)

Any help or guidance would be greatly appreciated!

Thank you!

kangli zhu
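
The final KeyError means that one of the lookups in `grp[t][grp_1][contrib_name]['fwd']` asked for a group named 'profile' that is not present in the saved modisco file. One way to check what is actually stored is to list every path in the file. The sketch below is a hypothetical helper (`list_paths` is not part of bpnet). It relies only on mapping-style access, which h5py groups provide, and is demonstrated on nested dicts so it runs without an HDF5 file. The layout in `demo` is invented for illustration and is not the real modisco layout.

```python
from collections.abc import Mapping


def list_paths(node, prefix=""):
    """Yield the slash-separated path of every entry below a nested mapping."""
    for key in node:
        path = f"{prefix}/{key}"
        yield path
        child = node[key]
        # Descend only into group-like children; leaf values are skipped.
        if isinstance(child, Mapping):
            yield from list_paths(child, path)


# Invented layout, used only to show the traversal.
demo = {
    "sequence": {"fwd": [0, 1]},
    "task_a": {"counts": {"contrib_scores": {"fwd": [0.1, 0.2]}}},
}

paths = list(list_paths(demo))
# True only if some entry in the hierarchy is literally named 'profile'.
has_profile = any(p.split("/")[-1] == "profile" for p in paths)

for p in paths:
    print(p)
print("contains 'profile':", has_profile)
```

With a real file, the same generator can be applied to the object returned by `h5py.File(path, 'r')`, where `path` is the modisco output file. If no entry named 'profile' shows up, the names requested when reading the patterns do not match what was written.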

kanglizhu commented 2 years ago

Hi, I think this may be because the input file cannot be found. I'm trying to change "modisco.py" and retry. Here is the code that I changed, but I'm not sure that's the right thing to do.

```
import h5py
modisco_results = workflow(task_names=task_names,
                           contrib_scores=contrib_scores,
                           hypothetical_contribs=hypothetical_contribs,
                           one_hot=one_hot,
                           null_per_pos_scores=null_per_pos_scores)

# save the results
logger.info(f"Saving modisco file to {output_path}")
grp = h5py.File(output_path, 'r+')
modisco_results.save_hdf5(grp)
grp.flush()
grp.close()
```

It doesn't work.
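
A note on the file mode used in the change above: in h5py, 'r+' opens an existing file for reading and writing and fails if the file does not exist yet, 'w' creates the file (truncating an existing one), and 'a' opens it read/write and creates it if needed. Before h5py 3.0 the implicit default was 'a', so passing a mode explicitly mainly silences the H5pyDeprecationWarning. In the traceback, the warning comes from modisco.py line 112, while the KeyError is raised later from `render_ipynb` (called at line 129), so the mode on its own is unlikely to explain the KeyError. The sketch below shows one way to write the results with an explicit mode. `save_results` and its `open_file` parameter are hypothetical names introduced here so the logic can be exercised without h5py. Only `h5py.File(path, mode)` and `save_hdf5(grp)` come from the code in this thread.

```python
def save_results(results, output_path, mode="w", open_file=None):
    """Write `results` to `output_path` using an explicit file mode.

    `results` is any object with a `save_hdf5(grp)` method, as used in the
    thread. `open_file` is a hypothetical hook for substituting the opener;
    when it is None, h5py.File is used.
    """
    if open_file is None:
        # Imported lazily so the function can be defined without h5py installed.
        import h5py
        open_file = h5py.File
    # 'w' creates the file or truncates an existing one; the context manager
    # closes the file even if save_hdf5 raises.
    with open_file(output_path, mode) as grp:
        results.save_hdf5(grp)
```

Calling `save_results(modisco_results, output_path)` would stand in for the `h5py.File(...)`, `save_hdf5`, `flush`, `close` sequence. Pass `mode="a"` instead to keep the pre-3.0 default behaviour.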