mobie / mobie-utils-python

Python tools for MoBIE
MIT License

Convert BDV h5 to n5 #9

Closed martinschorb closed 4 years ago

martinschorb commented 4 years ago

Hi, I cannot get the h5 file to be read.

initialize_dataset('/g/emcf/Hamburg_XRay/20200515/20x/TAH01_parz.h5', "/t00000/s00/0/cells",
                   ROOT, 'test', 'X-Ray-raw', [0.1, 0.1, 0.1], DEFAULT_CHUNKS,
                   scale_factors, is_default=0, target='local', max_jobs=16)
DEBUG: Checking if DownscalingWorkflow(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, target=local, dependency=DummyTask, input_path=/g/emcf/Hamburg_XRay/20200515/20x/TAH01_parz.h5, input_key=./t00000/s00/0/cells, scale_factors=[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], halos=[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], metadata_format=bdv.n5, metadata_dict={"resolution": [0.1, 0.1, 0.1], "unit": "micrometer"}, output_path=./data/test/images/local/X-Ray-raw.n5, output_key_prefix=, force_copy=False, skip_existing_levels=False, scale_offset=0) is complete
/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/miniconda/envs/covid-em-dev/lib/python3.7/site-packages/luigi/parameter.py:279: UserWarning: Parameter "dtype" with value "None" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/miniconda/envs/covid-em-dev/lib/python3.7/site-packages/luigi/parameter.py:279: UserWarning: Parameter "scale_factor" with value "(2, 2, 2)" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
DEBUG: Checking if WriteDownscalingMetadata(tmp_folder=tmp_test, output_path=./data/test/images/local/X-Ray-raw.n5, scale_factors=[[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], dependency=DownscalingLocal, metadata_format=bdv.n5, metadata_dict={"resolution": [0.1, 0.1, 0.1], "unit": "micrometer"}, output_key_prefix=, scale_offset=0, prefix=downscaling) is complete
INFO: Informed scheduler that task   DownscalingWorkflow_tmp_test_configs_DummyTask_False_3134488336   has status   PENDING
DEBUG: Checking if DownscalingLocal(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, input_path=./data/test/images/local/X-Ray-raw.n5, input_key=setup0/timepoint0/s5, output_path=./data/test/images/local/X-Ray-raw.n5, output_key=setup0/timepoint0/s6, scale_factor=(2, 2, 2), scale_prefix=s6, halo=[2, 2, 2], effective_scale_factor=[64, 64, 64], dependency=DownscalingLocal) is complete
INFO: Informed scheduler that task   WriteDownscalingMetadata_DownscalingLocal___resolution_____bdv_n5_76f10fbef3   has status   PENDING
DEBUG: Checking if DownscalingLocal(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, input_path=./data/test/images/local/X-Ray-raw.n5, input_key=setup0/timepoint0/s4, output_path=./data/test/images/local/X-Ray-raw.n5, output_key=setup0/timepoint0/s5, scale_factor=(2, 2, 2), scale_prefix=s5, halo=[2, 2, 2], effective_scale_factor=[32, 32, 32], dependency=DownscalingLocal) is complete
INFO: Informed scheduler that task   DownscalingLocal_tmp_test_configs_DownscalingLocal__64__64__64__3d9aba8d28   has status   PENDING
DEBUG: Checking if DownscalingLocal(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, input_path=./data/test/images/local/X-Ray-raw.n5, input_key=setup0/timepoint0/s3, output_path=./data/test/images/local/X-Ray-raw.n5, output_key=setup0/timepoint0/s4, scale_factor=(2, 2, 2), scale_prefix=s4, halo=[2, 2, 2], effective_scale_factor=[16, 16, 16], dependency=DownscalingLocal) is complete
INFO: Informed scheduler that task   DownscalingLocal_tmp_test_configs_DownscalingLocal__32__32__32__d8449b3a29   has status   PENDING
DEBUG: Checking if DownscalingLocal(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, input_path=./data/test/images/local/X-Ray-raw.n5, input_key=setup0/timepoint0/s2, output_path=./data/test/images/local/X-Ray-raw.n5, output_key=setup0/timepoint0/s3, scale_factor=(2, 2, 2), scale_prefix=s3, halo=[2, 2, 2], effective_scale_factor=[8, 8, 8], dependency=DownscalingLocal) is complete
INFO: Informed scheduler that task   DownscalingLocal_tmp_test_configs_DownscalingLocal__16__16__16__8af7d887e6   has status   PENDING
DEBUG: Checking if DownscalingLocal(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, input_path=./data/test/images/local/X-Ray-raw.n5, input_key=setup0/timepoint0/s1, output_path=./data/test/images/local/X-Ray-raw.n5, output_key=setup0/timepoint0/s2, scale_factor=(2, 2, 2), scale_prefix=s2, halo=[2, 2, 2], effective_scale_factor=[4, 4, 4], dependency=DownscalingLocal) is complete
INFO: Informed scheduler that task   DownscalingLocal_tmp_test_configs_DownscalingLocal__8__8__8__af2167bc66   has status   PENDING
DEBUG: Checking if DownscalingLocal(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, input_path=./data/test/images/local/X-Ray-raw.n5, input_key=setup0/timepoint0/s0, output_path=./data/test/images/local/X-Ray-raw.n5, output_key=setup0/timepoint0/s1, scale_factor=(2, 2, 2), scale_prefix=s1, halo=[2, 2, 2], effective_scale_factor=[2, 2, 2], dependency=CopyVolumeLocal) is complete
INFO: Informed scheduler that task   DownscalingLocal_tmp_test_configs_DownscalingLocal__4__4__4__256b7017d5   has status   PENDING
DEBUG: Checking if CopyVolumeLocal(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, input_path=/g/emcf/Hamburg_XRay/20200515/20x/TAH01_parz.h5, input_key=('t00000', 's00', '0'), output_path=./data/test/images/local/X-Ray-raw.n5, output_key=setup0/timepoint0/s0, prefix=initial_scale, dtype=None, fit_to_roi=False, effective_scale_factor=[], dependency=DummyTask) is complete
INFO: Informed scheduler that task   DownscalingLocal_tmp_test_configs_CopyVolumeLocal__2__2__2__e3414c594d   has status   PENDING
DEBUG: Checking if DummyTask() is complete
INFO: Informed scheduler that task   CopyVolumeLocal_tmp_test_configs_DummyTask_None_9670f53ac9   has status   PENDING
INFO: Informed scheduler that task   DummyTask__99914b932b   has status   DONE
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 9
INFO: [pid 28537] Worker Worker(salt=836893348, workers=1, host=vm-schwab-02.embl.de, username=schorb, pid=28537) running   CopyVolumeLocal(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, input_path=/g/emcf/Hamburg_XRay/20200515/20x/TAH01_parz.h5, input_key=('t00000', 's00', '0'), output_path=./data/test/images/local/X-Ray-raw.n5, output_key=setup0/timepoint0/s0, prefix=initial_scale, dtype=None, fit_to_roi=False, effective_scale_factor=[], dependency=DummyTask)
ERROR: [pid 28537] Worker Worker(salt=836893348, workers=1, host=vm-schwab-02.embl.de, username=schorb, pid=28537) failed    CopyVolumeLocal(tmp_folder=tmp_test, max_jobs=16, config_dir=tmp_test/configs, input_path=/g/emcf/Hamburg_XRay/20200515/20x/TAH01_parz.h5, input_key=('t00000', 's00', '0'), output_path=./data/test/images/local/X-Ray-raw.n5, output_key=setup0/timepoint0/s0, prefix=initial_scale, dtype=None, fit_to_roi=False, effective_scale_factor=[], dependency=DummyTask)
Traceback (most recent call last):
  File "/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/miniconda/envs/covid-em-dev/lib/python3.7/site-packages/luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/miniconda/envs/covid-em-dev/lib/python3.7/site-packages/luigi/worker.py", line 133, in _run_get_new_deps
    task_gen = self.task.run()
  File "/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/cluster_tools/cluster_tools/cluster_tasks.py", line 95, in run
    raise e
  File "/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/cluster_tools/cluster_tools/cluster_tasks.py", line 81, in run
    self.run_impl()
  File "/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/cluster_tools/cluster_tools/copy_volume/copy_volume.py", line 64, in run_impl
    ds = f[self.input_key]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/miniconda/envs/covid-em-dev/lib/python3.7/site-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/miniconda/envs/covid-em-dev/lib/python3.7/site-packages/h5py/_hl/base.py", line 137, in _e
    name = name.encode('ascii')
AttributeError: 'tuple' object has no attribute 'encode'
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   CopyVolumeLocal_tmp_test_configs_DummyTask_None_9670f53ac9   has status   FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 9 pending tasks possibly being run by other workers
DEBUG: There are 9 pending tasks unique to this worker
DEBUG: There are 9 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=836893348, workers=1, host=vm-schwab-02.embl.de, username=schorb, pid=28537) was stopped. Shutting down Keep-Alive thread
INFO: 
===== Luigi Execution Summary =====

Scheduled 10 tasks of which:
* 1 complete ones were encountered:
    - 1 DummyTask()
* 1 failed:
    - 1 CopyVolumeLocal(...)
* 8 were left pending, among these:
    * 8 had failed dependencies:
        - 6 DownscalingLocal(...)
        - 1 DownscalingWorkflow(...)
        - 1 WriteDownscalingMetadata(...)

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-27-b2a49fb11710> in <module>
----> 1 initialize_dataset('/g/emcf/Hamburg_XRay/20200515/20x/TAH01_parz.h5',"./t00000/s00/0/cells",ROOT,'test','X-Ray-raw',[0.1,0.1,0.1],DEFAULT_CHUNKS,scale_factors,is_default=0,target='local',max_jobs=16)

/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/mobie-utils-python/mobie/initialization.py in initialize_dataset(input_path, input_key, root, dataset_name, raw_name, resolution, chunks, scale_factors, is_default, add_remote, tmp_folder, target, max_jobs, time_limit)
     70     import_raw_volume(input_path, input_key, data_path,
     71                       resolution, scale_factors, chunks,
---> 72                       tmp_folder=tmp_folder, target=target, max_jobs=max_jobs)
     73 
     74     add_to_image_dict(dataset_folder, 'image', xml_path, add_remote=add_remote)

/g/emcf/common/5792_Sars-Cov-2/covid-em-datasets/software/mobie-utils-python/mobie/import_data/raw.py in import_raw_volume(in_path, in_key, out_path, resolution, scale_factors, chunks, tmp_folder, target, max_jobs, block_shape)
     55     ret = luigi.build([t], local_scheduler=True)
     56     if not ret:
---> 57         raise RuntimeError("Importing raw data failed")

RuntimeError: Importing raw data failed
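The AttributeError at the bottom of the traceback shows the actual cause: CopyVolumeLocal ends up with the input key as the tuple ('t00000', 's00', '0') rather than a string path, and h5py groups can only be indexed with string (or bytes) paths. A minimal sketch, with a hypothetical file and dataset layout, that reproduces the behaviour:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file mimicking the BDV h5 layout from the issue.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("t00000/s00/0/cells", data=np.zeros((4, 4)))

with h5py.File(path, "r") as f:
    # Indexing with a string path works.
    shape = f["/t00000/s00/0/cells"].shape
    # Indexing with a tuple fails: h5py 2.x raises the AttributeError
    # seen in the log ('tuple' object has no attribute 'encode');
    # h5py 3.x raises a TypeError instead.
    try:
        f[("t00000", "s00", "0", "cells")]
        tuple_key_failed = False
    except (AttributeError, TypeError):
        tuple_key_failed = True

print(shape, tuple_key_failed)
```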
constantinpape commented 4 years ago

I cannot reproduce this; the following code works for me:

from mobie import initialize_dataset

ROOT = '.' 
path = '/g/emcf/Hamburg_XRay/20200515/20x/TAH01_parz.h5'
key = "/t00000/s00/0/cells"

scale_factors = [[2, 2, 2]] 

initialize_dataset(path, key, ROOT,
                   'test', 'X-Ray-raw',
                   resolution=[0.1, 0.1, 0.1],
                   chunks=(64, 64, 64),
                   scale_factors=scale_factors,
                   is_default=1, target='local', max_jobs=16)

Note that I had to fix something in the downscaling logic for this to work, so you will need to update cluster_tools in your environment: https://github.com/constantinpape/cluster_tools

constantinpape commented 4 years ago

@martinschorb

Maybe there is an issue with the h5py version. In my environment I have:

h5py                      2.9.0           nompi_py37hf008753_1102    conda-forge
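To compare h5py versions across environments without going through conda, the version can also be queried directly from Python; a quick sketch:

```python
import h5py

# Installed h5py package version, e.g. "2.9.0".
print(h5py.__version__)
# Version of the underlying HDF5 C library h5py was built against.
print(h5py.version.hdf5_version)
```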
martinschorb commented 4 years ago

OK, it runs now.

I had just done a git pull in the main directory, but in order to get the proper cluster_tools I had to do it in that subdirectory as well...