horovod / horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
http://horovod.ai

'Could not find _common_metadata file. Use materialize_dataset(..) in' #2313

Open WangHeguan opened 4 years ago

WangHeguan commented 4 years ago

Environment:

  1. Framework: TensorFlow
  2. Framework version: 2.3.0
  3. Horovod version:0.20.0
  4. MPI version:
  5. CUDA version:
  6. NCCL version:
  7. Python version:3.7
  8. Spark / PySpark version:2.4.0
  9. OS and version:
  10. GCC version:
  11. CMake version:

Checklist:

  1. Did you search issues to find if somebody asked this question before?
  2. If your question is about hang, did you read this doc?
  3. If your question is about docker, did you read this doc?
  4. Did you check if your question is answered in the troubleshooting guide?

Bug report:

[1,9]:Traceback (most recent call last):
[1,9]:  File "/usr/local/lib/python3.7/site-packages/petastorm/etl/dataset_metadata.py", line 407, in infer_or_load_unischema
[1,9]:    return get_schema(dataset)
[1,9]:  File "/usr/local/lib/python3.7/site-packages/petastorm/etl/dataset_metadata.py", line 361, in get_schema
[1,9]:    'Could not find _common_metadata file. Use materialize_dataset(..) in'
[1,9]:petastorm.etl.dataset_metadata.PetastormMetadataError: Could not find _common_metadata file. Use materialize_dataset(..) in petastorm.etl.dataset_metadata.py to generate this file in your ETL code. You can generate it on an existing dataset using petastorm-generate-metadata.py
[1,9]:
[1,9]:During handling of the above exception, another exception occurred:
[1,9]:
[1,9]:Traceback (most recent call last):
[1,9]:  File "/usr/local/Cellar/python@3.7/3.7.9/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
[1,9]:    "__main__", mod_spec)
[1,9]:  File "/usr/local/Cellar/python@3.7/3.7.9/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
[1,9]:    exec(code, run_globals)
[1,9]:  File "/usr/local/lib/python3.7/site-packages/horovod/spark/task/mpirun_exec_fn.py", line 52, in <module>
[1,9]:    main(codec.loads_base64(sys.argv[1]), codec.loads_base64(sys.argv[2]))
[1,9]:  File "/usr/local/lib/python3.7/site-packages/horovod/spark/task/mpirun_exec_fn.py", line 45, in main
[1,9]:    task_exec(driver_addresses, settings, 'OMPI_COMM_WORLD_RANK', 'OMPI_COMM_WORLD_LOCAL_RANK')
[1,9]:  File "/usr/local/lib/python3.7/site-packages/horovod/spark/task/__init__.py", line 61, in task_exec
[1,9]:    result = fn(*args, **kwargs)
[1,9]:  File "/usr/local/lib/python3.7/site-packages/horovod/spark/keras/remote.py", line 208, in train
[1,9]:    reader_factory_kwargs) as train_reader:
[1,9]:  File "/usr/local/lib/python3.7/site-packages/petastorm/reader.py", line 300, in make_batch_reader
[1,9]:    filters=filters)
[1,9]:  File "/usr/local/lib/python3.7/site-packages/petastorm/reader.py", line 380, in __init__
[1,9]:    stored_schema = infer_or_load_unischema(self.dataset)
[1,9]:  File "/usr/local/lib/python3.7/site-packages/petastorm/etl/dataset_metadata.py", line 411, in infer_or_load_unischema
[1,9]:    return Unischema.from_arrow_schema(dataset)
[1,9]:  File "/usr/local/lib/python3.7/site-packages/petastorm/unischema.py", line 336, in from_arrow_schema
[1,9]:    arrow_field = compat_schema_field(arrow_schema, column_name)
[1,9]:  File "/usr/local/lib/python3.7/site-packages/petastorm/compat.py", line 80, in compat_schema_field
[1,9]:    return schema.field(name)
[1,9]:AttributeError: 'pyarrow.lib.Schema' object has no attribute 'field'
[1,0]:2020-09-22 21:22:14.085374: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.

Ranks [1,0], [1,4], and [1,2] print the same pair of tracebacks, each ending in the same AttributeError. The pasted output then cuts off partway through the traceback for rank [1,8]:

[1,8]:Traceback (most recent call last):
[1,8]:  File "/usr/local/lib/python3.7/site-packages/petastorm/etl/dataset_metadata.py", line 407, in infer_or_load_unischema
[1,8]:    return get_schema(dataset)
[1,8]:  File "/usr/local/lib/python3.7/site-packages/petastorm/etl/dataset_metadata.py", line 361, in get_schema

tgaddair commented 4 years ago

Hey @WangHeguan, can you tell me a little more about what you're doing? Are you able to successfully run one of our examples, like keras_spark_mnist.py?

WangHeguan commented 4 years ago

@tgaddair I just ran keras_spark_mnist.py, and then this error happened.
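
The environment list above leaves out the pyarrow and petastorm versions, and the final exception in each traceback is an AttributeError on `schema.field`. The following is a minimal diagnostic sketch, not part of Horovod or Petastorm, that could be run in the same Python environment as the Spark executors to collect that information. The helper names `installed_version` and `schema_has_field_method` are hypothetical, and the sketch only reports what is installed; it does not assume which versions are compatible.

```python
try:
    # Standard library on Python 3.8+.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def schema_has_field_method():
    """Report whether pyarrow.Schema exposes a `field` attribute.

    Returns None when pyarrow cannot be imported at all.
    """
    try:
        import pyarrow as pa
    except ImportError:
        return None
    return hasattr(pa.Schema, "field")


if __name__ == "__main__":
    # Packages that appear in the traceback or the environment list.
    for name in ("pyarrow", "petastorm", "horovod", "tensorflow", "pyspark"):
        print(name, installed_version(name))
    print("pyarrow.Schema has 'field':", schema_has_field_method())
```

If the last line prints `False`, the installed pyarrow does not provide the `Schema.field` method that `petastorm/compat.py` calls in the traceback, which would be worth including in the report.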