tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

NotImplementedError: While importing/Loading tfds plant_leaves dataset #5416


Coolcoder45 commented 6 months ago


Short description: The tfds plant_leaves dataset is not loading successfully; it throws a NotImplementedError. Tried on May 16, 2024.

Environment information

Reproduction instructions

import tensorflow_datasets as tfds
plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)

Gives:

Downloading and preparing dataset 6.56 GiB (download: 6.56 GiB, generated: 6.81 GiB, total: 13.37 GiB) to /root/tensorflow_datasets/plant_leaves/0.1.1...
Dl Completed...: 100%
 1/1 [10:04<00:00, 604.39s/ url]
Dl Size...: 100%
 6718/6718 [10:04<00:00, 11.25 MiB/s]
Dataset plant_leaves downloaded and prepared to /root/tensorflow_datasets/plant_leaves/0.1.1. Subsequent calls will reuse this data.
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-3-d88d46497437> in <cell line: 2>()
      1 import tensorflow_datasets as tfds
----> 2 plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)

33 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/file_adapters.py in make_tf_data(cls, filename, buffer_size)
    206   ) -> tf.data.Dataset:
    207     """Returns TensorFlow Dataset comprising given array record file."""
--> 208     raise NotImplementedError(
    209         '`.as_dataset()` not implemented for ArrayRecord files. Please, use'
    210         ' `.as_data_source()`.'

NotImplementedError: `.as_dataset()` not implemented for ArrayRecord files. Please, use `.as_data_source()`.

Expected behavior: the dataset should load successfully.
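
For reference, the error message itself points at the random-access data-source API. A minimal sketch of that route (untested here; the 'image' feature key is assumed from the plant_leaves catalog entry):

import tensorflow_datasets as tfds

# ArrayRecord-backed datasets are read through the data-source API,
# which returns a random-access sequence instead of a tf.data.Dataset.
source = tfds.data_source('plant_leaves', split='train')
print(len(source))   # number of examples
example = source[0]  # dict of numpy features, e.g. example['image']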

pierrot0 commented 6 months ago

Hi, thank you for reporting! This is definitely a bug.

Workaround: add the following arg to your tfds.load call:

tfds.load(..., download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

We'll look into how to update the code and post an update on this bug.

Coolcoder45 commented 6 months ago

It's still giving an error.

import tensorflow_datasets as tfds
plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

Gives:

Downloading and preparing dataset 6.56 GiB (download: 6.56 GiB, generated: 6.81 GiB, total: 13.37 GiB) to /root/tensorflow_datasets/plant_leaves/0.1.1...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-608b46b22c6c> in <cell line: 4>()
      2 #plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)
      3 #plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, as_data_source=True)
----> 4 plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

5 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py in __call__(self, function, instance, args, kwargs)
    167     metadata = self._start_call()
    168     try:
--> 169       return function(*args, **kwargs)
    170     except Exception:
    171       metadata.mark_error()

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/load.py in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
    645       try_gcs,
    646   )
--> 647   _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
    648 
    649   if as_dataset_kwargs is None:

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/load.py in _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
    504   if download:
    505     download_and_prepare_kwargs = download_and_prepare_kwargs or {}
--> 506     dbuilder.download_and_prepare(**download_and_prepare_kwargs)
    507 
    508 

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py in __call__(self, function, instance, args, kwargs)
    167     metadata = self._start_call()
    168     try:
--> 169       return function(*args, **kwargs)
    170     except Exception:
    171       metadata.mark_error()

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_builder.py in download_and_prepare(self, download_dir, download_config, file_format)
    679     # to generate the files.
    680     if file_format:
--> 681       self.info.set_file_format(file_format, override=True)
    682 
    683     # Create a tmp dir and rename to self.data_dir on successful exit.

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_info.py in set_file_format(self, file_format, override)
    470       )
    471     if override and self._fully_initialized:
--> 472       raise RuntimeError(
    473           "Cannot override the file format "
    474           "when the DatasetInfo is already fully initialized!"

RuntimeError: Cannot override the file format when the DatasetInfo is already fully initialized!
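
A plausible reading of this RuntimeError: the first attempt had already prepared the dataset on disk, so the builder restores a fully initialized DatasetInfo and then refuses the file-format override. A hedged workaround sketch is to delete the previously prepared data before retrying (the path is taken from the log output above; adjust it to your data_dir):

import shutil
# Remove the previously prepared ArrayRecord data so DatasetInfo is rebuilt
# from scratch on the next download_and_prepare call.
shutil.rmtree('/root/tensorflow_datasets/plant_leaves/0.1.1', ignore_errors=True)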

dddraxxx commented 4 months ago

Same error on the refcoco dataset. NotImplementedError: `.as_dataset()` not implemented for ArrayRecord files. Please, use `.as_data_source()`.

dddraxxx commented 4 months ago

Anyway, one way I solved this was to add the following lines:

builder = tfds.builder('ref_coco/refcocog_umd')
builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
builder.download_and_prepare()
ref_ds = tfds.load('ref_coco/refcocog_umd', split='validation')
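
The same pattern, applied to the dataset from the original report, would presumably look like this (a sketch, untested):

# Force Parquet files so that .as_dataset() works, then load as usual.
builder = tfds.builder('plant_leaves')
builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
builder.download_and_prepare()
plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)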

Dmitry-Danchenko commented 1 week ago

builder = tfds.builder('oxford_iiit_pet')
builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
builder.download_and_prepare()

dataset, info = tfds.load('oxford_iiit_pet:4.0.0', download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

also errors:

NotImplementedError                       Traceback (most recent call last)
Cell In[34], line 5
      2 builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
      3 builder.download_and_prepare()
----> 5 dataset, info = tfds.load('oxford_iiit_pet:4.0.0', download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/logging/__init__.py:176, in _FunctionDecorator.__call__(self, function, instance, args, kwargs)
    174 metadata = self._start_call()
    175 try:
--> 176   return function(*args, **kwargs)
    177 except Exception:
    178   metadata.mark_error()

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/load.py:673, in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
    670 as_dataset_kwargs.setdefault('shuffle_files', shuffle_files)
    671 as_dataset_kwargs.setdefault('read_config', read_config)
--> 673 ds = dbuilder.as_dataset(**as_dataset_kwargs)
    674 if with_info:
    675   return ds, dbuilder.info

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/logging/__init__.py:176, in _FunctionDecorator.__call__(self, function, instance, args, kwargs)
    174 metadata = self._start_call()
    175 try:
--> 176   return function(*args, **kwargs)
    177 except Exception:
    178   metadata.mark_error()

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/dataset_builder.py:1026, in DatasetBuilder.as_dataset(self, split, batch_size, shuffle_files, decoders, read_config, as_supervised)
   1017 # Create a dataset for each of the given splits
   1018 build_single_dataset = functools.partial(
   1019     self._build_single_dataset,
   1020     shuffle_files=shuffle_files,
   (...)
   1024     as_supervised=as_supervised,
   1025 )
-> 1026 all_ds = tree.map_structure(build_single_dataset, split)
   1027 return all_ds

File /usr/local/lib/python3.12/dist-packages/tree/__init__.py:428, in map_structure(func, *structures, **kwargs)
    425 for other in structures[1:]:
    426   assert_same_structure(structures[0], other, check_types=check_types)
    427 return unflatten_as(structures[0],
--> 428                     [func(*args) for args in zip(*map(flatten, structures))])

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/dataset_builder.py:1044, in DatasetBuilder._build_single_dataset(self, split, batch_size, shuffle_files, decoders, read_config, as_supervised)
   1041   batch_size = self.info.splits.total_num_examples or sys.maxsize
   1043 # Build base dataset
-> 1044 ds = self._as_dataset(
   1045     split=split,
   1046     shuffle_files=shuffle_files,
   1047     decoders=decoders,
   1048     read_config=read_config,
   1049 )
   1050 # Auto-cache small datasets which are small enough to fit in memory.
   1051 if self._should_cache_ds(
   1052     split=split, shuffle_files=shuffle_files, read_config=read_config
   1053 ):

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/dataset_builder.py:1498, in FileReaderBuilder._as_dataset(self, split, decoders, read_config, shuffle_files)
   1492 reader = reader_lib.Reader(
   1493     self.data_dir,
   1494     example_specs=example_specs,
   1495     file_format=self.info.file_format,
   1496 )
   1497 decode_fn = functools.partial(features.decode_example, decoders=decoders)
-> 1498 return reader.read(
   1499     instructions=split,
   1500     split_infos=self.info.splits.values(),
   1501     decode_fn=decode_fn,
   1502     read_config=read_config,
   1503     shuffle_files=shuffle_files,
   1504     disable_shuffling=self.info.disable_shuffling,
   1505 )

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:430, in Reader.read(self, instructions, split_infos, read_config, shuffle_files, disable_shuffling, decode_fn)
    421   file_instructions = splits_dict[instruction].file_instructions
    422   return self.read_files(
    423       file_instructions,
    424       read_config=read_config,
    (...)
    427       decode_fn=decode_fn,
    428   )
--> 430 return tree.map_structure(_read_instruction_to_ds, instructions)

File /usr/local/lib/python3.12/dist-packages/tree/__init__.py:428, in map_structure(func, *structures, **kwargs)
    425 for other in structures[1:]:
    426   assert_same_structure(structures[0], other, check_types=check_types)
    427 return unflatten_as(structures[0],
--> 428                     [func(*args) for args in zip(*map(flatten, structures))])

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:422, in Reader.read.<locals>._read_instruction_to_ds(instruction)
    420 def _read_instruction_to_ds(instruction):
    421   file_instructions = splits_dict[instruction].file_instructions
--> 422   return self.read_files(
    423       file_instructions,
    424       read_config=read_config,
    425       shuffle_files=shuffle_files,
    426       disable_shuffling=disable_shuffling,
    427       decode_fn=decode_fn,
    428   )

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:462, in Reader.read_files(self, file_instructions, read_config, shuffle_files, disable_shuffling, decode_fn)
    459   raise ValueError(msg)
    461 # Read serialized example (eventually with tfds_id)
--> 462 ds = _read_files(
    463     file_instructions=file_instructions,
    464     read_config=read_config,
    465     shuffle_files=shuffle_files,
    466     disable_shuffling=disable_shuffling,
    467     file_format=self._file_format,
    468 )
    470 # Parse and decode
    471 def parse_and_decode(ex: Tensor) -> TreeDict[Tensor]:
    472   # TODO(pierrot): parse_example uses
    473   # tf.io.parse_single_example. It might be faster to use parse_example,
    474   # after batching.
    475   # https://www.tensorflow.org/api_docs/python/tf/io/parse_example

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:302, in _read_files(file_instructions, read_config, shuffle_files, disable_shuffling, file_format)
    295 if (
    296     shuffle_files
    297     and read_config.shuffle_seed is None
    298     and tf_compat.get_option_deterministic(read_config.options) is None
    299 ):
    300   deterministic = False
--> 302 ds = instruction_ds.interleave(
    303     functools.partial(
    304         _get_dataset_from_filename,
    305         do_skip=do_skip,
    306         do_take=do_take,
    307         file_format=file_format,
    308         add_tfds_id=read_config.add_tfds_id,
    309         override_buffer_size=read_config.override_buffer_size,
    310     ),
    311     cycle_length=cycle_length,
    312     block_length=block_length,
    313     num_parallel_calls=read_config.num_parallel_calls_for_interleave_files,
    314     deterministic=deterministic,
    315 )
    317 return assert_cardinality_and_apply_options(ds)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/dataset_ops.py:2534, in DatasetV2.interleave(self, map_func, cycle_length, block_length, num_parallel_calls, deterministic, name)
   2530 # Loaded lazily due to a circular dependency (
   2531 # dataset_ops -> interleave_op -> dataset_ops).
   2532 # pylint: disable=g-import-not-at-top,protected-access
   2533 from tensorflow.python.data.ops import interleave_op
-> 2534 return interleave_op._interleave(self, map_func, cycle_length, block_length,
   2535                                  num_parallel_calls, deterministic, name)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/interleave_op.py:49, in _interleave(input_dataset, map_func, cycle_length, block_length, num_parallel_calls, deterministic, name)
     46   return _InterleaveDataset(
     47       input_dataset, map_func, cycle_length, block_length, name=name)
     48 else:
---> 49   return _ParallelInterleaveDataset(
     50       input_dataset,
     51       map_func,
     52       cycle_length,
     53       block_length,
     54       num_parallel_calls,
     55       deterministic=deterministic,
     56       name=name)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/interleave_op.py:119, in _ParallelInterleaveDataset.__init__(self, input_dataset, map_func, cycle_length, block_length, num_parallel_calls, buffer_output_elements, prefetch_input_elements, deterministic, name)
    117 """See Dataset.interleave() for details."""
    118 self._input_dataset = input_dataset
--> 119 self._map_func = structured_function.StructuredFunctionWrapper(
    120     map_func, self._transformation_name(), dataset=input_dataset)
    121 if not isinstance(self._map_func.output_structure, dataset_ops.DatasetSpec):
    122   raise TypeError(
    123       "The map_func argument must return a Dataset object. Got "
    124       f"{dataset_ops.get_type(self._map_func.output_structure)!r}.")

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/structured_function.py:265, in StructuredFunctionWrapper.__init__(self, func, transformation_name, dataset, input_classes, input_shapes, input_types, input_structure, add_to_graph, use_legacy_function, defun_kwargs)
    258   warnings.warn(
    259       "Even though the tf.config.experimental_run_functions_eagerly "
    260       "option is set, this option does not apply to tf.data functions. "
    261       "To force eager execution of tf.data functions, please use "
    262       "tf.data.experimental.enable_debug_mode().")
    263 fn_factory = trace_tf_function(defun_kwargs)
--> 265 self._function = fn_factory()
    266 # There is no graph to add in eager mode.
    267 add_to_graph &= not context.executing_eagerly()

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:1251, in Function.get_concrete_function(self, *args, **kwargs)
   1249 def get_concrete_function(self, *args, **kwargs):
   1250   # Implements PolymorphicFunction.get_concrete_function.
-> 1251   concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
   1252   concrete._garbage_collector.release()  # pylint: disable=protected-access
   1253   return concrete

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:1221, in Function._get_concrete_function_garbage_collected(self, *args, **kwargs)
   1219 if self._variable_creation_config is None:
   1220   initializers = []
-> 1221   self._initialize(args, kwargs, add_initializers_to=initializers)
   1222   self._initialize_uninitialized_variables(initializers)
   1224 if self._created_variables:
   1225   # In this case we have created variables on the first call, so we run the
   1226   # version which is guaranteed to never create variables.

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:696, in Function._initialize(self, args, kwds, add_initializers_to)
    691 self._variable_creation_config = self._generate_scoped_tracing_options(
    692     variable_capturing_scope,
    693     tracing_compilation.ScopeType.VARIABLE_CREATION,
    694 )
    695 # Force the definition of the function for these arguments
--> 696 self._concrete_variable_creation_fn = tracing_compilation.trace_function(
    697     args, kwds, self._variable_creation_config
    698 )
    700 def invalid_creator_scope(*unused_args, **unused_kwds):
    701   """Disables variable creation."""

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/tracing_compilation.py:178, in trace_function(args, kwargs, tracing_options)
    175   args = tracing_options.input_signature
    176   kwargs = {}
--> 178 concrete_function = _maybe_define_function(
    179     args, kwargs, tracing_options
    180 )
    182 if not tracing_options.bind_graph_to_function:
    183   concrete_function._garbage_collector.release()  # pylint: disable=protected-access

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/tracing_compilation.py:283, in _maybe_define_function(args, kwargs, tracing_options)
    281 else:
    282   target_func_type = lookup_func_type
--> 283 concrete_function = _create_concrete_function(
    284     target_func_type, lookup_func_context, func_graph, tracing_options
    285 )
    287 if tracing_options.function_cache is not None:
    288   tracing_options.function_cache.add(
    289       concrete_function, current_func_context
    290   )

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/tracing_compilation.py:310, in _create_concrete_function(function_type, type_context, func_graph, tracing_options)
    303 placeholder_bound_args = function_type.placeholder_arguments(
    304     placeholder_context
    305 )
    307 disable_acd = tracing_options.attributes and tracing_options.attributes.get(
    308     attributes_lib.DISABLE_ACD, False
    309 )
--> 310 traced_func_graph = func_graph_module.func_graph_from_py_func(
    311     tracing_options.name,
    312     tracing_options.python_function,
    313     placeholder_bound_args.args,
    314     placeholder_bound_args.kwargs,
    315     None,
    316     func_graph=func_graph,
    317     add_control_dependencies=not disable_acd,
    318     arg_names=function_type_utils.to_arg_names(function_type),
    319     create_placeholders=False,
    320 )
    322 transform.apply_func_graph_transforms(traced_func_graph)
    324 graph_capture_container = traced_func_graph.function_captures

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/framework/func_graph.py:1059, in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, create_placeholders)
   1056   return x
   1058 _, original_func = tf_decorator.unwrap(python_func)
-> 1059 func_outputs = python_func(*func_args, **func_kwargs)
   1061 # invariant: func_outputs contains only Tensors, CompositeTensors,
   1062 # TensorArrays and Nones.
   1063 func_outputs = variable_utils.convert_variables_to_tensors(func_outputs)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:599, in Function._generate_scoped_tracing_options.<locals>.wrapped_fn(*args, **kwds)
    595 with default_graph._variable_creator_scope(scope, priority=50):  # pylint: disable=protected-access
    596   # __wrapped__ allows AutoGraph to swap in a converted function. We give
    597   # the function a weak reference to itself to avoid a reference cycle.
    598   with OptionalXlaContext(compile_with_xla):
--> 599     out = weak_wrapped_fn().__wrapped__(*args, **kwds)
    600 return out

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/structured_function.py:231, in StructuredFunctionWrapper.__init__.<locals>.trace_tf_function.<locals>.wrapped_fn(*args)
    230 def wrapped_fn(*args):  # pylint: disable=missing-docstring
--> 231   ret = wrapper_helper(*args)
    232   ret = structure.to_tensor_list(self._output_structure, ret)
    233   return [ops.convert_to_tensor(t) for t in ret]

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/data/ops/structured_function.py:161, in StructuredFunctionWrapper.__init__.<locals>.wrapper_helper(*args)
    159 if not _should_unpack(nested_args):
    160   nested_args = (nested_args,)
--> 161 ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args)
    162 ret = variable_utils.convert_variables_to_tensors(ret)
    163 if _should_pack(ret):

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/autograph/impl/api.py:690, in convert.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    688 try:
    689   with conversion_ctx:
--> 690     return converted_call(f, args, kwargs, options=options)
    691 except Exception as e:  # pylint:disable=broad-except
    692   if hasattr(e, 'ag_error_metadata'):

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/autograph/impl/api.py:352, in converted_call(f, args, kwargs, caller_fn_scope, options)
    349 new_args = f.args + args
    350 logging.log(3, 'Forwarding call of partial %s with\n%s\n%s\n', f, new_args,
    351             new_kwargs)
--> 352 return converted_call(
    353     f.func,
    354     new_args,
    355     new_kwargs,
    356     caller_fn_scope=caller_fn_scope,
    357     options=options)
    359 if inspect_utils.isbuiltin(f):
    360   if f is eval:

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/autograph/impl/api.py:331, in converted_call(f, args, kwargs, caller_fn_scope, options)
    329 if conversion.is_in_allowlist_cache(f, options):
    330   logging.log(2, 'Allowlisted %s: from cache', f)
--> 331   return _call_unconverted(f, args, kwargs, options, False)
    333 if ag_ctx.control_status_ctx().status == ag_ctx.Status.DISABLED:
    334   logging.log(2, 'Allowlisted: %s: AutoGraph is disabled in context', f)

File /usr/local/lib/python3.12/dist-packages/tensorflow/python/autograph/impl/api.py:459, in _call_unconverted(f, args, kwargs, options, update_cache)
    456   return f.__self__.call(args, kwargs)
    458 if kwargs is not None:
--> 459   return f(*args, **kwargs)
    460 return f(*args)

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/reader.py:69, in _get_dataset_from_filename(instruction, do_skip, do_take, file_format, add_tfds_id, override_buffer_size)
     60 def _get_dataset_from_filename(
     61     instruction: _Instruction,
     62     do_skip: bool,
     (...)
     66     override_buffer_size: Optional[int] = None,
     67 ) -> tf.data.Dataset:
     68   """Returns a tf.data.Dataset instance from given instructions."""
---> 69   ds = file_adapters.ADAPTER_FOR_FORMAT[file_format].make_tf_data(
     70       instruction.filepath, buffer_size=override_buffer_size
     71   )
     72   if do_skip:
     73     ds = ds.skip(instruction.skip)

File /usr/local/lib/python3.12/dist-packages/tensorflow_datasets/core/file_adapters.py:267, in ArrayRecordFileAdapter.make_tf_data(cls, filename, buffer_size)
    260 @classmethod
    261 def make_tf_data(
    262     cls,
    263     filename: epath.PathLike,
    264     buffer_size: int | None = None,
    265 ) -> tf.data.Dataset:
    266   """Returns TensorFlow Dataset comprising given array record file."""
--> 267   raise NotImplementedError(
    268       '`.as_dataset()` not implemented for ArrayRecord files. Please, use'
    269       ' `.as_data_source()`.'
    270   )

NotImplementedError: `.as_dataset()` not implemented for ArrayRecord files. Please, use `.as_data_source()`.

pierrot0 commented 4 days ago

Can you try with the following instead? The format passed in download_and_prepare_kwargs should match the format the builder prepared (PARQUET, not ARRAY_RECORD):

builder = tfds.builder('oxford_iiit_pet')
builder.info.set_file_format(tfds.core.FileFormat.PARQUET, override=True, override_if_initialized=True)
builder.download_and_prepare()

dataset, info = tfds.load('oxford_iiit_pet:4.0.0', with_info=True, download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.PARQUET})
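
If preparation succeeds, a quick sanity check could look like this (a sketch; the 'image' and 'label' feature keys are assumed from the oxford_iiit_pet catalog entry):

# Iterate one example from the Parquet-backed tf.data pipeline.
for example in dataset['train'].take(1):
    print(example['image'].shape, example['label'])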

dsaha21 commented 31 minutes ago

Hi @pierrot0, I tried on both my local system and on Colab, using the PARQUET format like you mentioned. I'm getting something like the following:

[screenshot: colab_unet]


I also tried to implement it using only the builder:

[screenshot: colab_onlybuilder1]

Using builder.as_data_source() gives us the result:

{'train': ArrayRecordDataSource(name=oxford_iiit_pet, split='train', decoders=None),
 'test': ArrayRecordDataSource(name=oxford_iiit_pet, split='test', decoders=None)}
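
For completeness, these data sources are random-access sequences, so they can be consumed without tf.data (a sketch; the 'image' key is assumed from the dataset's features):

ds = builder.as_data_source()
train = ds['train']
print(len(train))    # number of examples
example = train[0]   # dict of numpy features, e.g. example['image']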