tensorflow / models

Models and examples built with TensorFlow

Cannot load UCF101 and Kinetics data sets for Pretraining video_ssl (for the paper Spatiotemporal Contrastive Video Representation Learning) #10587

Closed alpargun closed 1 year ago

alpargun commented 2 years ago

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/official/projects/video_ssl

2. Describe the bug and steps to reproduce

Following the Spatiotemporal Contrastive Video Representation Learning paper, I am trying to pretrain the 3D-ResNet backbone using the UCF101 and Kinetics data sets. I downloaded and prepared the UCF101 data set and made sure it is ready by running the command:

python train.py --experiment=video_classification_ucf101 --config_file=configs/experiments/cvrl_pretrain_k600_200ep.yaml --mode=train_and_eval --model_dir=temp/

The experiment "video_classification_ucf101" is registered in the file official/beta/configs/video_classification.py, and I am able to successfully finish the training with high accuracy.

Then I moved on to running the video_ssl model. I found that the relevant experiment flags for the video_ssl model are registered in the file official/vision/beta/projects/video_ssl/configs/video_ssl.py. However, it registers experiments only for the Kinetics data sets, not for UCF101. So I ran pretraining on the Kinetics400 data set with:

python train.py --experiment=video_ssl_pretrain_kinetics400 --config_file=configs/experiments/cvrl_pretrain_k600_200ep.yaml --mode=train_and_eval --model_dir=temp/

and got the error message:

File "/home/alpargun/Desktop/models-2.8.0/official/vision/beta/projects/video_ssl/tasks/pretrain.py", line 62, in build_inputs
    reader = input_reader.InputReader(
File "/home/alpargun/Desktop/models-2.8.0/official/core/input_reader.py", line 244, in __init__
    raise ValueError(
ValueError: tfds_name is , but tfds_split is not specified.
In call to configurable 'Trainer' (<class 'official.core.base_trainer.Trainer'>)
In call to configurable 'create_trainer' (<function create_trainer at 0x7ff6f8f99700>)

which I believe is because the Kinetics data sets are not registered in TFDS (TensorFlow Datasets). This probably deserves a separate issue, since the paper focuses on training with the Kinetics data sets. However, as I had already prepared the UCF101 data set and tested it with other video classification experiments, I continued with UCF101.
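The empty name in "tfds_name is , but" hints at how the reader may have reached this branch. The sketch below is one plausible reading, not the code in official/core/input_reader.py: it assumes the reader falls back to TFDS whenever no input_path is given and then requires a split. `ToyDataConfig` and `pick_source` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyDataConfig:
    """Hypothetical stand-in for a data config; not the Model Garden class."""
    input_path: str = ""
    tfds_name: str = ""
    tfds_split: str = ""


def pick_source(cfg: ToyDataConfig) -> str:
    """Returns "files" or "tfds", raising if the TFDS fallback lacks a split."""
    if cfg.input_path:
        return "files"
    # Assumed behaviour: with no input_path the reader falls back to TFDS.
    if not cfg.tfds_split:
        raise ValueError(
            f"tfds_name is {cfg.tfds_name}, but tfds_split is not specified.")
    return "tfds"
```

Under this assumption, a config with neither an input path nor a TFDS name and split produces exactly the message above, and supplying an input_path that points at prepared record files would avoid the TFDS branch.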

In the file official/vision/beta/projects/video_ssl/configs/video_ssl.py there is no experiment registry for UCF101, hence I registered it myself as:

@exp_factory.register_config_factory('video_ssl_pretrain_ucf101')
def video_ssl_pretrain_ucf101() -> cfg.ExperimentConfig:
  """Pretrain SSL Video classification on UCF101 with resnet."""
  exp = video_classification.video_classification_ucf101()
  exp.task = VideoSSLPretrainTask(**exp.task.as_dict())
  exp.task.train_data = DataConfig(is_ssl=True, **exp.task.train_data.as_dict())
  exp.task.train_data.feature_shape = (16, 224, 224, 3)
  exp.task.train_data.temporal_stride = 2
  exp.task.model = VideoSSLModel(exp.task.model)
  exp.task.model.model_type = 'video_ssl_model'
  exp.task.losses = SSLLosses(exp.task.losses)
  return exp

and ran the pretraining using:

python train.py --experiment=video_ssl_pretrain_ucf101 --config_file=configs/experiments/cvrl_pretrain_k600_200ep.yaml --mode=train_and_eval --model_dir=temp/
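For readers unfamiliar with the pattern, the decorator-based registration used above can be pictured with a toy registry in plain Python. This is a simplified stand-in and not the implementation in exp_factory: the `_REGISTRY` dictionary, the toy `register_config_factory` and `get_exp_config` functions, and the `toy_ssl_pretrain_ucf101` experiment are all assumptions made for illustration, and a plain dict stands in for an ExperimentConfig.

```python
from typing import Callable, Dict

# Toy registry: maps an experiment name to a zero-argument config factory.
_REGISTRY: Dict[str, Callable[[], dict]] = {}


def register_config_factory(name: str):
    """Returns a decorator that stores the decorated factory under `name`."""
    def decorator(fn: Callable[[], dict]) -> Callable[[], dict]:
        if name in _REGISTRY:
            raise KeyError(f"Experiment {name!r} is already registered.")
        _REGISTRY[name] = fn
        return fn
    return decorator


def get_exp_config(name: str) -> dict:
    """Looks up the factory for `name` and calls it to build a fresh config."""
    if name not in _REGISTRY:
        raise KeyError(f"Experiment {name!r} is not registered.")
    return _REGISTRY[name]()


@register_config_factory("toy_ssl_pretrain_ucf101")
def toy_ssl_pretrain_ucf101() -> dict:
    # Plain dict standing in for an ExperimentConfig (illustrative values).
    return {"is_ssl": True, "feature_shape": (16, 224, 224, 3),
            "temporal_stride": 2}
```

In this picture, the name passed on the command line is only a key into the registry, so registration succeeding says nothing about whether the input pipeline can read the data.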

However, now I see the error:

TypeError: in user code:

File "/home/alpargun/Desktop/models-2.8.0/official/vision/beta/dataloaders/video_input.py", line 227, in decode  *
    context, sequences = tf.io.parse_single_sequence_example(

TypeError: Expected any non-tensor type, but got a tensor instead.

In call to configurable 'Trainer' (<class 'official.core.base_trainer.Trainer'>) In call to configurable 'create_trainer' (<function create_trainer at 0x7ff1074d2550>)

At this point I have 2 questions:

  1. How can I load the UCF101 and Kinetics data sets? UCF101 works with the experiment flag 'video_classification_ucf101', but raises a TypeError with the experiment flag 'video_ssl_pretrain_ucf101'. Additionally, the Kinetics data sets are not included in TFDS, yet the dataloaders require a TFDS format. Hence, could you please include documentation on preparing the Kinetics400, Kinetics600, and Kinetics700 data sets for running video_ssl?

  2. For my own project, I only need to train the 3D-ResNet backbone using self-supervised learning. I will use my own data set with no labels, and I will implement my own head for a task other than video classification. After studying the paper and the code, I understood that I only need to run pretraining and set the boolean "is_ssl=True". Could you please confirm that this is how to perform self-supervised learning that does not rely on a video classification task and its labels?

4. Expected behavior

Run the video_ssl model and pretrain the 3D-Resnet backbone with UCF101 and Kinetics data sets.

5. Additional context

This is the complete error log when I pretrain video_ssl on UCF101:

_(env-ssl-release) alpargun@aperol3-System-Product-Name:~/Desktop/models-2.8.0/official/vision/beta/projects/video_ssl$ python train.py --model_dir='results/ssl' 2022-04-09 23:09:47.312356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.312757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.317440: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.317853: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.318313: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.318771: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero I0409 23:09:47.331528 140675105997632 train_utils.py:335] Final experiment parameters: {'runtime': {'all_reduce_alg': None, 'batchnorm_spatial_persistent': False, 'dataset_num_private_threads': None, 'default_shard_dim': -1, 'distribution_strategy': 'mirrored', 'enable_xla': False, 'gpu_thread_mode': None, 'loss_scale': None, 'mixed_precision_dtype': 'float32', 'num_cores_per_replica': 1, 'num_gpus': 2, 'num_packs': 1, 'per_gpu_thread_count': 0, 'run_eagerly': False, 'task_index': -1, 'tpu': None, 
'tpu_enable_xla_dynamic_padder': None, 'worker_hosts': None}, 'task': {'eval_input_partition_dims': None, 'init_checkpoint': 'temp2', 'init_checkpoint_modules': 'all', 'losses': {'l2_weight_decay': 1e-06, 'label_smoothing': 0.0, 'normalize_hidden': True, 'one_hot': True, 'temperature': 0.1}, 'metrics': {'use_per_class_recall': False}, 'model': {'aggregate_endpoints': False, 'backbone': {'resnet_3d': {'block_specs': ({'temporal_kernel_sizes': (1, 1, 1), 'temporal_strides': 1, 'use_self_gating': False}, {'temporal_kernel_sizes': (1, 1, 1, 1), 'temporal_strides': 1, 'use_self_gating': False}, {'temporal_kernel_sizes': (3, 3, 3, 3, 3, 3), 'temporal_strides': 1, 'use_self_gating': False}, {'temporal_kernel_sizes': (3, 3, 3), 'temporal_strides': 1, 'use_self_gating': False}), 'model_id': 50, 'se_ratio': 0.0, 'stem_conv_temporal_kernel_size': 5, 'stem_conv_temporal_stride': 2, 'stem_pool_temporal_stride': 1, 'stem_type': 'v0', 'stochastic_depth_drop_rate': 0.0}, 'type': 'resnet_3d'}, 'dropout_rate': 0.5, 'hidden_dim': 2048, 'hidden_layer_num': 3, 'hidden_norm_activation': {'activation': 'relu', 'norm_epsilon': 1e-05, 'norm_momentum': 0.997, 'use_sync_bn': True}, 'model_type': 'video_ssl_model', 'norm_activation': {'activation': 'relu', 'norm_epsilon': 1e-05, 'norm_momentum': 0.9, 'use_sync_bn': True}, 'normalize_feature': False, 'projection_dim': 128, 'require_endpoints': None}, 'name': None, 'train_data': {'audio_feature': '', 'audio_feature_shape': (-1,), 'aug_max_area_ratio': 1.0, 'aug_max_aspect_ratio': 2.0, 'aug_min_area_ratio': 0.49, 'aug_min_aspect_ratio': 0.5, 'aug_type': None, 'block_length': 1, 'cache': False, 'compressed_input': False, 'cycle_length': 10, 'data_format': 'channels_last', 'deterministic': None, 'drop_remainder': True, 'dtype': 'float32', 'enable_tf_data_service': False, 'feature_shape': (5, 224, 224, 3), 'file_type': 'tfrecord', 'global_batch_size': 16, 'image_field_key': 'image/encoded', 'input_path': '', 'is_multilabel': False, 'is_ssl': True, 
'is_training': True, 'label_field_key': 'clip/label/index', 'min_image_size': 256, 'name': 'ucf101', 'num_classes': 101, 'num_examples': 9537, 'num_test_clips': 1, 'num_test_crops': 1, 'one_hot': True, 'output_audio': False, 'random_stride_range': 0, 'seed': None, 'sharding': True, 'shuffle_buffer_size': 1024, 'split': 'train', 'temporal_stride': 2, 'tf_data_service_address': None, 'tf_data_service_job_name': None, 'tfds_as_supervised': False, 'tfds_data_dir': '', 'tfds_name': 'ucf101', 'tfds_skip_decoding_feature': '', 'tfds_split': 'train', 'variant_name': None}, 'train_input_partition_dims': None, 'validation_data': {'audio_feature': '', 'audio_feature_shape': (-1,), 'aug_max_area_ratio': 1.0, 'aug_max_aspect_ratio': 2.0, 'aug_min_area_ratio': 0.49, 'aug_min_aspect_ratio': 0.5, 'aug_type': None, 'block_length': 1, 'cache': False, 'compressed_input': False, 'cycle_length': 10, 'data_format': 'channels_last', 'deterministic': None, 'drop_remainder': False, 'dtype': 'float32', 'enable_tf_data_service': False, 'feature_shape': (5, 224, 224, 3), 'file_type': 'tfrecord', 'global_batch_size': 16, 'image_field_key': 'image/encoded', 'input_path': '', 'is_multilabel': False, 'is_training': True, 'label_field_key': 'clip/label/index', 'min_image_size': 256, 'name': 'ucf101', 'num_classes': 101, 'num_examples': 3783, 'num_test_clips': 1, 'num_test_crops': 1, 'one_hot': True, 'output_audio': False, 'random_stride_range': 0, 'seed': None, 'sharding': True, 'shuffle_buffer_size': 64, 'split': 'test', 'temporal_stride': 2, 'tf_data_service_address': None, 'tf_data_service_job_name': None, 'tfds_as_supervised': False, 'tfds_data_dir': '', 'tfds_name': 'ucf101', 'tfds_skip_decoding_feature': '', 'tfds_split': 'test', 'variant_name': None}}, 'trainer': {'allow_tpu_summary': False, 'best_checkpoint_eval_metric': '', 'best_checkpoint_export_subdir': '', 'best_checkpoint_metric_comp': 'higher', 'checkpoint_interval': 149, 'continuous_eval_timeout': 3600, 'eval_tf_function': True, 
'eval_tf_while_loop': False, 'loss_upper_bound': 1000000.0, 'max_to_keep': 5, 'optimizer_config': {'ema': None, 'learning_rate': {'cosine': {'alpha': 0.0, 'decay_steps': 72888, 'initial_learning_rate': 0.32, 'name': 'CosineDecay', 'offset': 0}, 'type': 'cosine'}, 'optimizer': {'sgd': {'clipnorm': None, 'clipvalue': None, 'decay': 0.0, 'global_clipnorm': None, 'momentum': 0.9, 'name': 'SGD', 'nesterov': False}, 'type': 'sgd'}, 'warmup': {'linear': {'name': 'linear', 'warmup_learning_rate': 0, 'warmup_steps': 1784}, 'type': 'linear'}}, 'recovery_begin_steps': 0, 'recovery_max_trials': 0, 'steps_per_loop': 100, 'summary_interval': 100, 'train_steps': 72888, 'train_tf_function': True, 'train_tf_while_loop': True, 'validation_interval': 149, 'validation_steps': 236, 'validation_summary_subdir': 'validation'}} I0409 23:09:47.331852 140675105997632 train_utils.py:347] Saving experiment configuration to results/ssl/params.yaml 2022-04-09 23:09:47.343096: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
2022-04-09 23:09:47.479676: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.480077: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.480616: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.480939: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.481382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:47.481759: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:48.069361: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:48.069812: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:48.070323: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 
23:09:48.070895: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:48.071332: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:48.071681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9453 MB memory: -> device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5 2022-04-09 23:09:48.072049: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2022-04-09 23:09:48.072435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 9640 MB memory: -> device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5 INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1') I0409 23:09:48.224815 140675105997632 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1') I0409 23:09:48.225352 140675105997632 train_utils.py:220] Running default trainer. I0409 23:09:48.225423 140675105997632 pretrain.py:43] Build model input [5, 224, 224, 3] INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). 
I0409 23:09:48.243512 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). I0409 23:09:48.244689 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). I0409 23:09:48.246567 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). I0409 23:09:48.247236 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). I0409 23:09:48.248142 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). I0409 23:09:48.251034 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). 
I0409 23:09:48.366572 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). I0409 23:09:48.367230 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). I0409 23:09:48.368715 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). I0409 23:09:48.369346 140675105997632 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',). 
I0409 23:09:50.230324 140675105997632 dataset_builder.py:876] No config specified, defaulting to first: ucf101/ucf101_1_256
I0409 23:09:50.234646 140675105997632 dataset_info.py:439] Load dataset info from /home/alpargun/tensorflow_datasets/ucf101/ucf101_1_256/2.0.0
I0409 23:09:50.235485 140675105997632 dataset_builder.py:369] Reusing dataset ucf101 (/home/alpargun/tensorflow_datasets/ucf101/ucf101_1_256/2.0.0)
I0409 23:09:50.235562 140675105997632 logging_logger.py:44] Constructing tf.data.Dataset ucf101 for split train, from /home/alpargun/tensorflow_datasets/ucf101/ucf101_1_256/2.0.0
Traceback (most recent call last):
  File "train.py", line 78, in <module>
    app.run(main)
  File "/home/alpargun/.local/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/alpargun/.local/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train.py", line 67, in main
    train_lib.run_experiment(
  File "/home/alpargun/Desktop/models-2.8.0/official/core/train_lib.py", line 72, in run_experiment
    trainer = train_utils.create_trainer(
  File "/home/alpargun/.local/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/alpargun/.local/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/alpargun/.local/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/alpargun/Desktop/models-2.8.0/official/core/train_utils.py", line 224, in create_trainer
    return trainer_cls(
  File "/home/alpargun/.local/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/alpargun/.local/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/alpargun/.local/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/alpargun/Desktop/models-2.8.0/official/core/base_trainer.py", line 256, in __init__
    train_dataset = train_dataset or self.distribute_dataset(
  File "/home/alpargun/Desktop/models-2.8.0/official/core/base_trainer.py", line 158, in distribute_dataset
    return orbit.utils.make_distributed_dataset(self._strategy, dataset_or_fn,
  File "/home/alpargun/Desktop/models-2.8.0/orbit/utils/common.py", line 90, in make_distributed_dataset
    return strategy.distribute_datasets_from_function(dataset_fn, input_options)
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1186, in distribute_datasets_from_function
    return self._extended._distribute_datasets_from_function(  # pylint: disable=protected-access
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 593, in _distribute_datasets_from_function
    return input_util.get_distributed_datasets_from_function(
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/distribute/input_util.py", line 132, in get_distributed_datasets_from_function
    return input_lib.DistributedDatasetsFromFunction(
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 1344, in __init__
    self.build()
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 1364, in build
    _create_datasets_from_function_with_input_context(
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 1837, in _create_datasets_from_function_with_input_context
    dataset = dataset_fn(ctx)
  File "/home/alpargun/Desktop/models-2.8.0/orbit/utils/common.py", line 88, in dataset_fn
    return dataset_or_fn(*args, **kwargs)
  File "/home/alpargun/Desktop/models-2.8.0/official/vision/beta/projects/video_ssl/tasks/pretrain.py", line 69, in build_inputs
    dataset = reader.read(input_context=input_context)
  File "/home/alpargun/Desktop/models-2.8.0/official/core/input_reader.py", line 469, in read
    dataset = self._decode_and_parse_dataset(dataset, self._global_batch_size,
  File "/home/alpargun/Desktop/models-2.8.0/official/core/input_reader.py", line 400, in _decode_and_parse_dataset
    dataset = tf.nest.map_structure(_shuffle_and_decode, dataset)
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/util/nest.py", line 914, in map_structure
    structure[0], [func(x) for x in entries],
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/util/nest.py", line 914, in <listcomp>
    structure[0], [func(x) for x in entries],
  File "/home/alpargun/Desktop/models-2.8.0/official/core/input_reader.py", line 397, in _shuffle_and_decode
    ds = _maybe_map_fn(ds, self._decoder_fn)
  File "/home/alpargun/Desktop/models-2.8.0/official/core/input_reader.py", line 33, in _maybe_map_fn
    return dataset if fn is None else dataset.map(
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2018, in map
    return ParallelMapDataset(
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 5234, in __init__
    self._map_func = structured_function.StructuredFunctionWrapper(
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/structured_function.py", line 271, in __init__
    self._function = fn_factory()
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3070, in get_concrete_function
    graph_function = self._get_concrete_function_garbage_collected(
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3036, in _get_concrete_function_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3130, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/structured_function.py", line 248, in wrapped_fn
    ret = wrapper_helper(*args)
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/structured_function.py", line 177, in wrapper_helper
    ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args)
  File "/home/alpargun/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 692, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:

File "/home/alpargun/Desktop/models-2.8.0/official/vision/beta/dataloaders/video_input.py", line 227, in decode  *
    context, sequences = tf.io.parse_single_sequence_example(

TypeError: Expected any non-tensor type, but got a tensor instead.

In call to configurable 'Trainer' (<class 'official.core.base_trainer.Trainer'>)
In call to configurable 'create_trainer' (<function create_trainer at 0x7ff1074d2550>)

6. System information

yeqingli commented 2 years ago

The problem with the Kinetics-400 pretraining is that you need to download the data yourself and format it into TFRecord files. You can refer to here for the required TFRecord format.

The issue with UCF-101 is that the video_ssl code was built on top of an older version of the video classification task, so the SSL input pipeline does not recognize TFDS. The new video classification input reader does. So a quick fix might be to copy the build_inputs function from the new video classification task and replace the parser function accordingly.
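To make the distinction concrete, here is a minimal plain-Python sketch of choosing a decoder by data source. It is not the Model Garden code: `decode_serialized`, `decode_tfds`, and `get_decoder_fn` are hypothetical names, and the assumption that a TFDS UCF101 example arrives as a dictionary with "video" and "label" entries is labeled as such in the code. The point it illustrates is that a decoder written for serialized records fails when handed an already-decoded example, which is consistent with the TypeError reported above.

```python
from typing import Any, Callable, Dict


def decode_serialized(example: Any) -> Dict[str, Any]:
    """Stand-in for a decoder that parses one serialized record (bytes)."""
    if not isinstance(example, bytes):
        raise TypeError("Expected a serialized record, got an already-decoded example.")
    return {"source": "tfrecord", "num_bytes": len(example)}


def decode_tfds(example: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a decoder that takes an already-decoded feature dict.

    Assumption: the example has "video" and "label" keys.
    """
    return {"source": "tfds", "video": example["video"], "label": example["label"]}


def get_decoder_fn(tfds_name: str) -> Callable[[Any], Dict[str, Any]]:
    """Picks the TFDS decoder when a TFDS name is configured, else the record decoder."""
    return decode_tfds if tfds_name else decode_serialized
```

A pipeline that always uses the record decoder, regardless of whether a TFDS name is set, would hit the failing branch for every TFDS example.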

alpargun commented 2 years ago

I tried as @yeqingli suggested, so that the build_inputs function in the video_ssl project looks like this:

  """New build_inputs function"""
  def build_inputs(self,
                   params: exp_cfg.DataConfig,
                   input_context: Optional[tf.distribute.InputContext] = None):
    """Builds classification input."""

    parser = video_input.Parser(
        input_params=params,
        image_key=params.image_field_key,
        label_key=params.label_field_key)
    postprocess_fn = video_input.PostBatchProcessor(params)

    reader = input_reader_factory.input_reader_generator(
        params,
        dataset_fn=self._get_dataset_fn(params),
        decoder_fn=self._get_decoder_fn(params),
        parser_fn=parser.parse_fn(params.is_training),
        postprocess_fn=postprocess_fn)

    dataset = reader.read(input_context=input_context)

    return dataset

However, I still receive the error:

File "/home/alp/Desktop/alp-argun/official/vision/dataloaders/video_input.py", line 230, in decode  *
    context, sequences = tf.io.parse_single_sequence_example(

TypeError: Expected any non-tensor type, but got a tensor instead.

In call to configurable 'Trainer' (<class 'official.core.base_trainer.Trainer'>) In call to configurable 'create_trainer' (<function create_trainer at 0x7f0761823af0>)

laxmareddyp commented 1 year ago

Hi @alpargun,

Could you please check the latest notebook that has been published to GitHub? It has instructions on how to run on custom data, how to convert data to TFRecords, load the experiment configuration, override the configuration parameters, fine-tune and run inference with the model, and save and export the trained model.

Please let me know if it solves your problem.

Thanks.

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

laxmareddyp commented 1 year ago

Closing as stale. Please reopen if you'd like to work on this further. Thanks