Performance issues in examples/

DLPerf commented 2 years ago

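Hello! I've found a performance issue in tensorlayer/examples: batch() should be called before map(), so that the mapped function runs once per batch instead of once per element, which could make your program more efficient. The TensorFlow documentation supports this (see the tf.data performance guide on vectorizing map).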

The affected call sites are listed below:

examples/quantized_net/tutorial_binarynet_cifar10_tfrecord.py: train_ds = train_ds.batch(batch_size) should be called before train_ds = train_ds.map(_map_fn_train, num_parallel_calls=multiprocessing.cpu_count()).
examples/quantized_net/tutorial_binarynet_cifar10_tfrecord.py: test_ds = test_ds.batch(batch_size) should be called before test_ds = test_ds.map(_map_fn_test, num_parallel_calls=multiprocessing.cpu_count()).
examples/quantized_net/tutorial_dorefanet_cifar10_tfrecord.py: train_ds = train_ds.batch(batch_size) should be called before train_ds = train_ds.map(_map_fn_train, num_parallel_calls=multiprocessing.cpu_count()).
examples/quantized_net/tutorial_dorefanet_cifar10_tfrecord.py: test_ds = test_ds.batch(batch_size) should be called before test_ds = test_ds.map(_map_fn_test, num_parallel_calls=multiprocessing.cpu_count()).
examples/quantized_net/tutorial_quanconv_cifar10.py: train_ds = train_ds.batch(batch_size) should be called before train_ds = train_ds.map(_map_fn_train, num_parallel_calls=multiprocessing.cpu_count()).
examples/quantized_net/tutorial_quanconv_cifar10.py: test_ds = test_ds.batch(batch_size) should be called before test_ds = test_ds.map(_map_fn_test, num_parallel_calls=multiprocessing.cpu_count()).
examples/quantized_net/tutorial_ternaryweight_cifar10_tfrecord.py: train_ds = train_ds.batch(batch_size) should be called before train_ds = train_ds.map(_map_fn_train, num_parallel_calls=multiprocessing.cpu_count()).
examples/quantized_net/tutorial_ternaryweight_cifar10_tfrecord.py: test_ds = test_ds.batch(batch_size) should be called before test_ds = test_ds.map(_map_fn_test, num_parallel_calls=multiprocessing.cpu_count()).
examples/data_process/tutorial_fast_affine_transform.py: dataset = dataset.batch(batch_size) should be called before dataset = dataset.map(_map_fn, num_parallel_calls=multiprocessing.cpu_count()).
examples/data_process/tutorial_tf_dataset_voc.py: ds = ds.batch(batch_size) should be called before ds = ds.map(_map_fn, num_parallel_calls=multiprocessing.cpu_count()).
examples/basic_tutorials/tutorial_cifar10_cnn_static.py: train_ds = train_ds.batch(batch_size) should be called before train_ds = train_ds.map(_map_fn_train, num_parallel_calls=multiprocessing.cpu_count()).
examples/basic_tutorials/tutorial_cifar10_cnn_static.py: test_ds = test_ds.batch(batch_size) should be called before test_ds = test_ds.map(_map_fn_test, num_parallel_calls=multiprocessing.cpu_count()).
examples/deprecated_tutorials/tutorial_imagenet_inceptionV3_distributed.py: dataset = dataset.batch(batch_size) should be called before dataset = dataset.map(_map_fn, num_parallel_calls=max_cpus).
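
The reordering suggested for the call sites above can be sketched as follows. This is a minimal illustration, not code from the TensorLayer examples: the dummy data, the shapes, and the body of _map_fn are assumptions chosen so that the function behaves the same with or without a leading batch dimension.

```python
import multiprocessing

import tensorflow as tf


def _map_fn(img, label):
    # Hypothetical preprocessing: an element-wise op, so it gives the same
    # result for a single image (H, W, C) and for a batch (B, H, W, C).
    img = tf.cast(img, tf.float32) / 255.0
    return img, label


batch_size = 4
n_cpu = multiprocessing.cpu_count()

# Dummy stand-in data: 10 images of shape (32, 32, 3) with integer labels.
images = tf.zeros([10, 32, 32, 3], dtype=tf.uint8)
labels = tf.zeros([10], dtype=tf.int64)
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# Order used in the examples: _map_fn is executed once per element,
# and the results are batched afterwards.
ds_map_then_batch = ds.map(_map_fn, num_parallel_calls=n_cpu).batch(batch_size)

# Suggested order: elements are batched first, so _map_fn is executed
# once per batch and operates on the whole batch at once.
ds_batch_then_map = ds.batch(batch_size).map(_map_fn, num_parallel_calls=n_cpu)
```

Both pipelines yield batches of the same shapes here; whether the second one is actually faster depends on how expensive the mapped function is compared with the per-call overhead, so it is worth measuring on the real input pipeline.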

Besides, you need to check whether the function passed to map() (e.g., _map_fn passed to dataset.map()) is affected by this change, so that the modified code still works properly. For example, if _map_fn expects input of shape (x, y, z) before the fix, it will receive input of shape (batch_size, x, y, z) afterwards.
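
To illustrate why the mapped function has to be checked, here is a small sketch with a made-up preprocessing step (per-image mean centering). The names center_one and center_batch and the toy data are hypothetical and do not come from the examples. The point is that a function written for a single image may still run on a batch without raising an error while computing something different.

```python
import tensorflow as tf


def center_one(img):
    # Written for one image of shape (H, W, C): subtract the image's mean.
    img = tf.cast(img, tf.float32)
    return img - tf.reduce_mean(img)


def center_batch(imgs):
    # Adapted for a batch of shape (B, H, W, C): reduce over the image axes
    # only, so that every image is centered with its own mean.
    imgs = tf.cast(imgs, tf.float32)
    mean = tf.reduce_mean(imgs, axis=[1, 2, 3], keepdims=True)
    return imgs - mean


# Four constant toy images filled with 0, 10, 20 and 30.
values = tf.reshape(tf.constant([0.0, 10.0, 20.0, 30.0]), [4, 1, 1, 1])
images = tf.ones([4, 2, 2, 3]) * values
ds = tf.data.Dataset.from_tensor_slices(images)

# Original order with the per-image function.
per_element = next(iter(ds.map(center_one).batch(2)))

# New order with the adapted function: same result as above.
batched_ok = next(iter(ds.batch(2).map(center_batch)))

# New order with the unchanged function: runs without error, but the mean
# is now taken over the whole batch, so the result differs.
batched_wrong = next(iter(ds.batch(2).map(center_one)))
```

In this toy case the first two results are identical, while the third one is not, which is the kind of silent change to look for when moving batch() in front of map().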

Looking forward to your reply. Btw, I am very glad to create a PR to fix it if you are too busy.

zsdonghao commented 2 years ago

thanks, we will have a check asap

DLPerf commented 2 years ago

Hello, how long do you need to confirm this problem? @zsdonghao Thank you~

hanjr92 commented 2 years ago

Sorry for the late reply! I will modify them and push an update. @DLPerf

tensorlayer / TensorLayer

Performance issues in examples/ #1139