nixingyang / AdaptiveL2Regularization

[ICPR 2020] Adaptive L2 Regularization in Person Re-Identification
https://ieeexplore.ieee.org/document/9412481
MIT License

AttributeError: Can't pickle local object 'init_resnet.<locals>.<lambda>' #7

Closed kazuki-can closed 3 years ago

kazuki-can commented 3 years ago

Hi, I'm new to this model. It looks really interesting, so I want to see how it works. However, I am getting the error "AttributeError: Can't pickle local object 'init_resnet.<locals>.<lambda>'". How can I fix it?

nixingyang commented 3 years ago

Hi, Thanks for your interest in our work. I might have encountered a similar issue before, but could not remember the details :-( However, all should work just fine as long as you follow the guide in the Environment section. Are you using python 3.7 with tensorflow 2.2.1 on a Linux machine? All the best. Xingyang Ni

kazuki-can commented 3 years ago

Thanks for the quick reply. I followed the installation section except for cudnn, since my computer does not have it. I am using python 3.7 with tensorflow 2.2.1 on a Windows machine.

nixingyang commented 3 years ago

Could you provide the error log? This issue is probably related to Windows. Xingyang

kazuki-can commented 3 years ago

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Exception in thread Thread-29:
Traceback (most recent call last):
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 843, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 822, in pool_fn
    initargs=(seqs, None, get_worker_id_queue()))
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'init_resnet.<locals>.<lambda>'

kazuki-can commented 3 years ago

It runs until "Freeze layers in the backbone model for 20 epochs. ~~ I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll", but after that it fails with the error above.

kazuki-can commented 3 years ago

Also, when I run the command with python3, it just says Python. So I ran it with python -u solution.py --dataset_name "Market1501" --backbone_model_name "ResNet50" instead.

nixingyang commented 3 years ago

This issue is actually the same as https://github.com/nixingyang/AdaptiveL2Regularization/issues/4. Try appending --workers 1 to the command. If it still does not work, use a Linux machine instead.
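
For readers wondering why the error appears only on Windows: there, multiprocessing starts worker processes with the spawn method, which pickles the objects handed to each worker, and pickle cannot serialize a lambda defined inside another function. The following is a minimal sketch of that failure mode, not code from this repository; build_with_lambda, build_with_partial, and can_pickle are hypothetical names used only for illustration.

```python
import operator
import pickle
from functools import partial


def build_with_lambda():
    # Hypothetical stand-in for a function such as init_resnet that
    # creates a lambda locally. pickle cannot resolve such an object by
    # name, which is what "Can't pickle local object" reports.
    return lambda x: x + 1


def build_with_partial():
    # One possible picklable alternative: a partial over a function that
    # lives at module level (here operator.add from the standard library).
    return partial(operator.add, 1)


def can_pickle(obj):
    # Return True if pickle can serialize obj, False otherwise.
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    print("local lambda picklable:", can_pickle(build_with_lambda()))
    print("partial picklable:", can_pickle(build_with_partial()))
```

The traceback above shows the failure happening while a multiprocessing Pool is being created, so a setting that avoids handing such objects to separate worker processes would sidestep the pickling step. Whether --workers 1 does exactly that in this repository is an assumption here, not something the thread states.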

kazuki-can commented 3 years ago

Thank you so much. It's running correctly now. I hope it will finish training.

kazuki-can commented 3 years ago

Fortunately, it finished the first 20 epochs. But after that, it started running again for some reason. What will I see once training is complete? In output_2020_12_22\Market1501_384x128\ResNet50_16_4, I can see training_A, which contains a pkl file and many png files.

nixingyang commented 3 years ago

This is expected. Wait until the process completes.

kazuki-can commented 3 years ago

2020-12-22 22:28:19.472202: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at strided_slice_op.cc:138 : Resource exhausted: OOM when allocating tensor with shape[64,12,8,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "solution.py", line 1070, in <module>
    app.run(main)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "solution.py", line 1060, in main
    verbose=2)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\keras\engine\training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\eager\def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\eager\def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\eager\function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\eager\function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\eager\function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\eager\function.py", line 598, in call
    ctx=ctx)
  File "C:\Users###\anaconda3\envs\AdaptiveL2\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[64,24,8,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node training_model/last_block_for_global_branch/conv5_block3_out/Relu-0-0-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  [Op:__inference_train_function_136955]

Function call stack: train_function

2020-12-22 22:28:20.011380: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated. [[{{node PyFunc}}]]

kazuki-can commented 3 years ago

This is the error I get after re-running. I'm sorry for taking up so much of your time, and thank you very much for all your help.

nixingyang commented 3 years ago

No problem. This is an out-of-memory issue. You could use smaller images by specifying the image_width and image_height flags, or use a GPU with more memory.

kazuki-can commented 3 years ago

How should I specify the image_width and image_height flags?

nixingyang commented 3 years ago

Use something like --image_width 64 --image_height 192.
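
To see why smaller images help, here is a rough back-of-the-envelope sketch of the size of the single tensor named in the OOM message above. It assumes 4 bytes per element (TensorFlow's "float" is float32) and a 16x spatial downsampling, which is inferred from the 384x128 input in the output folder name and the 24x8 tensor in the log rather than taken from the code. tensor_bytes and feature_map_shape are hypothetical helpers.

```python
def tensor_bytes(shape, bytes_per_element=4):
    # Multiply all dimensions, then scale by the element size in bytes.
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def feature_map_shape(batch_size, image_height, image_width,
                      channels=2048, stride=16):
    # Assumed layout: (batch, height / stride, width / stride, channels).
    return (batch_size, image_height // stride, image_width // stride, channels)


if __name__ == "__main__":
    mib = 1024 * 1024
    default_shape = feature_map_shape(64, 384, 128)  # matches [64,24,8,2048]
    smaller_shape = feature_map_shape(64, 192, 64)   # --image_width 64 --image_height 192
    print(default_shape, tensor_bytes(default_shape) / mib, "MiB")
    print(smaller_shape, tensor_bytes(smaller_shape) / mib, "MiB")
```

Under these assumptions the tensor shrinks from 96 MiB to 24 MiB. This covers only one activation; training keeps many such tensors alive at once, so the estimate shows the scaling trend, not the total GPU memory required.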

kazuki-can commented 3 years ago

Thank you very much. I'm so glad that such a considerate person made this model. I downloaded the pre-trained model and tried the evaluation. It says 'All done', so I think it finished successfully, but I want to train it myself, so I will try that. And thank you for putting up with my poor English.

nixingyang commented 3 years ago

You are welcome. Feel free to ask if you have any other questions.

kazuki-can commented 3 years ago

How does this model extract the features of each person? From body parts, from the whole appearance, or from something like height?

nixingyang commented 3 years ago

Hi, we are using one global branch and two regional branches. Features from each branch are concatenated in the inference procedure. You may find more details in the "B. Baseline" section.
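
As a toy illustration of the concatenation step described above, the sketch below joins one global and two regional feature vectors into a single descriptor. The vector lengths and values are made up, and concatenate_branch_features is a hypothetical helper, not a function from the repository.

```python
def concatenate_branch_features(branch_features):
    # Join the per-branch feature vectors end to end, keeping branch order.
    combined = []
    for features in branch_features:
        combined.extend(features)
    return combined


# Made-up two-dimensional features for one image.
global_features = [0.1, 0.2]
regional_features_a = [0.3, 0.4]
regional_features_b = [0.5, 0.6]

descriptor = concatenate_branch_features(
    [global_features, regional_features_a, regional_features_b]
)
print(len(descriptor))
```

The resulting descriptor has as many entries as the three branches combined, so every branch contributes to the final representation of a person.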

kazuki-can commented 3 years ago

Thank you so much for explaining.
