researchmm / SiamDW

[CVPR'19 Oral] Deeper and Wider Siamese Networks for Real-Time Visual Tracking
http://openaccess.thecvf.com/content_CVPR_2019/html/Zhang_Deeper_and_Wider_Siamese_Networks_for_Real-Time_Visual_Tracking_CVPR_2019_paper.html
MIT License
751 stars 180 forks

Population and group size of GA. #27

Closed: iminfine closed this issue 5 years ago

iminfine commented 5 years ago

I am running the tuning process of SiamFC on OTB-2015 in the cloud with 8 GPUs, but it is taking a long time to get results.

The population size and group size of the GA in tune_gune are both set to 100. The population size seems quite large, so I am thinking of reducing it. Is that possible? Have you tested the GA tuning process of SiamFC with a smaller population size?
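For a rough sense of how the tuning cost scales (an editor's illustration; all concrete numbers are assumptions, not figures from this repo): wall-clock time is roughly population size times number of generations times per-evaluation time, divided by how many trials run in parallel, so reducing the population only helps insofar as it reduces the total number of evaluated configurations.

```python
# Back-of-the-envelope estimate of a GA tuning sweep's duration.
# The per-evaluation time and generation count below are assumptions for
# illustration only; they are not reported anywhere in this thread.

def ga_wall_time_hours(population, generations, minutes_per_eval, parallel_trials):
    total_evals = population * generations          # configurations evaluated in total
    return total_evals * minutes_per_eval / parallel_trials / 60.0

# Halving the population halves the time only if the generation count stays fixed,
# i.e. only if the total number of evaluated configurations also halves.
print(ga_wall_time_hours(population=100, generations=10, minutes_per_eval=10, parallel_trials=8))  # ~20.8 h
print(ga_wall_time_hours(population=50,  generations=10, minutes_per_eval=10, parallel_trials=8))  # ~10.4 h
```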

JudasDie commented 5 years ago

I am running the tuning process of SiamFC on OTB-2015 in the cloud with 8 GPUs, but it is taking a long time to get results.

The population size and group size of the GA in tune_gune are both set to 100. The population size seems quite large, so I am thinking of reducing it. Is that possible? Have you tested the GA tuning process of SiamFC with a smaller population size?

  1. Yes, it takes a long time to tune the hyper-parameters.
  2. Even with a small population size, the total number of evaluated hyper-parameter configurations needs to reach a certain scale. It takes about one day with 8 GPUs.
  3. Try TPE for better results (a rough TPE sketch is included after this list).
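As a concrete illustration of the TPE suggestion above, here is a minimal sketch of a TPE search over the four hyper-parameters that appear in the tuning trial names later in this thread (scale_lr, scale_penalty, scale_step, w_influence). It uses the hyperopt library directly rather than this repo's own tuning script; the search ranges and the evaluate_tracker stub are assumptions, not values from the repo.

```python
from hyperopt import fmin, hp, tpe

# Hypothetical stand-in: in practice this would run SiamFC on OTB-2015 with the
# given hyper-parameters and return the tracking score (e.g. AUC).
def evaluate_tracker(params):
    return 0.0

# Assumed search ranges, loosely bracketing the values visible in the trial names below.
space = {
    "scale_lr":      hp.uniform("scale_lr", 0.2, 0.8),
    "scale_penalty": hp.uniform("scale_penalty", 0.95, 1.0),
    "scale_step":    hp.uniform("scale_step", 1.0, 1.2),
    "w_influence":   hp.uniform("w_influence", 0.2, 0.7),
}

def objective(params):
    # hyperopt minimizes the objective, so return the negated score.
    return -evaluate_tracker(params)

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=200)
print(best)
```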
iminfine commented 5 years ago


Thanks. TPE does not seem to work after you fixed the TAB-SPACE bugs in test_siamfc.py, and the comment in the code says GA is faster than TPE (though still not fast enough; I would prefer TPE). So I ran GA instead and got error messages like this:

```
2019-07-15 05:55:28,916 WARNING experiment.py:30 -- `trial_resources` is deprecated. Please use `resources_per_trial`. `trial_resources` will be removed in future versions of Ray.
2019-07-15 05:55:28,916 INFO tune.py:139 -- Did not find checkpoint file in ./TPE_results/zp_tune.
2019-07-15 05:55:28,916 INFO tune.py:145 -- Starting a new experiment.
== Status ==
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 180.000: None | Iter 60.000: None | Iter 20.000: None
Bracket: Iter 180.000: None | Iter 60.000: None
Bracket: Iter 180.000: None
Resources requested: 0/4 CPUs, 0/1 GPUs
Memory usage on this node: 6.7/16.7 GB

2019-07-15 05:55:29,000 WARNING logger.py:105 -- Could not instantiate <class 'ray.tune.logger._TFLogger'> - skipping.
== Status ==
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 180.000: None | Iter 60.000: None | Iter 20.000: None
Bracket: Iter 180.000: None | Iter 60.000: None
Bracket: Iter 180.000: None
Resources requested: 1/4 CPUs, 0.5/1 GPUs
Memory usage on this node: 6.8/16.7 GB
Result logdir: ./TPE_results/zp_tune
PENDING trials:
 - fitness_2_scale_lr=0.5674,scale_penalty=0.9596,scale_step=1.0614,w_influence=0.3149: PENDING
 - fitness_3_scale_lr=0.5008,scale_penalty=0.9529,scale_step=1.1539,w_influence=0.2413: PENDING
 - fitness_4_scale_lr=0.6948,scale_penalty=0.9551,scale_step=1.011,w_influence=0.4641: PENDING
 - fitness_5_scale_lr=0.6827,scale_penalty=0.9836,scale_step=1.0539,w_influence=0.6642: PENDING
RUNNING trials:
 - fitness_1_scale_lr=0.2358,scale_penalty=0.9937,scale_step=1.1995,w_influence=0.5447: RUNNING

2019-07-15 05:55:29,109 WARNING logger.py:105 -- Could not instantiate <class 'ray.tune.logger._TFLogger'> - skipping.
2019-07-15 05:55:29,346 WARNING logger.py:27 -- Couldn't import TensorFlow - disabling TensorBoard logging.
2019-07-15 05:55:29,396 WARNING logger.py:27 -- Couldn't import TensorFlow - disabling TensorBoard logging.
2019-07-15 05:55:29,997 ERROR worker.py:1632 -- Failed to unpickle actor class 'WrappedFunc' for actor ID 4081809b3f48f44d15f26536b32befdb77815c5f. Traceback:
Traceback (most recent call last):
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/function_manager.py", line 632, in fetch_and_register_actor
    unpickled_class = pickle.loads(pickled_class)
ModuleNotFoundError: No module named 'tracker'
2019-07-15 05:55:29,998 ERROR worker.py:1632 -- Failed to unpickle actor class 'WrappedFunc' for actor ID 7e05f09aa98b7d34320bcf1366eed7cd1bd0e8f1. Traceback:
Traceback (most recent call last):
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/function_manager.py", line 632, in fetch_and_register_actor
    unpickled_class = pickle.loads(pickled_class)
ModuleNotFoundError: No module named 'tracker'
2019-07-15 05:55:29,999 ERROR trial_runner.py:413 -- Error processing event.
Traceback (most recent call last):
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 378, in _process_events
    result = self.trial_executor.fetch_result(trial)
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 228, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/worker.py", line 2132, in get
    raise value
ray.worker.RayTaskError: ray_worker (pid=28562, host=bo-Surface-Book-2)
Exception: The actor with name WrappedFunc failed to be imported, and so cannot execute this method
2019-07-15 05:55:30,037 WARNING logger.py:105 -- Could not instantiate <class 'ray.tune.logger._TFLogger'> - skipping.
2019-07-15 05:55:30,129 ERROR trial_runner.py:413 -- Error processing event.
Traceback (most recent call last):
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 378, in _process_events
    result = self.trial_executor.fetch_result(trial)
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 228, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/worker.py", line 2132, in get
    raise value
ray.worker.RayTaskError: ray_worker (pid=28563, host=bo-Surface-Book-2)
Exception: The actor with name WrappedFunc failed to be imported, and so cannot execute this method
2019-07-15 05:55:30,173 WARNING logger.py:105 -- Could not instantiate <class 'ray.tune.logger._TFLogger'> - skipping.
```

Any suggestions on this?

JudasDie commented 5 years ago


Rerun the code a few times. Sometimes it reports `No module named 'tracker'` or `No module named 'model'` for unknown reasons.
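The tracebacks above show the Ray worker processes failing to re-import the local tracker module while unpickling the trial actor. One thing that may help, as a hedged workaround sketch rather than a fix from this repo, is making sure the repository root is importable before ray.init() is called, so that locally started workers can find modules such as tracker:

```python
import os
import sys

import ray

# Assumption: this snippet lives at the top level of the SiamDW checkout;
# adjust REPO_ROOT if the tuning script sits elsewhere.
REPO_ROOT = os.path.abspath(os.path.dirname(__file__))

# Make local modules such as 'tracker' importable in the driver process...
sys.path.insert(0, REPO_ROOT)
# ...and in worker processes, which inherit the environment set before ray.init().
os.environ["PYTHONPATH"] = REPO_ROOT + os.pathsep + os.environ.get("PYTHONPATH", "")

ray.init()
```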

iminfine commented 5 years ago


Thanks.