Closed: iminfine closed this issue 5 years ago.
I am running the SiamFC tuning process on OTB-2015 in the cloud with 8 GPUs, but it takes a long time to get results.
The population size and group size of the GA in tune_gune are both set to 100. The population size seems quite large, so I am thinking of reducing it; is that possible? Have you tested the SiamFC tuning process with the GA using a smaller population size?
- Yes, it takes a long time to tune the hyper-parameters.
- Even with a small population size, the total number of evaluated hyper-parameter sets still needs to reach a certain scale. It takes about one day with 8 GPUs.
- Try TPE for better results (a minimal sketch follows below).
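If it helps, here is a minimal TPE sketch using hyperopt directly, independent of the tuning scripts in this repo and of Ray. `eval_tracker_auc` is a hypothetical placeholder for running SiamFC on OTB-2015 and returning its AUC, and the search ranges are only illustrative, loosely based on the values that appear in the trial names later in this thread.

```python
from hyperopt import STATUS_OK, fmin, hp, tpe

# Search space over the four SiamFC tracking hyper-parameters seen in the
# trial names below; the ranges are assumptions, not the repo's defaults.
search_space = {
    "scale_lr":      hp.uniform("scale_lr", 0.2, 0.8),
    "scale_penalty": hp.uniform("scale_penalty", 0.95, 1.0),
    "scale_step":    hp.uniform("scale_step", 1.0, 1.2),
    "w_influence":   hp.uniform("w_influence", 0.2, 0.7),
}

def eval_tracker_auc(params):
    # Hypothetical stand-in: the real script would run SiamFC on OTB-2015 with
    # these hyper-parameters and return the AUC. A smooth dummy function keeps
    # the sketch runnable end to end.
    return 1.0 - abs(params["scale_lr"] - 0.45) - abs(params["w_influence"] - 0.35)

def objective(params):
    auc = eval_tracker_auc(params)
    # hyperopt minimizes the loss, so negate the AUC to maximize it.
    return {"loss": -auc, "status": STATUS_OK}

best = fmin(objective, search_space, algo=tpe.suggest, max_evals=500)
print("best hyper-parameters:", best)
```

The point of TPE here is that each new sample is proposed from the history of past evaluations, so it usually needs fewer tracker runs than a large GA population to reach a good region of the search space.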
Thanks. TPE does not seem to work after your fix of the TAB/SPACE bugs in test_siamfc.py, and the comment in the code says the GA is faster than TPE (though not fast enough for me; I would still prefer TPE). So I ran the GA instead, and I got error messages like this:
```
2019-07-15 05:55:28,916 WARNING experiment.py:30 -- `trial_resources` is deprecated. Please use `resources_per_trial`. `trial_resources` will be removed in future versions of Ray.
2019-07-15 05:55:28,916 INFO tune.py:139 -- Did not find checkpoint file in ./TPE_results/zp_tune.
2019-07-15 05:55:28,916 INFO tune.py:145 -- Starting a new experiment.
== Status ==
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 180.000: None | Iter 60.000: None | Iter 20.000: None
Bracket: Iter 180.000: None | Iter 60.000: None
Bracket: Iter 180.000: None
Resources requested: 0/4 CPUs, 0/1 GPUs
Memory usage on this node: 6.7/16.7 GB

2019-07-15 05:55:29,000 WARNING logger.py:105 -- Could not instantiate <class 'ray.tune.logger._TFLogger'> - skipping.
== Status ==
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 180.000: None | Iter 60.000: None | Iter 20.000: None
Bracket: Iter 180.000: None | Iter 60.000: None
Bracket: Iter 180.000: None
Resources requested: 1/4 CPUs, 0.5/1 GPUs
Memory usage on this node: 6.8/16.7 GB
Result logdir: ./TPE_results/zp_tune
PENDING trials:
 - fitness_2_scale_lr=0.5674,scale_penalty=0.9596,scale_step=1.0614,w_influence=0.3149: PENDING
 - fitness_3_scale_lr=0.5008,scale_penalty=0.9529,scale_step=1.1539,w_influence=0.2413: PENDING
 - fitness_4_scale_lr=0.6948,scale_penalty=0.9551,scale_step=1.011,w_influence=0.4641: PENDING
 - fitness_5_scale_lr=0.6827,scale_penalty=0.9836,scale_step=1.0539,w_influence=0.6642: PENDING
RUNNING trials:
 - fitness_1_scale_lr=0.2358,scale_penalty=0.9937,scale_step=1.1995,w_influence=0.5447: RUNNING

2019-07-15 05:55:29,109 WARNING logger.py:105 -- Could not instantiate <class 'ray.tune.logger._TFLogger'> - skipping.
2019-07-15 05:55:29,346 WARNING logger.py:27 -- Couldn't import TensorFlow - disabling TensorBoard logging.
2019-07-15 05:55:29,396 WARNING logger.py:27 -- Couldn't import TensorFlow - disabling TensorBoard logging.
2019-07-15 05:55:29,997 ERROR worker.py:1632 -- Failed to unpickle actor class 'WrappedFunc' for actor ID 4081809b3f48f44d15f26536b32befdb77815c5f. Traceback:
Traceback (most recent call last):
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/function_manager.py", line 632, in fetch_and_register_actor
    unpickled_class = pickle.loads(pickled_class)
ModuleNotFoundError: No module named 'tracker'
2019-07-15 05:55:29,998 ERROR worker.py:1632 -- Failed to unpickle actor class 'WrappedFunc' for actor ID 7e05f09aa98b7d34320bcf1366eed7cd1bd0e8f1. Traceback:
Traceback (most recent call last):
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/function_manager.py", line 632, in fetch_and_register_actor
    unpickled_class = pickle.loads(pickled_class)
ModuleNotFoundError: No module named 'tracker'
2019-07-15 05:55:29,999 ERROR trial_runner.py:413 -- Error processing event.
Traceback (most recent call last):
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 378, in _process_events
    result = self.trial_executor.fetch_result(trial)
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 228, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/worker.py", line 2132, in get
    raise value
ray.worker.RayTaskError: ray_worker (pid=28562, host=bo-Surface-Book-2)
Exception: The actor with name WrappedFunc failed to be imported, and so cannot execute this method
2019-07-15 05:55:30,037 WARNING logger.py:105 -- Could not instantiate <class 'ray.tune.logger._TFLogger'> - skipping.
2019-07-15 05:55:30,129 ERROR trial_runner.py:413 -- Error processing event.
Traceback (most recent call last):
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 378, in _process_events
    result = self.trial_executor.fetch_result(trial)
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 228, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/bo/anaconda3/envs/siamDW/lib/python3.6/site-packages/ray/worker.py", line 2132, in get
    raise value
ray.worker.RayTaskError: ray_worker (pid=28563, host=bo-Surface-Book-2)
Exception: The actor with name WrappedFunc failed to be imported, and so cannot execute this method
2019-07-15 05:55:30,173 WARNING logger.py:105 -- Could not instantiate <class 'ray.tune.logger._TFLogger'> - skipping.
```

Any suggestions for this?
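As a side note, the deprecation warning at the top of the log is separate from the import errors and should be easy to silence. Below is a minimal sketch, assuming a 2019-era Ray Tune API where a function trainable takes `(config, reporter)`; `run_tracker` is a hypothetical placeholder, while the experiment name and log directory are taken from the log above.

```python
from ray import tune

def run_tracker(config, reporter):
    # Hypothetical trainable: the real one would evaluate SiamFC with the
    # hyper-parameters in `config` and report a tracking score.
    reporter(reward=0.0, timesteps_total=1)

tune.run(
    run_tracker,
    name="zp_tune",
    local_dir="./TPE_results",
    resources_per_trial={"cpu": 1, "gpu": 0.5},  # replaces the old trial_resources keyword
    num_samples=4,
)
```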
Rerun the code several times. Sometimes it reports `No module named 'tracker'` or `No module named 'model'` for unknown reasons.
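If the re-runs keep hitting this, one possible workaround (a sketch, not the repo's actual fix): the Ray workers unpickle the trainable in separate processes, so `tracker` and `model` have to be importable there as well, not only in the driver. Extending `PYTHONPATH` before `ray.init()` usually makes this reproducible, assuming the workers inherit the driver's environment, which is typically the case when `ray.init()` starts Ray locally. The directory names below are assumptions about the repository layout.

```python
import os
import sys

# Assumed repository layout: adjust these to wherever the 'tracker' and
# 'model' modules actually live in the checkout.
REPO_ROOT = os.path.dirname(os.path.abspath(__file__))
EXTRA_DIRS = [REPO_ROOT, os.path.join(REPO_ROOT, "lib")]

# Make the modules importable in the driver process ...
for d in EXTRA_DIRS:
    if d not in sys.path:
        sys.path.insert(0, d)

# ... and expose them to worker processes via PYTHONPATH before Ray starts.
os.environ["PYTHONPATH"] = os.pathsep.join(
    EXTRA_DIRS
    + [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
)

import ray

ray.init()  # workers launched after this point inherit the extended PYTHONPATH
```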
Thanks.