I read that running TensorFlow operations on the CPU consumes more memory, so I changed the code to run on the GPU, but it still crashes. :( The problem starts in session.py (the TensorFlow client), when _run_fn() calls tf_session.TF_Run(session, options, feed_dict, fetch_list, target_list, run_metadata, status). Any help would be really appreciated!
@Ahmedest61 My program runs directly on the GPU but has the same problem; it is probably caused by a tensor that is too large.
@harryeee Actually I was using tensorflow, not tensorflow-gpu. After I installed tensorflow-gpu with conda it started using the GPU, but I'm still facing this memory crash. :(
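For what it's worth, this is the minimal TF1-style check I use to confirm the GPU build is actually being picked up, and to let TensorFlow grow GPU memory instead of grabbing it all up front (it won't help if the graph itself is simply too big, but it rules out the obvious causes):

```python
import tensorflow as tf

# Empty string here means TensorFlow only sees the CPU.
print("GPU device:", tf.test.gpu_device_name())

# Log op placement and allocate GPU memory incrementally.
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
```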
@harryeee are you facing the same memory issues?
@Ahmedest61 Yes, the same memory issues.
CapsNet is a very big network and requires a lot of memory to work properly, especially when dealing with big images. It is also a recent architecture, close to the research: anything new you attempt with it is probably close to an open research problem. Being able to scale a capsule network to large datasets and/or big images is itself a research problem. There may already be papers on how to scale capsule networks, or at least I imagine research teams are working on it as I type this message.
I managed to train it by resizing the training images to (60, 60), setting caps_1_nb_filter to 8, and changing a few other values in the model file so the reconstructed images at the end keep the same size. @thibo73800 @harryeee Tell me one thing: suppose I only want the digit caps (in my case, the output caps) at the end. I'm mostly interested in the classifier layer and don't want to reconstruct the image(s); I just want to train the network and test it on new pose-variant images. I would really appreciate it if you could share your thoughts on what changes I should make to the loss equation/architecture.
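My current understanding (please correct me if wrong): in the paper the total loss is the margin loss plus a small (0.0005-weighted) reconstruction loss, so dropping the decoder would just mean training on the margin loss alone. Something like this rough sketch, where v_norm and labels_onehot are placeholder names, not from this repo:

```python
import tensorflow as tf

def margin_loss(v_norm, labels_onehot, m_plus=0.9, m_minus=0.1, lambda_=0.5):
    """Classification-only CapsNet loss (no reconstruction term).

    v_norm:        [batch, n_classes] lengths of the output capsules
    labels_onehot: [batch, n_classes] one-hot ground-truth labels
    """
    present = tf.square(tf.maximum(0.0, m_plus - v_norm))    # penalize short capsule for true class
    absent = tf.square(tf.maximum(0.0, v_norm - m_minus))    # penalize long capsules for wrong classes
    per_class = labels_onehot * present + lambda_ * (1.0 - labels_onehot) * absent
    return tf.reduce_mean(tf.reduce_sum(per_class, axis=1))
```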
I was finally able to test it on my dataset; the accuracy is 99%. My dataset contains 43 labels with around 14k images overall; each label has images of one place under different weather and lighting conditions. Initially I randomly distributed 50%, 22% and 22% of each label's images to the training, validation and test folders, doing the random split separately for each label. Within each label, the images show an outdoor place under different weather conditions, i.e. sunny, cloudy and maybe rainy. The aim is to train on random images of a single place under different conditions and to test on images of the same place under other conditions, although this goes against the usual rule of machine learning that test images must be different from validation images. What do you say, guys? How can I further test my scenario? I want to check whether environmental changes affect whether the place is recognized or not. Any changes I can make to better test the accuracy? Your suggestions would be really appreciated. :-)
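For clarity, this is roughly how I split each label's folder (paths and fractions are illustrative; the leftover images after the 50%/22% cuts go to test):

```python
import os
import random
import shutil

def split_label(src_dir, dst_root, label, fractions=(0.5, 0.22), seed=0):
    """Randomly distribute one label's images into train/val/test folders."""
    files = sorted(os.listdir(os.path.join(src_dir, label)))
    random.Random(seed).shuffle(files)
    n_train = int(fractions[0] * len(files))
    n_val = int(fractions[1] * len(files))
    splits = {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],   # remaining images
    }
    for split, names in splits.items():
        out_dir = os.path.join(dst_root, split, label)
        os.makedirs(out_dir, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src_dir, label, name), out_dir)
```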
I also need to run this code with images of a new size. May I know which files you modified? @Ahmedest61
@thibo73800 When training with a different image size, i.e. 128x128, the system memory climbs to 32 GB and the program crashes when __init__() calls init_session() and init_session() runs self.sess.run(tf.global_variables_initializer()). Why is that? (Perhaps relevant: I only changed caps_2_vec_len to 128 in the json file, the tf.placeholder() input size from (32, 32) to (128, 128) in _build_inputs(), and the resize_nearest_neighbor() size from (32, 32) to (128, 128) in _build_decoder().)
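To make the question concrete, here is a rough back-of-the-envelope estimate of why the initializer might need so much memory. The conv strides and filter counts below are assumptions, not taken from this repo, but the scaling is the point: the number of primary capsules grows with the spatial grid, and the routing weight tensor grows with caps1_n_caps * n_classes * caps2_vec_len * caps1_vec_len, and is often tiled across the batch on top of that:

```python
# Hypothetical layer sizes: two 9x9 valid convs (stride 1 then stride 2),
# 16 primary-capsule maps of 8-D capsules, 43 output classes.
def routing_weight_bytes(img, caps2_vec_len, caps1_maps=16, caps1_vec_len=8, n_classes=43):
    grid = (img - 16) // 2                                   # primary-capsule grid size (assumed convs)
    caps1_n_caps = grid * grid * caps1_maps
    n_params = caps1_n_caps * n_classes * caps2_vec_len * caps1_vec_len
    return n_params * 4                                      # float32 bytes

print(routing_weight_bytes(32, 16) / 1e9, "GB")    # ~0.02 GB -> fine
print(routing_weight_bytes(128, 128) / 1e9, "GB")  # ~8.8 GB for W alone, before tiling per batch
```

So going from 32x32 with a 16-D output capsule to 128x128 with a 128-D one can multiply the routing weights by a few hundred times, which would explain hitting 32 GB at variable initialization.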