Closed by enesdoruk 2 years ago
Hey @enesdoruk, I am experiencing the slow training too. Any solutions @ybkscht ?
The model is training on one V100 GPU with 1.5k examples, phi=3 and batch size=4; after 2 days it is only at epoch 150. It is too slow @ybkscht @satpalsr
Hi @enesdoruk ,
it sounds like the bottleneck in your case is not the GPU but instead loading, preprocessing and getting the data to your GPU fast enough (the generator part). You can try the --multiprocessing argument, which starts multiple generator processes and should speed up your training if the generator is the bottleneck. But please note that from my experience using multiprocessing can cause problems, especially on Windows. You can also try setting the --workers argument higher, which starts multiple generator threads (in case multiprocessing is False). Because of the GIL the generator is not really parallelized using multiple threads, but as far as I know it can still speed up IO (loading data from disk).
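The point about threads and the GIL can be sketched generically: Python threads release the GIL while blocked on file IO, so several disk reads can overlap even though Python bytecode itself runs serially. This is a minimal illustration (the function names are illustrative, not from this repo), showing the kind of thread pool that a workers > 0 setting uses under the hood:

```python
from concurrent.futures import ThreadPoolExecutor

def load_example(path):
    # A blocking disk read; the GIL is released while waiting on IO.
    with open(path, "rb") as f:
        return f.read()

def load_batch(paths, workers=4):
    # Several reads are in flight at once, so total wall time can be
    # much less than reading the files one after another.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_example, paths))
```

Note that this only helps if the time is actually spent waiting on the disk; CPU-heavy preprocessing in threads stays serialized by the GIL, which is what the separate multiprocessing option is for.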
If your dataset is small enough to fit into your memory, you can also try this to skip accessing the disk for each example, which is quite expensive.
So basically you should try to speed up the data loading and preprocessing part.
Sincerely, Yannick
I am using 12 workers and I tried to use multiprocessing, but when I activate multiprocessing, training does not start and there is no reaction (no error or warning) @ybkscht.
"If your dataset is small enough to fit into your memory you can also try this to skip accessing the disk for each example which is quite expensive." I don't understand this sentence. @ybkscht
And I want to give one example: I used three different GPUs in different trainings: one GTX 1080, one GTX 1660 and one Tesla V100. When starting with batch size 1, phi 0 and the same dataset, the training time is almost the same; there is only a very small difference @ybkscht
The generator loads every image and annotation file from your disk when creating a new batch, and this is a quite expensive operation, so it is possible that it is the bottleneck in your case. Instead of always having to load every example from disk, you can try to load the dataset into your memory at the beginning. You can either try using a ramdisk or change the generator so that it loads the dataset into memory in the init method. But as already mentioned, this works only if your dataset is small enough to fit into your memory.
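The load-everything-in-init idea can be sketched generically like this (CachedDataset and load_fn are illustrative names, not from this repo; load_fn stands in for the real disk-loading routine such as cv2.imread):

```python
class CachedDataset:
    """Sketch of a generator that pays the disk cost once, up front."""

    def __init__(self, paths, load_fn):
        # Load every example a single time at construction; this
        # requires the whole dataset to fit into RAM.
        self.cache = [load_fn(p) for p in paths]

    def load(self, index):
        # No disk access during training, just a list lookup.
        return self.cache[index]
```

The trade-off is a slower startup and higher memory use in exchange for removing disk IO from every batch.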
What do you think the problem with multiprocessing is? I activated multiprocessing and waited 30 minutes, and there was no reaction @ybkscht
I don't really know what the problem is here, but I often had, and still have, problems trying to use multiprocessing with tensorflow, especially under Windows.
Finally, which changes should I make in this project for the generator? Which file and which lines? Can you explain the changes for the generator loading @ybkscht? And can I set multiprocessing to True and workers bigger than 0 at the same time?
In generators/linemod.py (and occlusion.py) the paths to all images and masks of your dataset are currently stored in lists. If the generator needs to generate a batch (the __getitem__ method of the generator base class in generators/common.py), the needed images and masks are loaded from disk using the paths stored in those lists (the load_image and load_mask methods in generators/linemod.py).
So you can try to load all images and masks in the init method of linemod.py, store them in lists, and change the load_image and load_mask methods so that they only return the images from the lists instead of loading them from disk.
For example add this in the init method of LineModGenerator in linemod.py after shuffling the dataset (line 123):
```python
self.all_images = []
for path_to_image in self.image_paths:
    # from load_image method
    image = cv2.imread(path_to_image)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    self.all_images.append(image)

self.all_masks = []
for path_to_mask in self.mask_paths:
    # from load_mask method
    mask = cv2.imread(path_to_mask)
    self.all_masks.append(mask)
```
And then change the load_image and load_mask methods of LineModGenerator in linemod.py:
```python
# note: this requires "import copy" at the top of linemod.py

def load_image(self, image_index):
    """ Load an image at the image_index.
    """
    return copy.deepcopy(self.all_images[image_index])

def load_mask(self, image_index):
    """ Load mask at the image_index.
    """
    return copy.deepcopy(self.all_masks[image_index])
```
Please note that this is just some example code I wrote down quickly and didn't test, so you may have to fix some bugs. But it should be a good starting point and give you the idea.
If multiprocessing doesn't work you should set it to False, but don't also use workers = 0, because then everything runs sequentially and is probably relatively slow.
Thanks @ybkscht, it works. There is no huge difference, but it is a bit faster. I will try the V100 GPU. I was using multiprocessing with workers 0; now that I have deactivated multiprocessing, I set workers > 0.
Hi, I have a GTX 1080 graphics card and one epoch takes 5 minutes. And when I use a Tesla V100 on Google Cloud, it takes 5 minutes too. I can't understand it. How can I solve this, and what is the problem?