rlee3359 / folding-by-hand


about train #6

Closed: zcswdt closed this issue 1 month ago

zcswdt commented 3 months ago

Excuse me, I found that the images are scaled to 64×64 during training. Can I use 300×300 images directly for training instead? I see that the code traverses each candidate point, generates a place-map for each, and then selects the maximum over these place-maps; the corresponding pick and place are taken as the best points. Is the scaling to 64×64 done because larger images would be too time-consuming?

rlee3359 commented 3 months ago

Why do you want to use larger images? Are you trying to improve training speed? Scaling images down is not an expensive operation. If you use larger images, the number of points sampled on the cloth mask will increase dramatically, which means you will need a lot more GPU memory to do inference with the model. If the object you are manipulating has a lot of detail that needs to be captured for manipulation, then larger images might be necessary, but I recommend finding the smallest image size that works.
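To make the memory argument concrete, here is a small sketch (assumed toy mask, not the repo's code) showing that the number of candidate pick points, and hence the inference batch size, grows quadratically with image side length:

```python
import numpy as np

def count_mask_points(mask: np.ndarray) -> int:
    """Number of nonzero mask pixels, i.e. candidate pick points."""
    return int(np.count_nonzero(mask))

def upscale_mask(mask: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upscale of a binary mask by an integer factor."""
    return np.kron(mask, np.ones((factor, factor), dtype=mask.dtype))

# Toy 64x64 mask where the cloth covers the central quarter of the image.
mask64 = np.zeros((64, 64), dtype=np.uint8)
mask64[16:48, 16:48] = 1

mask320 = upscale_mask(mask64, 5)  # 320x320, stands in for ~300x300

print(count_mask_points(mask64))   # 1024 candidate points
print(count_mask_points(mask320))  # 25600: a 25x larger batch per forward pass
```

The 5x increase in side length gives a 25x increase in candidate points, which is why GPU memory, not resize cost, is the practical limit.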

zcswdt commented 3 months ago

Thanks for your reply. In neural networks, 224×224 is commonly used as the input size. Will 64×64 hurt prediction accuracy? That is why I want to increase the image resolution. If the impact is not significant, I don't need to change the image size. I have a 4090 graphics card, which should be enough in terms of GPU memory.

rlee3359 commented 3 months ago

A single forward pass of the policy at test time requires a batch of images whose batch size equals the number of pick points sampled on the cloth mask. For this reason, I chose the smallest image size that would still maintain accuracy and visibility of details in the cloth. 64 is commonly used in reinforcement learning and imitation learning.

I believe a larger image size will likely work fine as well if you have enough GPU memory. You can also work out how large each pixel is in workspace dimensions and see whether you actually need higher precision. Precision much finer than the robot's gripper probably won't give significant benefits.
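The pixel-size check above is just a division; here is a back-of-envelope sketch with assumed numbers (the workspace and gripper dimensions are hypothetical, not from the repo):

```python
def pixel_size_m(workspace_width_m: float, image_width_px: int) -> float:
    """Side length of one pixel in metres, assuming the image spans the workspace."""
    return workspace_width_m / image_width_px

workspace_width_m = 0.60  # assumed 60 cm wide workspace
gripper_width_m = 0.02    # assumed 2 cm gripper opening

for px in (64, 224, 300):
    size = pixel_size_m(workspace_width_m, px)
    print(f"{px:3d} px -> {size * 1000:.1f} mm per pixel")

# Under these assumptions, even at 64 px a pixel is ~9.4 mm, already well
# below the 20 mm gripper width, so finer resolution may not add usable precision.
```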

You might also have to change the sigma value for the Gaussian heatmaps so that they keep a similar relative size in the larger image.
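The sigma adjustment can be sketched as follows; this is an assumed illustration (the base sigma of 2 px at 64×64 is hypothetical), scaling sigma linearly with image size so the heatmap blob keeps the same relative footprint:

```python
import numpy as np

def gaussian_heatmap(size: int, center: tuple, sigma: float) -> np.ndarray:
    """2D Gaussian target, peak value 1.0 at `center` (row, col)."""
    ys, xs = np.mgrid[0:size, 0:size]
    cy, cx = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

base_size, base_sigma = 64, 2.0                 # assumed values at 64x64
new_size = 300
new_sigma = base_sigma * new_size / base_size   # 9.375 px: same relative size

hm = gaussian_heatmap(new_size, (150, 150), new_sigma)
print(hm.shape, float(hm.max()))  # (300, 300) 1.0
```

Keeping sigma fixed in pixels would instead make the target blob proportionally smaller at higher resolution, which changes the effective loss landscape.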