ranahanocka / MeshCNN

Convolutional Neural Network for 3D meshes in PyTorch

More efficient pooling #41

Closed ThomasOerkild closed 4 years ago

ThomasOerkild commented 4 years ago

First of all, thank you for the work you've done on this project; it has been a huge help!

The current implementation calls MeshUnion.union many times during each pooling step, which takes a lot of time. By storing all the source and target rows and applying them at once, we can vectorize the union operation and achieve roughly a 2x speed-up in training time.
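For context, here is a rough sketch of the idea. The class and method names mirror the ones mentioned in this thread, but the batched variant is only an illustration of the approach, not the exact PR code; as the discussion further down shows, it is not exactly equivalent to applying the unions one by one.

import torch

class MeshUnionSketch:
    def __init__(self, n):
        # One row per edge; row t accumulates the rows of edges merged into t.
        self.groups = torch.eye(n)
        self._sources, self._targets = [], []

    def union(self, source, target):
        # Current behaviour: one row update per call, many calls per pooling step.
        self.groups[target] += self.groups[source]

    def record_union(self, source, target):
        # Batched variant: only record the pair while pooling ...
        self._sources.append(source)
        self._targets.append(target)

    def apply_recorded(self):
        # ... and apply all recorded updates in a single indexed operation.
        src = torch.tensor(self._sources)
        tgt = torch.tensor(self._targets)
        self.groups.index_add_(0, tgt, self.groups[src])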

Output from training human_seg before this change, where each epoch takes around 314 s:

saving the latest model (epoch 1, total_steps 8)
(epoch: 1, iters: 40, time: 0.840, data: 3.949) loss: 2.140 
(epoch: 1, iters: 80, time: 0.822, data: 0.011) loss: 1.898 
(epoch: 1, iters: 120, time: 0.833, data: 0.012) loss: 1.771 
(epoch: 1, iters: 160, time: 0.788, data: 0.012) loss: 1.831 
(epoch: 1, iters: 200, time: 0.786, data: 0.015) loss: 1.598 
(epoch: 1, iters: 240, time: 0.784, data: 0.012) loss: 1.485 
(epoch: 1, iters: 280, time: 0.771, data: 0.016) loss: 1.578 
(epoch: 1, iters: 320, time: 0.779, data: 0.011) loss: 1.392 
(epoch: 1, iters: 360, time: 0.775, data: 0.013) loss: 1.240 
saving the model at the end of epoch 1, iters 384
End of epoch 1 / 2100    Time Taken: 314 sec
learning rate = 0.0010000
Running Test
loaded mean / std from cache
loading the model from ./checkpoints\human_seg\latest_net.pth
epoch: 1, TEST ACC: [56.706 %]

After this change it takes around 149 s per epoch:

saving the latest model (epoch 1, total_steps 8)
(epoch: 1, iters: 40, time: 0.353, data: 3.925) loss: 2.087 
(epoch: 1, iters: 80, time: 0.412, data: 0.010) loss: 1.932 
(epoch: 1, iters: 120, time: 0.402, data: 0.022) loss: 1.822 
(epoch: 1, iters: 160, time: 0.369, data: 0.012) loss: 1.755 
(epoch: 1, iters: 200, time: 0.364, data: 0.012) loss: 1.702 
(epoch: 1, iters: 240, time: 0.358, data: 0.027) loss: 1.624 
(epoch: 1, iters: 280, time: 0.358, data: 0.014) loss: 1.503 
(epoch: 1, iters: 320, time: 0.359, data: 0.012) loss: 1.546 
(epoch: 1, iters: 360, time: 0.375, data: 0.010) loss: 1.364 
saving the model at the end of epoch 1, iters 384
End of epoch 1 / 2100    Time Taken: 149 sec
learning rate = 0.0010000
Running Test
loaded mean / std from cache
loading the model from ./checkpoints\human_seg\latest_net.pth
epoch: 1, TEST ACC: [66.507 %]

ranahanocka commented 4 years ago

Hi @ThomasOerkild ,

Thanks a lot for the PR! Indeed, I noticed an improvement in speed as well. I added some unit tests to the repo which download the pre-trained weights and run inference to check that the accuracy is as expected. I added tests for 'shrec' (classification dataset) and 'human_seg' (segmentation dataset). The shrec test passed, but the human_seg test failed:

AssertionError: human_seg accuracy was 91.19 and not 92.554

The only difference between the two is the unpooling layer, since classification doesn't use unpooling. Do you know why the behavior has changed?

ThomasOerkild commented 4 years ago

Hmm, yeah, I probably should have checked that before submitting. I have been looking into it and found that you actually can't vectorize it the way I did and obtain the same result.

I didn't think about the fact that rows that have been targets can also be used as sources later on. An example:

import torch

# Vectorized update: with duplicate target indices, the additions do not accumulate.
A = torch.Tensor([[1,2], [3,4]])
source = [0,1]
target = [1,1]
A[target,:] += A[source,:]
# Will output:
# tensor([[1., 2.], [6., 8.]])

# Sequential update: each step sees the rows already modified by earlier steps.
A = torch.Tensor([[1,2], [3,4]])
for s, t in zip(source, target):
    A[t,:] += A[s,:]
# Will output:
# tensor([[ 1.,  2.], [ 8., 12.]])

So since the updates need to be applied sequentially to obtain the same result, I'm afraid it can't be vectorized.
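To make the failure mode concrete, here is a small aside (not from the original comment): even torch.index_add_, which does accumulate over duplicate target indices, gathers the source rows before any update is applied, so it still differs from the sequential loop whenever a row that was already a target is later used as a source.

import torch

A = torch.tensor([[1., 2.], [3., 4.]])
source = torch.tensor([0, 1])
target = torch.tensor([1, 1])

# index_add_ accumulates over duplicate targets, but reads the source rows
# from the original tensor, before any of the additions take effect.
B = A.clone()
B.index_add_(0, target, A[source])
# tensor([[1., 2.], [7., 10.]])

# The sequential loop sees the partially updated rows.
C = A.clone()
for s, t in zip(source.tolist(), target.tolist()):
    C[t] += C[s]
# tensor([[ 1.,  2.], [ 8., 12.]])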

ChristianIngwersen commented 4 years ago

Hi @ranahanocka, we have been experimenting a bit more with this new, faster way of pooling, even though it doesn't exactly correspond to the way you pool. Our thought is that the network might "learn" to collapse the "right" features under this new approach, instead of the ones learned with the pre-trained weights. To test this, we are currently training different models to see if it is possible to achieve the same accuracy with this new way of pooling.

Would you be willing to share the exact arguments needed to reproduce the 92.3% accuracy? We have been trying with the standard train.sh script, without any success yet. I've also tried the architecture you describe in the appendix of the paper. One issue, however, is that as I understand the paper I should set --ncf 32 64 128 256 and --pool_res 1200 900 300 279, but the code doesn't run when len(--ncf) != len(--pool_res)+1. Hope you can point us in the right direction, thanks!

ranahanocka commented 4 years ago

Hi @ChristianIngwersen ,

I had a similar thought -- that the network can probably still learn with this new way of pooling (maybe it is even better? who knows!).

About the exact arguments: they are indeed what I provided in the train segmentation script. I think the arXiv version of the paper had some incorrect numbers for the configuration of the segmentation network, which were updated in the final version. In any case, the code is indeed what we used to get 92.3% accuracy.

Did you try running it a few times? That is rather odd, as I reach this accuracy rather quickly (in fewer than 50 epochs). As with all learning, the results can vary a bit with different seeds, but I have noticed (empirically) that they are relatively stable.

ChristianIngwersen commented 4 years ago

Hi @ranahanocka , We managed to achieve the same peak accuracy after a few runs. After changing the pooling in the PR a bit and rewriting some of the code to use sparse tensors, we actually reached 93.15% accuracy on human_seg with a "bigger" network. We are still finishing the code, but we'll provide it later :)
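For readers wondering what "rewriting some of the code as sparse" might look like, here is a minimal, hypothetical sketch (not the authors' actual code): the edge-union groups matrix is mostly zeros, so it can be stored as a sparse COO tensor and applied to the edge features with a sparse-dense matmul.

import torch

num_pooled, num_orig = 300, 750               # hypothetical edge counts
rows = torch.tensor([0, 0, 1, 2])             # pooled edge indices
cols = torch.tensor([0, 5, 1, 2])             # original edges collapsed into them
vals = torch.ones(rows.shape[0])
groups = torch.sparse_coo_tensor(torch.stack([rows, cols]), vals,
                                 size=(num_pooled, num_orig))

features = torch.randn(num_orig, 64)          # per-edge features
pooled = torch.sparse.mm(groups, features)    # shape: (num_pooled, 64)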

ranahanocka commented 4 years ago

Hi @ChristianIngwersen, very nice!! Did the same architecture without the new pooling reach the same peak accuracy? Anyway, looking forward to checking it out :)