hamarh / HMNet_pth

PyTorch implementation of Hierarchical Neural Memory Network
BSD 3-Clause "New" or "Revised" License

Questions about 'TBPTT' #5

Closed JDYG closed 1 year ago

JDYG commented 1 year ago

Hi, in the code and README files you provide, there are several TBPTT versions for the detection task. However, the meaning of 'TBPTT' is not explicitly explained in the paper or README. May I ask whether 'TBPTT' stands for 'Truncated Back-Propagation Through Time'?

Additionally, I'm curious why the number of training epochs for the TBPTT version is set to only 1. Is one epoch sufficient for achieving an optimal parameterization?

I would greatly appreciate it if you could provide some more details regarding 'TBPTT'.

hamarh commented 1 year ago

Yes, 'TBPTT' stands for 'Truncated Back-Propagation Through Time'.

We set the number of training epochs to one because sufficient training has already been performed before the TBPTT step, and we observed no performance gain from using more epochs in TBPTT training. As shown in the README, training is performed in two steps: (STEP1) we first train a model without TBPTT for 90 epochs using short sequences (200 msec). (STEP2) we then run TBPTT for one epoch to adapt the model to longer sequences (8.1 sec). We need TBPTT because the forward and backward computation over an 8.1 sec sequence cannot fit into GPU memory, even with a batch size of 1 per GPU.
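For reference, here is a minimal, generic TBPTT sketch in PyTorch. This is not the actual HMNet_pth code; `model`, `criterion`, `optimizer`, and `chunk_len` are placeholder names. The key idea is that a long sequence is processed chunk by chunk, gradients are backpropagated only within the current chunk, and the recurrent state is detached at each chunk boundary so the computation graph (and GPU memory use) stays bounded.

```python
# Minimal, generic TBPTT sketch (NOT the HMNet_pth implementation).
# Assumed/placeholder names: model, criterion, optimizer, chunk_len.
import torch

def tbptt_step(model, criterion, optimizer, inputs, targets, chunk_len):
    """inputs/targets: tensors of shape (T, batch, ...) covering one long sequence."""
    state = None
    total_loss = 0.0
    T = inputs.size(0)
    for start in range(0, T, chunk_len):
        x = inputs[start:start + chunk_len]
        y = targets[start:start + chunk_len]

        optimizer.zero_grad()
        out, state = model(x, state)   # model is assumed to carry a recurrent state
        loss = criterion(out, y)
        loss.backward()                # gradients are truncated at the chunk boundary
        optimizer.step()

        # Detach the state so the next chunk does not backpropagate into this one.
        if isinstance(state, tuple):
            state = tuple(s.detach() for s in state)
        else:
            state = state.detach()
        total_loss += loss.item()
    return total_loss
```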

In addition, the total amount of data used in STEP2 is not small even for a single epoch, because each sample is much longer in STEP2. We define one epoch as 72371 training samples, so the total sequence length processed during training is:

STEP1: 0.2 sec × 72371 samples × 90 epochs ≈ 362 hours
STEP2: 8.1 sec × 72371 samples × 1 epoch ≈ 163 hours

You can view this as STEP2 having a much larger effective batch size.
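As a quick sanity check of these numbers, using only the figures quoted above:

```python
# Rough check of the total processed sequence length per step
samples_per_epoch = 72371
step1_hours = 0.2 * samples_per_epoch * 90 / 3600  # ~362 hours
step2_hours = 8.1 * samples_per_epoch * 1 / 3600   # ~163 hours
print(round(step1_hours), round(step2_hours))      # 362 163
```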

JDYG commented 1 year ago

Thanks for your detailed explanation. The information you provided helps me a lot.