microsoft / DCVC

Deep Contextual Video Compression
MIT License

DCVC-DC training code #31

Closed nzomi closed 8 months ago

nzomi commented 1 year ago

I'm quite interested in the training code for DCVC-DC, especially in understanding how the hierarchical weights influence the model during the training process. Specifically, I implemented the hierarchical weights in my own model, adopting the popular multi-stage training approach (IP, PP, PP, PP, ...) and then the cascade training for sequences like IPPPPP, PPPPPP, and so on.

I've observed a fascinating behavior in my model during the multi-stage training, and it appears to align with the trend you mentioned. However, when transitioning to the cascade stage, a peculiar shift in the curve by 2 frames caught my attention. After thoroughly inspecting my code, I'm confident there are no bugs. Did you incorporate any special techniques during your training, or do you have insights into this phenomenon? I'm eagerly anticipating your response.
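For readers following the thread, the hierarchical weighting in the cascade stage described above can be sketched roughly as follows. This is only a minimal illustration, not the authors' training code: the function names `frame_weight` and `cascade_loss`, the weight values in `HIER_WEIGHTS`, the period of 4, and the loss form `w_t * lam * D_t + R_t` are all assumptions made for the example.

```python
from typing import Sequence

# Hypothetical placeholder weights with a period of 4 frames.
# These are illustrative values, NOT confirmed values from the paper.
HIER_WEIGHTS = (0.5, 1.2, 0.5, 0.9)


def frame_weight(p_index: int, weights: Sequence[float] = HIER_WEIGHTS) -> float:
    """Return the distortion weight for the p_index-th P frame (0-based)."""
    return weights[p_index % len(weights)]


def cascade_loss(
    distortions: Sequence[float],
    rates: Sequence[float],
    lam: float,
    weights: Sequence[float] = HIER_WEIGHTS,
) -> float:
    """Average the weighted rate-distortion loss over all P frames of a clip.

    Assumed per-frame loss: w_t * lam * D_t + R_t.
    In cascade training the frames are coded one after another and the
    averaged loss is back-propagated through the whole chain.
    """
    if len(distortions) != len(rates):
        raise ValueError("distortions and rates must have the same length")
    if not distortions:
        raise ValueError("at least one frame is required")
    total = 0.0
    for t, (d, r) in enumerate(zip(distortions, rates)):
        total += frame_weight(t, weights) * lam * d + r
    return total / len(distortions)
```

Note that in this sketch the index convention decides which frames receive the larger weights: counting from the first P frame versus counting from the I frame moves the whole weight pattern by one position.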

[image]

AaronCIH commented 1 year ago

@nzomi

Hi, I'm quite interested in the training code for DCVC-DC as well. However, my model tends to collapse after a few epochs. Do you have a similar problem? Or could you share your training code for reference?

Thank you so much.

nzomi commented 11 months ago

@Cihsaing Hi Cihsaing! I simply used the training strategy provided by TCM v1 https://arxiv.org/abs/2111.13850v1

Aaron7noraA commented 8 months ago

@nzomi Hi, could I discuss some details with you personally?

semihese commented 8 months ago

I observe exactly the same issue as @nzomi. After cascaded training with a hierarchical GOP (using the same weights mentioned in the paper), the BD-PSNR is higher than that of the officially trained models when the test sequences are only 5 frames long. However, the BD-rate rapidly deteriorates for sequences longer than 5 frames. In the end, when coding a sequence that is around 32 frames long, the coding performance is much worse than that of the official models.

I use Vimeo-90k, whose sequences are 7 frames long, for training. Could that be the issue here? Or are there any other tricks in the training to keep the PSNR more consistent?
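One way to see where quality starts to drift on longer sequences is to track the mean PSNR over the first n frames as n grows. The sketch below is only a diagnostic helper under stated assumptions: the names `psnr_from_mse` and `running_mean_psnr` are hypothetical, and 8-bit content (peak value 255) is assumed.

```python
import math
from typing import List, Sequence


def psnr_from_mse(mse: float, max_value: float = 255.0) -> float:
    """Convert a mean squared error to PSNR in dB (8-bit peak assumed)."""
    if mse <= 0.0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)


def running_mean_psnr(per_frame_psnr: Sequence[float]) -> List[float]:
    """Return a list whose element n-1 is the mean PSNR of the first n frames."""
    means: List[float] = []
    total = 0.0
    for n, value in enumerate(per_frame_psnr, start=1):
        total += value
        means.append(total / n)
    return means
```

Comparing the running mean at 5, 7, and 32 frames for two models at a similar bitrate would show whether the gap opens up right after the training clip length or grows gradually.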

james20181013 commented 8 months ago

I'm very interested in DCVC-FM. Could you please send me the training code? Thanks very much.

hedelong92@163.com