orybkin / sigma-vae-pytorch

A σ-VAE implementation in PyTorch

The losses rec, kl both become NaN #10

Open blgpb opened 8 months ago

blgpb commented 8 months ago

Dear Prof. Rybkin,

I have read your paper "Simple and Effective VAE Training with Calibrated Decoders". It is very nice work and I have learned a lot from it. I have replaced the VAE in my own project with the σ-VAE. However, the rec and kl losses both become NaN. (https://github.com/orybkin/sigma-vae-pytorch/blob/master/model.py#L145) Can you help me figure out this problem?

[Four screenshots attached showing the losses becoming NaN.]

orybkin commented 8 months ago

Not a prof :) Are you able to run the code I provided on the data I tried and replicate the results?


blgpb commented 8 months ago

Thank you for your reply! I have sent you a detailed email and look forward to your response. Thank you very much!

orybkin commented 8 months ago

Note that you need to use the soft clipping to prevent the variance from going to 0.
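For reference, here is a minimal sketch of what that soft clipping can look like: a softplus-based lower bound on the learned log-sigma, so the variance can approach but never reach 0. The helper name `softclip`, the bound of -6, and the NLL layout below are illustrative assumptions, not necessarily the exact code in model.py.

```python
import math
import torch
import torch.nn.functional as F

def softclip(tensor, min_value):
    # Softly bound `tensor` from below at `min_value`: softplus keeps the
    # output above the bound while preserving gradients, so exp(log_sigma)
    # never collapses to exactly 0.
    return min_value + F.softplus(tensor - min_value)

def gaussian_nll(mu, log_sigma, x):
    # Per-element Gaussian negative log-likelihood with a learned sigma.
    return (
        0.5 * torch.pow((x - mu) / log_sigma.exp(), 2)
        + log_sigma
        + 0.5 * math.log(2 * math.pi)
    )

# Example: bound log_sigma before computing the reconstruction term.
# log_sigma = softclip(log_sigma, -6)
# rec = gaussian_nll(x_hat, log_sigma, x).sum()
```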


blgpb commented 8 months ago

Thank you very much. I have used soft clipping to prevent the variance from going to 0 (https://github.com/orybkin/sigma-vae-pytorch/blob/master/model.py#L125C10-L125C10). Unfortunately, the problem still exists.

orybkin commented 8 months ago

I see. In that case I am not quite sure. I would check what the variance is to make sure it's not too low; you can try to increase the lower limit. If a normal VAE worked for you, you could try to make the two implementations as similar as possible in terms of loss scale, e.g. divide the loss by the data shape manually. If you are using half precision, you would need to be more careful about the loss scale. If this fails, I am not sure how to help; you'd just have to debug it.
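In case it helps, a rough sketch of those checks (variable names like `log_sigma` and the floor value of -2 are assumptions for illustration, not the repository's defaults): log the current sigma each step, raise the lower limit if sigma is sitting at the bound, and divide the summed loss by the number of data dimensions to compare its scale against a standard VAE.

```python
import math
import torch
import torch.nn.functional as F

def debug_rec_loss(x, x_hat, log_sigma, min_log_sigma=-2.0):
    # Raise the soft lower bound on log_sigma if NaNs appear with the default.
    log_sigma = min_log_sigma + F.softplus(log_sigma - min_log_sigma)

    # Monitor sigma: if it sits right at exp(min_log_sigma), the bound is
    # active and may need to be raised further.
    print(f"mean sigma = {log_sigma.exp().mean().item():.4e}")

    # Gaussian NLL with the (clipped) learned sigma.
    nll = (
        0.5 * ((x - x_hat) / log_sigma.exp()) ** 2
        + log_sigma
        + 0.5 * math.log(2 * math.pi)
    )

    rec_sum = nll.sum()                # sigma-VAE convention: sum over all elements
    rec_per_dim = rec_sum / x.numel()  # divide by the data shape to compare against
                                       # the per-pixel scale of a standard VAE loss
    return rec_sum, rec_per_dim
```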
