Error when calculating validation metrics

rat-nick commented 1 year ago

I tried to run the example, followed all the steps, downloaded the dataset and ran data.py for preparation. After I run python main.py --cuda I get the following output:

/home/nratinac/vae-cf-pytorch/env/lib/python3.8/site-packages/torch/nn/functional.py:1956: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
| epoch   1 |  100/ 233 batches | ms/batch 36.19 | loss 572.18
| epoch   1 |  200/ 233 batches | ms/batch 29.29 | loss 536.58
/home/nratinac/vae-cf-pytorch/metric.py:23: RuntimeWarning: invalid value encountered in divide
  return DCG / IDCG
/home/nratinac/vae-cf-pytorch/metric.py:36: RuntimeWarning: invalid value encountered in divide
  recall = tmp / np.minimum(k, X_true_binary.sum(axis=1))
-----------------------------------------------------------------------------------------
| end of epoch   1 | time: 11.09s | valid loss 418.21 | n100   nan | r20   nan | r50   nan
-----------------------------------------------------------------------------------------
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.

Chrystalii commented 1 year ago

I'm facing the same error. Did you fix it?

Chrystalii commented 1 year ago

Hi, try changing line 192 of main.py to "return total_loss, np.nanmean(n100_list), np.nanmean(r20_list), np.nanmean(r50_list)".

The error occurs because np.mean() in the original file returns NaN whenever at least one NaN appears in the array. When I checked, there were always 1-2 NaN values in the returned Recall and NDCG lists (you can print them to see).

After switching to np.nanmean(), the code runs fine.
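To illustrate the difference, here is a minimal sketch with made-up numbers (the array below is hypothetical, not taken from the real run; only the name r20_list is borrowed from main.py). It shows how a single NaN turns np.mean() into NaN, while np.nanmean() averages over the remaining entries:

```python
import numpy as np

# Hypothetical per-user recall values. The NaN stands in for a user whose
# metric could not be computed (illustrative data, not from the real run).
r20_list = np.array([0.25, 0.5, np.nan, 0.75])

plain_mean = np.mean(r20_list)         # a single NaN propagates to the result
nan_aware_mean = np.nanmean(r20_list)  # NaN entries are skipped

print(plain_mean)
print(nan_aware_mean)  # average of 0.25, 0.5 and 0.75
```

Note that np.nanmean() silently drops the affected users from the average, so the reported metric covers slightly fewer users than the validation set contains.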

rat-nick commented 1 year ago

You are right. Changing the mean calculation so that it ignores NaN values fixes the error.
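For anyone wondering where the NaN values come from: the RuntimeWarning in the log points at the divisions in metric.py, and "invalid value encountered in divide" is what NumPy reports for 0/0. That most likely means a few validation users have no held-out items, although I have not verified this against the dataset. Below is a rough sketch of the same division with the zero case handled explicitly. The function name and its arguments are hypothetical and are not part of the repository:

```python
import numpy as np

def recall_at_k_safe(hits, n_relevant, k):
    """Hypothetical helper, not the repository's function.

    hits[i]       -- number of relevant items in user i's top-k list
    n_relevant[i] -- number of held-out items for user i
    """
    denom = np.minimum(k, n_relevant).astype(float)
    # Start from NaN so users without held-out items stay marked as undefined.
    recall = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; this avoids the warning.
    np.divide(hits, denom, out=recall, where=denom > 0)
    return recall

hits = np.array([2, 0, 1])
n_relevant = np.array([4, 0, 1])  # the second user has no held-out items
recall = recall_at_k_safe(hits, n_relevant, k=20)
print(recall)              # the second entry is NaN
print(np.nanmean(recall))  # average over the users with a defined recall
```

The result still contains NaN for the affected users, so np.nanmean() is needed either way. The only difference is that the undefined case is handled on purpose instead of surfacing as a warning.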

younggyoseo / vae-cf-pytorch

Error when calculating validation metrics #5