Poisson Loss - Githubissues

Karol-Gawlowski commented 2 years ago

Hi!

I tried to use the loss_poisson but have found that the errors obtained in each consecutive epoch increase instead of decreasing. Maybe there's something I'm doing wrong here. I'd be thankful for some help.

Below I've put the code that reproduces my issue. model5 is just a keras_model_sequential() with dense layers, which works just fine with loss = 'mse'.

model5 %>% compile(
  loss = 'Poisson',
  optimizer = optimizer_rmsprop(),
  metrics = list("mean_absolute_error","mean_squared_error")
)

history = model5 %>% fit(
  callbacks = list(early_stop),
  train %>% select(-ClaimNb) %>% as.matrix(), 
  train$ClaimNb, 
  epochs = 30, 
  batch_size = 512, 
  validation_split = 0.2,
  shuffle = TRUE
)

t-kalinowski commented 2 years ago

Hi, thanks for posting. I can confirm that, at least from an API and syntax perspective, you are compiling and fitting the model correctly. Your issue is more likely to be coming from the data and/or model architecture. I don't have access to your dataset or model, so I can only hypothesize in the abstract:

Are you normalizing all your inputs to the range [0, 1]?
What is the final activation? 'exponential'?
Maybe try an adaptive optimizer (adam), perhaps also with a learning rate that's smaller than the default.
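
As a rough illustration of the last two suggestions, the sketch below builds a small model with an exponential output layer and compiles it with Adam at a reduced learning rate. The helper name `build_poisson_model`, the layer sizes, and the learning rate of 1e-4 are assumptions made for the example and do not come from the original model; the code needs the keras R package with a working TensorFlow installation.

```r
# Sketch only: `build_poisson_model` and its layer sizes are hypothetical.
# Requires the keras R package with a working TensorFlow installation.
build_poisson_model <- function(n_features, learning_rate = 1e-4) {
  model <- keras::keras_model_sequential() |>
    keras::layer_dense(units = 32, activation = "relu",
                       input_shape = c(n_features)) |>
    # An exponential output keeps predicted claim rates strictly positive.
    keras::layer_dense(units = 1, activation = "exponential")

  keras::compile(
    model,
    loss = "poisson",
    # Adam with a learning rate below the default of 1e-3.
    optimizer = keras::optimizer_adam(learning_rate = learning_rate),
    metrics = list("mean_absolute_error", "mean_squared_error")
  )
  model
}
```

Calling `build_poisson_model(ncol(x_train))`, where `x_train` stands for a hypothetical normalized feature matrix, should return a compiled model that can be passed to fit().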

Karol-Gawlowski commented 2 years ago

Thanks for a quick reply!

Regarding your questions:

Yes all inputs are normalized
The final activation is just linear
I checked it too and the results are the same. Please find the screenshot below.

That was the model I used for the above example

early_stop = callback_early_stopping(monitor = "val_loss", patience = 10)
model8 = keras_model_sequential() 
model8 %>% 
  layer_dense(units = 64, activation = 'tanh', input_shape = c(ncol(train)-1),
              kernel_initializer=initializer_random_uniform(minval = -0.5, maxval = 0.5, seed = seed)) %>% 
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 32, activation = 'tanh') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 16, activation = 'tanh') %>%
  layer_dense(units = 4, activation = 'tanh') %>%
  layer_dense(units = 1)

Interestingly, when optimizing for MSE and just monitoring the poisson loss, I only get the validation loss displayed, not training loss.

For background, I am working on claim frequency prediction for a motor insurance dataset that can be found here: https://www.kaggle.com/floser/french-motor-claims-datasets-fremtpl2freq/version/1

Here's an article aiming to do the same thing. They are using a different NN architecture where the poisson loss works fine, which is not the case with keras_model_sequential() https://www.kaggle.com/floser/glm-neural-nets-and-xgboost-for-insurance-pricing?web=1&wdLOR=c2301BBFA-6FE1-43C0-8A30-13B9F3CE9E14

I really appreciate your help. Let me know if there are any other details I could supply to pin down the issue.

Many thanks

t-kalinowski commented 2 years ago

I downloaded the dataset and spun up a few small models. It seems to me like the Poisson loss function is working as expected, and the resultant models are training with the Poisson loss about as well as with the MSE loss (neither particularly well).

I suspect the issue in your screenshot, where the Poisson loss/metric is not displayed, is due to the fact that taking log(0) results in a NaN in TensorFlow, which of course leads to everything breaking. The target has many 0 values, and figuring out how to make that work with a Poisson loss is part of the data pre-processing and model architecture design work. In the code example below, I just did +1 as a quick hack. You may be interested in using callback_terminate_on_naan().
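
To make the NaN mechanism concrete, here is a minimal base-R sketch of the Poisson loss in the form Keras uses, mean(y_pred - y_true * log(y_pred + epsilon)). The function name `poisson_loss` and the toy vectors are illustrative and not part of the original code. In this form the logarithm is applied to the prediction, so a NaN appears as soon as a prediction is negative, which an unbounded linear output layer permits and an exponential one does not.

```r
# Poisson loss in the form Keras uses: mean(y_pred - y_true * log(y_pred + eps)).
# `poisson_loss` and the toy data below are illustrative names and values.
poisson_loss <- function(y_true, y_pred, eps = 1e-7) {
  mean(y_pred - y_true * log(y_pred + eps))
}

y_true <- c(0, 0, 1, 0, 2)              # claim counts, mostly zero
raw    <- c(-0.3, 0.1, 0.4, -1.2, 0.9)  # unbounded outputs of a linear layer

# Linear output: negative predictions make log() return NaN.
loss_linear <- suppressWarnings(poisson_loss(y_true, raw))

# Exponential output: predictions are strictly positive, so the loss is finite.
loss_exp <- poisson_loss(y_true, exp(raw))

is.nan(loss_linear)   # TRUE
is.finite(loss_exp)   # TRUE
```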

Poking around the dataset, it strikes me as the type of problem that would significantly benefit from some extended feature preprocessing, e.g., bucketizing some of the numeric features and treating them as categorical, or maybe censoring specific values that might be errors. Perhaps even re-framing this as a categorical rather than a regression problem. Also, the dataset is extremely unbalanced and relatively small. The models would probably benefit from upweighting the rare cases where claims were filed by passing fit(..., sample_weight = ), and also from a more careful (stratified) validation split.
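
The upweighting and stratified split mentioned above could look roughly like the base-R sketch below. The toy target, the weighting rule (equal total weight for rows with and without a claim), and the 80/20 ratio are all assumptions chosen for illustration, not the scheme used anywhere in this thread.

```r
# Sketch only: one possible weighting and split, with toy data.
set.seed(42)
y <- rbinom(1000, size = 1, prob = 0.05)  # toy target: about 5% of rows have a claim

# Upweight rows with a claim so both groups carry equal total weight.
has_claim <- y > 0
w <- ifelse(has_claim, sum(!has_claim) / sum(has_claim), 1)

# Stratified 80/20 split: sample 20% of each group for validation.
val_idx <- unlist(lapply(split(seq_along(y), has_claim), function(idx) {
  idx[sample.int(length(idx), size = round(0.2 * length(idx)))]
}))
train_idx <- setdiff(seq_along(y), val_idx)

# Claim rates in the two parts should be close to each other.
c(train = mean(y[train_idx] > 0), validation = mean(y[val_idx] > 0))
```

With keras, the training weights would then go to fit() as `sample_weight = w[train_idx]`, and the validation rows would be supplied through `validation_data` rather than `validation_split`, since `validation_split` simply takes the last fraction of the rows without stratifying.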

However, GitHub issues are mostly intended for bugs and API design, and I think this is outside that scope. A better venue for discussions about model design and data processing might be https://community.rstudio.com/, Stack Overflow, or Kaggle.

Thanks for the link to the kaggle notebook btw, I enjoyed seeing that intricate keras model!

Cheers!

library(keras)
library(dplyr, warn.conflicts = FALSE)
library(magrittr)

# kaggle datasets download -d floser/french-motor-claims-datasets-fremtpl2freq
# unzip archive.zip

df <- readr::read_csv("freMTPL2freq.csv") |> suppressMessages()
df
#> # A tibble: 678,013 × 12
#>    IDpol ClaimNb Exposure Area  VehPower VehAge DrivAge BonusMalus VehBrand
#>    <dbl>   <dbl>    <dbl> <chr>    <dbl>  <dbl>   <dbl>      <dbl> <chr>   
#>  1     1       1     0.1  D            5      0      55         50 B12     
#>  2     3       1     0.77 D            5      0      55         50 B12     
#>  3     5       1     0.75 B            6      2      52         50 B12     
#>  4    10       1     0.09 B            7      0      46         50 B12     
#>  5    11       1     0.84 B            7      0      46         50 B12     
#>  6    13       1     0.52 E            6      2      38         50 B12     
#>  7    15       1     0.45 E            6      2      38         50 B12     
#>  8    17       1     0.27 C            7      0      33         68 B12     
#>  9    18       1     0.71 C            7      0      33         68 B12     
#> 10    21       1     0.15 B            7      0      41         50 B12     
#> # … with 678,003 more rows, and 3 more variables: VehGas <chr>, Density <dbl>,
#> #   Region <chr>

target <- df$ClaimNb

features <- df %>%
  select(-ClaimNb) %>% # target
  lapply(function(f) {
    if (is.numeric(f))
      scales::rescale(f, c(0, 1))
    else if (is.character(f))
      f |> as.factor() |> as.integer() |> subtract(1) |> to_categorical()
    else 
      stop()
  }) %>%
  do.call(cbind, .)
#> Loaded Tensorflow version 2.6.0

fit_and_evaluate <- function(loss, final_activation) {
  model <- keras_model_sequential(input_shape = ncol(features)) %>%
    layer_dense(units = 32, activation = 'relu') %>%
    layer_dense(units = 32, activation = 'relu') %>%
    layer_dense(units = 1, activation = final_activation)

  model %>%
    compile(
      optimizer = "adam",
      loss = loss,
      metrics = c("mse", "poisson")
    ) %>%
    fit(features, target+1, batch_size = 256, epochs = 5)

  evaluate(model, features, target, batch_size = 256)
}

fit_and_evaluate(loss = "Poisson", final_activation = k_exp)
#>     loss      mse  poisson 
#> 1.044463 1.050129 1.044463
fit_and_evaluate(loss = "MSE", final_activation = "linear")
#>     loss      mse  poisson 
#> 1.021652 1.021652 1.031347

Created on 2021-10-13 by the reprex package (v2.0.1)

rstudio / keras3

Poisson Loss #1280