ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io

[RLlib] Could not save keras model under self[TfPolicy].model.base_model! #33670

Open anhdangkhoa opened 1 year ago

anhdangkhoa commented 1 year ago

What happened + What you expected to happen

Hello, recently I encountered the following bug when using Algorithm.export_policy_model():

WARNING tf_policy.py:646 -- Could not save keras model under self[TfPolicy].model.base_model!
    This is either due to ..
    a) .. this Policy's ModelV2 not having any `base_model` (tf.keras.Model) property
    b) .. the ModelV2's `base_model` not being used by the Algorithm and thus its
       variables not being properly initialized.

I found out that this problem happens when the model configuration 'free_log_std': True is set. The following warning was logged when I ran run_experiments with that configuration included:

WARNING:tensorflow:
The following Variables were used a Lambda layer's call (lambda), but
are not present in its tracked objects:
  <tf.Variable 'default_policy/log_std:0' shape=(1,) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.

This problem disappeared when I set 'free_log_std': False.
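For context, the TensorFlow warning above is the one emitted when a Lambda layer closes over a tf.Variable that Keras cannot track, which in turn can leave the variable out of the saved model. The sketch below is a hypothetical, minimal stand-in (not RLlib's actual model code) that reproduces the pattern: a free-floating log_std variable appended through a Lambda layer, versus a subclassed Layer that owns the variable so Keras tracks it and the model saves cleanly.

import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the free_log_std pattern, NOT RLlib's actual code:
# a free-floating tf.Variable is appended to the layer outputs via a Lambda layer.
log_std = tf.Variable([0.0], name="log_std")

inputs = tf.keras.layers.Input(shape=(4,))
mean = tf.keras.layers.Dense(1)(inputs)

# The Lambda layer closes over `log_std`; Keras does not track variables captured
# this way, which triggers the warning above and can break saving/exporting.
lambda_out = tf.keras.layers.Lambda(
    lambda x: tf.concat([x, tf.broadcast_to(log_std, tf.shape(x))], axis=-1)
)(mean)
lambda_model = tf.keras.Model(inputs, lambda_out)

# Subclassed-Layer alternative: the variable is created inside the layer via
# add_weight(), so Keras tracks it and the model can be saved normally.
class AppendLogStd(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.log_std = self.add_weight(
            name="log_std", shape=(1,), initializer="zeros", trainable=True
        )

    def call(self, x):
        return tf.concat([x, tf.broadcast_to(self.log_std, tf.shape(x))], axis=-1)

layer_out = AppendLogStd()(mean)
layer_model = tf.keras.Model(inputs, layer_out)
layer_model.predict(np.zeros((1, 4), dtype=np.float32))  # shape (1, 2): [mean, log_std]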

Is there any solution that lets me keep 'free_log_std': True and still call export_policy_model() successfully? This is crucial, as I need the exported model for TensorFlow Serving.

Versions / Dependencies

Ray 2.3.0, TensorFlow 2.11.0, Python 3.10

Reproduction script

Run the following Python script (e.g., in a notebook) to reproduce the issue:

import os

import ray
from ray.rllib.algorithms.algorithm import Algorithm
from ray.tune.tune import run_experiments

ray.init()
config = {
    'training': {
        'env': 'CartPole-v1',
        'run': 'PPO',
        'stop': {'training_iteration': 1},
        'config': {
            'framework': 'tf',
            'eager_tracing': False,
            'gamma': 0.99,
            'kl_coeff': 1.0,
            'num_sgd_iter': 20,
            'lr': 0.0001,
            'sgd_minibatch_size': 1000,
            'train_batch_size': 25000,
            'model': {'free_log_std': True},
            'num_workers': 7,
            'num_gpus': 0,
            'batch_mode': 'truncate_episodes',
        },
        'local_dir': '/home/ec2-user/ray_results/intermediate',
        'checkpoint_config': {
            'checkpoint_at_end': True,
            'checkpoint_frequency': 10,
        },
    }
}

trials = run_experiments(config)

# Location of the checkpoint written by run_experiments (path will differ per run)
path_to_checkpoint = '/home/ec2-user/ray_results/intermediate/training/PPO_CartPole-v1_5c4d4_00000_0_2023-03-24_10-26-38/checkpoint_000001'
algo = Algorithm.from_checkpoint(path_to_checkpoint)
algo.export_policy_model(os.path.join('/home/ec2-user/ray_results/intermediate', "1"))  # Issue happens here
ray.shutdown()

Issue Severity

High: It blocks me from completing my task.

PhilippWillms commented 1 year ago

Hello, it seems that I am running into the same problem as @anhdangkhoa. Is there any update on this topic?