The choice is limited to the training and validation sets, yet the test set is the one that should be selectable during evaluation.
Comparing the computed values: using the training set reproduces the values reported in the paper, which are < 0.01. In contrast, using the test data yields KL values > 0.01, roughly 5 to 10 times larger.
In the script evaluate_kl_divergence_object_category.py, the code for selecting the data is as follows:
```python
parser.add_argument(
    "--splits",
    choices=["training", "validation"],
    default="training",
    help="Split to evaluate",
)
```
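A minimal sketch of how the argument could be extended so the test split becomes selectable; note that the `"test"` choice (and whether the script actually supports loading that split downstream) is an assumption, not something the original script confirms:

```python
import argparse

parser = argparse.ArgumentParser(description="Evaluate KL divergence")
parser.add_argument(
    "--splits",
    # "test" added here as a hypothetical extension; the original script
    # only offers "training" and "validation".
    choices=["training", "validation", "test"],
    default="training",
    help="Split to evaluate",
)

# Example invocation selecting the test split:
args = parser.parse_args(["--splits", "test"])
print(args.splits)  # → test
```

Even with this change, the data-loading code would still need to know how to resolve the test split's file paths, so the argparse edit alone may not be sufficient.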