optuna / optuna

A hyperparameter optimization framework
https://optuna.org
MIT License
10.91k stars 1.04k forks source link

Cannot store CMA object on PostgreSQL and MySQL. #1775

Closed c-bata closed 4 years ago

c-bata commented 4 years ago

Problem details

This bug is reported by @maartenpants on gitter. Original report is here:

Hi there, has anyone else experienced this issue with CmaEsSampler and the PostgreSQL storage backend:

(psycopg2.errors.StringDataRightTruncation) value too long for type character varying(2048)
[SQL: INSERT INTO trial_system_attributes (trial_id, key, value_json) VALUES (%(trial_id)s, %(key)s, %(value_json)s) RETURNING trial_system_attributes.trial_system_attribute_id]

It seems the serialized version of the CMA class is too large to fit in the 2048 character column. This code is the culprit:

optimizer_str = pickle.dumps(optimizer).hex()
study._storage.set_trial_system_attr(trial._trial_id, "cma:optimizer", optimizer_str)

https://gitter.im/optuna/optuna?at=5f484dea36e6f709fd08a5fa

I checked the length of serialized CMA object.

import pickle

import numpy as np
from cmaes import CMA

def get_cma_object_size(dim: int) -> int:
    optimizer = CMA(mean=np.zeros(dim), sigma=1.3)
    optimizer_str = pickle.dumps(optimizer).hex()
    return len(optimizer_str)

if __name__ == '__main__':
    for i in range(2, 32, 2):
        print(f"dim={i} size={get_cma_object_size(i)}")
$ python examples/cma_size.py 
dim=2 size=2516
dim=4 size=2916
dim=6 size=3364
dim=8 size=3850
dim=10 size=4442
dim=12 size=5018
dim=14 size=5738
dim=16 size=6378
dim=18 size=7194
dim=20 size=8074
dim=22 size=8906
dim=24 size=9916
dim=26 size=10956
dim=28 size=12092
dim=30 size=13148

This result means that the bug is raised even on low-dimensional search space.

Additional context (optional)

I've ever tested on SQLite3 but never bumped into this bug. The reason why is:

(9) What is the maximum size of a VARCHAR in SQLite? SQLite does not enforce the length of a VARCHAR. You can declare a VARCHAR(10) and SQLite will be happy to store a 500-million character string there. And it will keep all 500-million characters intact. Your content is never truncated. SQLite understands the column type of "VARCHAR(N)" to be the same as "TEXT", regardless of the value of N. https://www.sqlite.org/faq.html#:~:text=(9)%20What%20is%20the%20maximum,all%20500%2Dmillion%20characters%20intact.

c-bata commented 4 years ago

I opened a pull request to fix this bug. https://github.com/optuna/optuna/pull/1776

github-actions[bot] commented 4 years ago

This issue has not seen any recent activity.