google / vizier

Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.
https://oss-vizier.readthedocs.io
Apache License 2.0
1.48k stars 96 forks source link

SQLite backend doesn't commit to the .db file #1187

Closed mtcairneyleeming closed 1 week ago

mtcairneyleeming commented 2 weeks ago

Hi,

It seems to me that the SQLite implementation of the datastore never commits any of the queries it makes to the database? I believe that it (implicitly) starts a transaction when it calls self._engine.connect(), based on the "Commit as you go" section of the sqlalchemy docs.

https://github.com/google/vizier/blob/f083e4e1a6230fc38afea1ae723f2eac14332893/vizier/_src/service/sql_datastore.py#L45

The (implicit) transaction is never committed, so all the changes made are only written to the SQLite journal file, and never to the .db file. I fixed this for my code by adding self._connection.commit() after most of the .execute calls, but I've not read the code in enough detail to know whether they were all in the right place.

I wrote a brief reproducible example on colab, here: https://colab.research.google.com/drive/1oLy6b57wbOBaq6aFh9c4QndBX0J1ixVQ?usp=sharing

(or here, as well:)

# Installation
!pip install google-vizier[jax]

from vizier.service import pyvizier as vz

problem = vz.ProblemStatement()
problem.search_space.root.add_float_param('chocolate', 0.0, 1.0)
problem.metric_information.append(
    vz.MetricInformation(
        name='taste', goal=vz.ObjectiveMetricGoal.MAXIMIZE))

def evaluate(chocolate: float) -> float:
    return 1 - 2 * (chocolate - 0.3)**2

from vizier.service import clients, servers

server = servers.DefaultVizierServer(
        database_url=f"sqlite:///results.db"
    )  
clients.environment_variables.server_endpoint = server.endpoint

study_config = vz.StudyConfig.from_problem(problem)
study_config.algorithm = 'GAUSSIAN_PROCESS_BANDIT'
study_client = clients.Study.from_study_config(study_config, owner='my_name', study_id='cookie_recipe')

for _ in range(2):
  suggestions = study_client.suggest(count=1)
  for suggestion in suggestions:
    chocolate = suggestion.parameters['chocolate']
    obj = evaluate(chocolate)
    print(f'Iteration {suggestion.id}, suggestion (chocolate={chocolate:.3f}) led to taste value {obj:.3f}.')
    final_measurement = vz.Measurement({'taste': obj})
    suggestion.complete(final_measurement)
print("Optimal trials")
optimal_trials = list(study_client.optimal_trials())
for t in optimal_trials:
    optimal_trial = t.materialize()
    print(optimal_trial)

!ls -la
# Expected: results.db, no results.db-journal file (everything should have been committed)

# first way I could think of to get the sqlite cli too
!wget https://www.sqlite.org/2024/sqlite-tools-linux-x64-3470000.zip
!unzip sqlite-tools-linux-x64-3470000.zip

!./sqlite3 results.db "SELECT * from trials;"
# Expected - print out trials that we ran above
# Actual: nothing prints

!./sqlite3 results.db ".dump"
# Expected: should print out entire results of optimisation (incl trials)
# Actual: only prints table creation (though it does demonstrate that we can load the .db file at all from the cli.)
xingyousong commented 2 weeks ago

Added fix - can you try now with google-vizier[jax]==0.1.20?

Here are the fixes: https://github.com/google/vizier/commit/64b022042505ba2ef09f097ff965d679f6462f79#diff-68b9f2babfb0c6b142af11017c28772dfa34d54ce13c7928cc57006afcc798af

mtcairneyleeming commented 1 week ago

Yeah, that works great!

Thanks for the really quick reply!