lefnire / tforce_btc_trader

TensorForce Bitcoin Trading Bot
http://ocdevel.com/podcasts/machine-learning/26
GNU Affero General Public License v3.0

v0.2 Save trials object to resume on next run #39

Open methenol opened 6 years ago

methenol commented 6 years ago

It's going to be a few days before I can test this, but if anyone wants to give it a shot and report back, it would be greatly appreciated. As long as nothing heavily nested is being passed back in the trials object, this should work. Tested with hyperas, but it should work with hyperopt as well. Saving the pickle to a local file is a temporary solution until the data can be pushed to a SQL table and pulled back down.

Somewhere at the top of hypersearch, add: `import pickle`

Then replace lines 344-346 with this:

    # Set an initial max_evals, then attempt to load a saved Trials object from
    # the pickle; if that fails, start fresh. Add the number of previously
    # completed trials to max_evals so the hyperparameter search resumes
    # where it left off on the last run.
    # TODO: save trials to a SQL table and restore from there instead of a local pickle.
    max_evals = 20
    try:
        with open('./trial.pickle', 'rb') as f:
            trials = pickle.load(f)
        max_evals += len(trials.trials)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        trials = Trials()

    best = fmin(loss_fn, space=space, algo=tpe.suggest, max_evals=max_evals, trials=trials)

    with open('./trial.pickle', 'wb') as f:
        pickle.dump(trials, f)

Hyperopt seems to support saving this data to MongoDB; however, we could probably get it into a JSON-friendly format and keep the data in a SQL table, similar to how the runs are stored: https://github.com/hyperopt/hyperopt/wiki/Parallelizing-Evaluations-During-Search-via-MongoDB
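Until the SQL piece is wired up, something like the sketch below could stand in. This is a minimal illustration only, assuming one pickled trials blob per search and using the stdlib sqlite3 module rather than the project's actual database; the hyper_trials table and the save_trials/load_trials helpers are hypothetical names, not anything in the repo.

    import pickle
    import sqlite3

    from hyperopt import Trials

    DB_PATH = './trials.db'  # hypothetical local DB standing in for the real SQL server

    def save_trials(trials, name='default'):
        # Pickle the whole Trials object and upsert it into a blob column.
        conn = sqlite3.connect(DB_PATH)
        conn.execute('CREATE TABLE IF NOT EXISTS hyper_trials (name TEXT PRIMARY KEY, blob BLOB)')
        conn.execute('REPLACE INTO hyper_trials (name, blob) VALUES (?, ?)',
                     (name, pickle.dumps(trials)))
        conn.commit()
        conn.close()

    def load_trials(name='default'):
        # Return the saved Trials object, or a fresh one if nothing is stored yet.
        conn = sqlite3.connect(DB_PATH)
        conn.execute('CREATE TABLE IF NOT EXISTS hyper_trials (name TEXT PRIMARY KEY, blob BLOB)')
        row = conn.execute('SELECT blob FROM hyper_trials WHERE name = ?', (name,)).fetchone()
        conn.close()
        return pickle.loads(row[0]) if row else Trials()

A JSON-friendly dump would be nicer to inspect, but storing the pickle as a blob sidesteps reconstructing hyperopt's internal state by hand.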

methenol commented 6 years ago

For initial testing: instead of waiting for 20 runs to tell whether this works, set max_evals = 5 or even max_evals = 1 to make sure the pickle loads and writes correctly, then set it back to 20 if there are no issues pickling the trials object. To reset your trials, simply delete the trial.pickle file and the next run will start fresh. If you want to start over but still be able to restore later, just back up trial.pickle to a different directory. If you add any additional hypers to search, delete trial.pickle or you'll get a KeyError when hyperopt attempts to resume trials.

Reminder: this is for the hyperopt implementation in v0.2 and does not apply to v0.1.

A note on the hyperopt trials: if a hyper combo causes the run to crash (for example, a network depth too high for the amount of RAM you have, so the process gets killed), hypersearch will NOT write the trials to pickle, since it only saves after max_evals completes. When hypersearch is run again, it's very likely the hyper combo that killed your process will be tried again at some point. I haven't run v0.2 yet; this is from experience with Keras and sequential regression, where a very particular combination of parameters returned a loss of nan that fmin could not parse. We shouldn't have the nan problem here since (from what it looks like in v0.2) we're maximizing returns over holding, which is a little less finicky than a loss function.
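One way around that crash-loses-everything behavior, sketched below, is to checkpoint after every evaluation instead of only at the end: call fmin in a loop, raising max_evals by one each pass, and dump the pickle between passes. This is just an illustration of the pattern, not something tested against v0.2; loss_fn, space, and the ./trial.pickle path are the same names used in the snippet above.

    # Checkpoint the Trials object after every single evaluation so a crash
    # only loses the run that caused it, not the whole batch.
    target_evals = len(trials.trials) + 20
    while len(trials.trials) < target_evals:
        # fmin resumes from the state in `trials`, so raising max_evals by
        # one runs exactly one new evaluation per loop iteration.
        fmin(loss_fn, space=space, algo=tpe.suggest,
             max_evals=len(trials.trials) + 1, trials=trials)
        with open('./trial.pickle', 'wb') as f:
            pickle.dump(trials, f)

The crashing hyper combo can still be drawn again on resume, but at least every completed evaluation survives.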

methenol commented 6 years ago

I've been able to test this portion today and can report that pickling the trials works with the code in the first post. Once max_evals is reached, it saves the trials to disk; the next time you start hypersearch, it resumes where it left off. It takes a significant number of runs for hyperopt to really get to refining things. It's effective but computationally very expensive, and with how many hypers are realistically needed for RL, plus how long each run takes, saving the progress is a must. Even with 100 runs it's probably still throwing darts at a map, just standing a little closer.

@lefnire I'd like to get this SQL-ready so it falls in line with the rest of the framework, but I'm not really SQL savvy, so it's going to take me a bit to hack through that, and there are some more pressing issues. It's your call whether you want this in master as-is or hold off until it's compatible with running multiple hypersearches in parallel. I'll get the fork ready and submit a PR referencing this issue to save some time if you decide to go with it.