gcampax closed this pull request 3 years ago.
| Totals | |
| --- | --- |
| Change from base Build 1287: | 0% |
| Covered Lines: | 1200 |
| Relevant Lines: | 2098 |
Mh, with the hyperparameters in this PR the dataset is too big (~1.3M examples after augmentation) and OOMs during training. I'm going to try a smaller set of hyperparameters to see if that helps, but we might have to scrap this idea.
I tested the tuned hyperparameters with build 131: the dataset size looks good, synthesis is very fast, and the final performance is good. I'm going to merge this.
To bias the training data towards simple commands, run two separate generations with different sets of hyperparameters:
- The "simple" generation is optimized for breadth: a high pruning size so we don't prune out commands, low depth and number of turns, and compound commands disabled.
- The "complex" generation is, as before, optimized for depth and complexity: a relatively low pruning size but high depth and number of turns.
I'm going to try this in Kubeflow to see how it fares, but comments welcome.