sck-at-ucy closed 8 months ago
For both, you should not compile the outer training loop, which itself contains `mx.eval`. E.g. this:
@partial(mx.compile)
def train_and_validate(train_data, validation_data, batch_size, epochs, alpha, dx, dy, dt, ny, nx):
should not be compiled. You cannot evaluate the graph inside a compiled function so that is almost always going to crash.
If you remove that, compiling the `train_step` and `eval_step` should work as you have them, assuming you do not do any evals inside those functions (e.g. by casting to NumPy or calling `mx.eval`).
Indeed, this was the problem, thank you!
I have two slightly different implementations of model training with compile in my code; one works, the other fails, and I do not understand the cause. `Data_loader` is a function that selects datasets for training and validation from precomputed solutions. It includes an option for random shuffling, but I turned it off and it made no difference.
1. Successful Implementation
2. Failing Implementation
In the second implementation, which fails, the `value_and_grad` call is moved into a `train_step()` function, but the code is otherwise the same.
This code with a separate `train_step` fails with `IndexError: unordered_map::at: key not found`.