optimizedlearning / mechanic

Issue loading mechanic optimizer state #2

Open yitzhaklevi opened 10 months ago

yitzhaklevi commented 10 months ago

The following simple script reproduces the issue:

import torch
from mechanic_pytorch import mechanize  # assumed import path for the mechanic package

# Take one step with a mechanized AdamW optimizer.
model = torch.nn.Linear(1, 1)
optimizer = mechanize(torch.optim.AdamW)(model.parameters(), lr=1e-5)
x = torch.ones([5, 1])
out = torch.sum(model(x))
out.backward()
optimizer.step()
print('done first step')

# Build a second optimizer, load the first one's state, and step again.
new_optimizer = mechanize(torch.optim.AdamW)(model.parameters(), lr=1e-5)
new_optimizer.load_state_dict(optimizer.state_dict())
out = torch.sum(model(x))
out.backward()
new_optimizer.step()
print('done new steps using new optimizer loaded')

yitzhaklevi commented 10 months ago

The issue is that state_dict['state']['_mechanic'] uses tensor pointers as keys, and those do not have the same addresses when a new Mechanic optimizer is initialized.

(Other optimizers, e.g. AdamW, use indices as the keys for the state.)
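
To illustrate the difference between the two keying schemes, here is a minimal pure-Python sketch. It is not mechanic's code and not the fix mentioned in this thread. Param, to_index_keyed and from_index_keyed are hypothetical names, and copy.deepcopy is only a stand-in for a save/load round trip that produces new key objects.

```python
import copy


class Param:
    """Hypothetical parameter object, hashed by identity (stand-in for a tensor)."""

    def __init__(self, value):
        self.value = value


def to_index_keyed(state, params):
    """Re-key per-parameter state by each parameter's position in `params`."""
    index_of = {id(p): i for i, p in enumerate(params)}
    return {index_of[id(p)]: s for p, s in state.items()}


def from_index_keyed(saved, params):
    """Map index-keyed state back onto the live parameter objects."""
    return {params[i]: s for i, s in saved.items()}


params = [Param(1.0), Param(2.0)]
object_keyed = {p: {"step": 1} for p in params}

# Object keys: after a copy, the keys are new objects, so lookups by the
# live parameters no longer find anything.
loaded_object_keyed = copy.deepcopy(object_keyed)
missing = [p for p in params if p not in loaded_object_keyed]

# Index keys: positions are plain integers, so they survive the copy and
# can be mapped back onto the live parameters.
loaded_index_keyed = copy.deepcopy(to_index_keyed(object_keyed, params))
restored = from_index_keyed(loaded_index_keyed, params)
found = [p for p in params if p in restored]
```

In this toy setup every lookup fails with object keys, while the index-keyed state is recovered for every parameter. Whether the same remapping suits the '_mechanic' entry depends on details of the library that this thread does not show.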

acutkosky commented 10 months ago

Thanks for bringing this up! I'll take a look and make an update (or if you want to do so, please feel free to submit a PR).

yitzhaklevi commented 10 months ago

You're welcome. I actually fixed that (I will submit a PR next week), but it seems that the issue does not reproduce on Windows (I tried it on my local machine and it worked).

ogencoglu commented 1 month ago

Any update on this?