freestylerick opened this issue 1 year ago
I agree with your point in some cases. However, this is a matter of coding-style preference. Please close the issue.
To be clear - I am asking if this is a bug, not simply a question of style. Are you sure it's not a bug?
I wondered this also. The documentation for `to` (https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to) has a note that says "This method modifies the module in-place", so I believe this doesn't affect the behavior of the code.
Thanks - I think this would benefit from a unit test.
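A minimal pytest-style sketch of such a test might look like this (the test name is my own, and it assumes a CUDA device is available):

# Hypothetical unit test sketch, not from the repo.
import torch

def test_module_to_is_in_place():
    model = torch.nn.Linear(2, 1)
    returned = model.to("cuda")
    # nn.Module.to modifies the module in place and returns the same object,
    # so both names should refer to one module on the target device.
    assert returned is model
    assert model.weight.device.type == "cuda"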
> I am asking if this is a bug, not simply a question of style. Are you sure it's not a bug?
You could verify that it is not really a bug (but a bit sloppy) by something like this:
In [1]: import torch
In [2]: model = torch.nn.Linear(2, 1)
In [3]: model.weight.device
Out[3]: device(type='cpu')
In [4]: m = model.to("cuda")
In [5]: model.weight.device
Out[5]: device(type='cuda', index=0)
In [6]: id(model)
Out[6]: 140022542389776
In [7]: id(m)
Out[7]: 140022542389776
In [8]: model is m
Out[8]: True
In a perfect world, I'd also like to test that a change on m also propagates to model (and/or vice versa). If someone does this, then I will agree that they are very likely the same.
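Something along these lines would do it (a sketch continuing the session above; the zero fill is just an arbitrary mutation):

# Mutate the weights through one name and observe them through the other.
with torch.no_grad():
    m.weight.fill_(0.0)   # change via m
print(model.weight)        # model shows the same zeroed weights, since m and model are one object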
I'm curious whether gpt.py is buggy (which is my guess, and GPT-4's guess as well: https://chat.openai.com/share/b50316a4-0f63-4813-8888-9cb3ca68b7f1) or, if it isn't, why not.
On line 199 of gpt.py we define `m`. We then use both `m` and `model` when I think we should be using only one of them (I think line 199 should be `model = ...` and then we only use `model`; there might be some spots where we have to move tensors so everything is on the same device).
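Roughly what I mean, using a stand-in module here rather than quoting gpt.py:

# Sketch only; torch.nn.Linear stands in for whatever gpt.py constructs on line 199.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Current pattern (roughly): two names bound to the same module.
model = torch.nn.Linear(4, 2)
m = model.to(device)   # nn.Module.to is in-place, so m is model

# Proposed pattern: a single name, plus moving input tensors to the same device.
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(8, 4, device=device)   # inputs created/moved to the same device
out = model(x)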