[New Rule Request] Torchfix should issue a warning when a torch.nn.Module stores its layers in a python list instead of a torch.nn.ModuleList

ssgosh commented 6 months ago

Inside a model definition, torch.nn.Module objects stored in a plain Python list are not registered as submodules, so their parameters are not registered either. An optimizer built from model.parameters() therefore never updates them, even though the layers are in the call graph formed by forward(). TorchFix should flag this; currently it gives no warning.

Example:

import torch

class FeedForward(torch.nn.Module):
    def __init__(self, n_features, n_classes, n_hidden, width):
        super().__init__()

        # Ideally, torchfix should issue a warning on the code below.
        # The parameters of the hidden layers are not registered if the layers are in a list, so they are not optimized!
        self.hidden_layers = [torch.nn.Linear(n_features if i == 0 else width, width, bias=True) for i in range(n_hidden)]

        # Correct version of the above code: use ModuleList([...]) instead of a Python list [...]
        self.hidden_layers = torch.nn.ModuleList([torch.nn.Linear(n_features if i == 0 else width, width, bias=True) for i in range(n_hidden)])

        # Dummy call to torch.solve() to throw a torchfix warning (to demonstrate that torchfix is working correctly)
        torch.solve()

Torchfix output:

$ torchfix --select=ALL ./supervised/nn/feed_forward_nn.py 
supervised/nn/feed_forward_nn.py:20:9: TOR001 Use of removed function torch.solve: https://github.com/pytorch-labs/torchfix#torchsolve
Finished checking 1 files.
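
The behavior described above can be checked directly by counting registered parameters. This is a minimal sketch, assuming PyTorch is installed; the class names BrokenNet and FixedNet are made up for illustration and are not part of the original report.

```python
import torch


class BrokenNet(torch.nn.Module):
    """Illustrative only: stores its layers in a plain Python list."""

    def __init__(self, width=4, n_hidden=3):
        super().__init__()
        # Plain list: the Linear layers are NOT registered as submodules.
        self.hidden_layers = [torch.nn.Linear(width, width) for _ in range(n_hidden)]


class FixedNet(torch.nn.Module):
    """Illustrative only: same layers, wrapped in a ModuleList."""

    def __init__(self, width=4, n_hidden=3):
        super().__init__()
        # ModuleList registers every layer, so parameters() can see them.
        self.hidden_layers = torch.nn.ModuleList(
            [torch.nn.Linear(width, width) for _ in range(n_hidden)]
        )


# Each Linear layer contributes two tensors (weight and bias).
broken_count = len(list(BrokenNet().parameters()))
fixed_count = len(list(FixedNet().parameters()))
print(broken_count, fixed_count)  # 0 6
```

The same gap should affect anything else that walks the registered submodules, such as state_dict() or moving the model with .to(device).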

kit1980 commented 6 months ago

Sounds like a good idea. I have personally helped people who ran into this issue.

kit1980 commented 6 months ago

This seems a bit tricky to implement. Currently TorchFix doesn't know the types of the objects, so it's hard to find lists of torch.nn.Module objects.

Pyre and LibCST's TypeInferenceProvider (https://libcst.readthedocs.io/en/latest/metadata.html#libcst.metadata.TypeInferenceProvider) can probably help here, but that would be a separate feature to implement.

ssgosh commented 6 months ago

Yikes! Perhaps it can be done on a best-effort basis for some commonly used class types, such as Linear, Conv2d and other subclasses of torch.nn.Module listed at https://pytorch.org/docs/stable/nn.html. Maybe it could be limited to list comprehensions? I would imagine that it's a common idiom that many people use.
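
A rough sketch of what such a best-effort, name-based check could look like. It uses the standard-library ast module purely for illustration (TorchFix itself is built on LibCST), and KNOWN_LAYER_NAMES, find_plain_list_layers and the helper functions are hypothetical names, not anything that exists in TorchFix.

```python
import ast

# Hypothetical, hand-picked subset of torch.nn layer names; a real rule would need a fuller list.
KNOWN_LAYER_NAMES = {"Linear", "Conv1d", "Conv2d", "LSTM", "Embedding"}


def _dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def _is_layer_call(node):
    """True for calls such as torch.nn.Linear(...) or nn.Conv2d(...), matched by name only."""
    if not isinstance(node, ast.Call):
        return False
    name = _dotted_name(node.func)
    return name is not None and name.split(".")[-1] in KNOWN_LAYER_NAMES


def find_plain_list_layers(source):
    """Return sorted (lineno, attribute) pairs for `self.x = [<layer call>, ...]` assignments."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        if isinstance(value, ast.ListComp):
            elements = [value.elt]
        elif isinstance(value, ast.List):
            elements = value.elts
        else:
            continue  # e.g. torch.nn.ModuleList([...]) is a Call, not a list
        if not any(_is_layer_call(element) for element in elements):
            continue
        for target in node.targets:
            if (
                isinstance(target, ast.Attribute)
                and isinstance(target.value, ast.Name)
                and target.value.id == "self"
            ):
                findings.append((node.lineno, target.attr))
    return sorted(findings)


SAMPLE = '''
class FeedForward(torch.nn.Module):
    def __init__(self, n_features, width, n_hidden):
        super().__init__()
        self.bad = [torch.nn.Linear(n_features if i == 0 else width, width) for i in range(n_hidden)]
        self.good = torch.nn.ModuleList([torch.nn.Linear(width, width) for _ in range(n_hidden)])
'''

print(find_plain_list_layers(SAMPLE))  # [(5, 'bad')]
```

Because it matches on names rather than types, a check like this misses aliased constructors and user-defined torch.nn.Module subclasses, and it would also flag lists that are registered later by other means.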

sbrugman commented 3 weeks ago

I'll contribute this rule. Got it working locally, just waiting for the open PRs to be reviewed/merged.

There is a real-world example in transformers (its impact is mitigated by the subsequent add_module calls). Other than that, violations of this rule are fairly rare in larger projects but moderately common in smaller repos (10+ examples).
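
For context, a minimal sketch of how add_module calls can offset a plain list, assuming PyTorch is installed. This is not the transformers code; PatchedNet and the hidden_{i} names are invented for illustration.

```python
import torch


class PatchedNet(torch.nn.Module):
    """Illustrative only: plain list of layers, each registered afterwards."""

    def __init__(self, width=4, n_hidden=3):
        super().__init__()
        # The plain list alone would leave these layers unregistered.
        self.hidden_layers = [torch.nn.Linear(width, width) for _ in range(n_hidden)]
        # Registering each layer explicitly makes its parameters visible again.
        for i, layer in enumerate(self.hidden_layers):
            self.add_module(f"hidden_{i}", layer)


patched_names = [name for name, _ in PatchedNet().named_parameters()]
print(len(patched_names))  # 6
```

A purely syntactic rule would still flag the list assignment here, so a pattern like this is a likely source of false positives.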

pytorch-labs / torchfix

[New Rule Request] Torchfix should issue a warning when a torch.nn.Module stores its layers in a python list instead of a torch.nn.ModuleList #31