progirep opened this issue 6 years ago
Hey there! Right now, we have support for arbitrary computational graphs via Dense layers, which take all previous layers as input. This technically allows for any sort of computational graph, and we used it to implement residual networks. While this means it can support something like the split network example, it's not the cleanest way to approach that. Your notebook example could be implemented as follows (a fix might be needed to allow Dense layers to operate directly on the input, which I haven't tested before; I believe there's currently a check to make sure Dense layers only refer back to other Dense layers, which isn't strictly necessary):
netsplit = convex_adversarial.DenseSequential(
# first split
nn.Linear(1,20),
nn.ReLU(),
nn.Linear(20,5),
nn.ReLU(),
# second split, skipping the previous 4 layers
Dense(nn.Linear(1,3), None, None, None, None),
nn.ReLU(),
nn.Linear(3,5),
nn.ReLU(),
# combine the two split networks
Dense(nn.Linear(5,10), None, None, None, nn.Linear(5,10)),
nn.ReLU(),
nn.Linear(10,10),
nn.ReLU(),
nn.Linear(10,20)
)
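For reference, and to make the Dense semantics explicit: if I'm reading it right, each argument to Dense lines up with one of the preceding outputs (oldest first), with the last argument applying to the immediately preceding layer. Under that reading, the network above corresponds to the following plain PyTorch forward pass. This is just a sketch with made-up module names, and it assumes a Dense layer sums the outputs of its non-None operators:

import torch.nn as nn
import torch.nn.functional as F

class SplitEquivalent(nn.Module):
    def __init__(self):
        super().__init__()
        self.a1, self.a2 = nn.Linear(1, 20), nn.Linear(20, 5)    # first split
        self.b1, self.b2 = nn.Linear(1, 3),  nn.Linear(3, 5)     # second split
        self.j1, self.j2 = nn.Linear(5, 10), nn.Linear(5, 10)    # join (the Dense layer)
        self.h1, self.h2 = nn.Linear(10, 10), nn.Linear(10, 20)  # common head

    def forward(self, x):
        a = F.relu(self.a2(F.relu(self.a1(x))))   # first split, applied to the raw input
        b = F.relu(self.b2(F.relu(self.b1(x))))   # second split, also applied to the raw input
        h = F.relu(self.j1(a) + self.j2(b))       # Dense sums its non-None operators
        return self.h2(F.relu(self.h1(h)))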
However, due to the modularity of our most recent approach, I believe it would be fairly straightforward to implement the split network with one change: modifying the DualNetwork class to conform to the DualLayer signature, which would allow us to call it recursively.
Then, the example in your notebook would work like the following:
netsplit = convex_adversarial.DenseSequential(
# first split
nn.Sequential(
nn.Linear(1,20),
nn.ReLU(),
nn.Linear(20,5),
nn.ReLU(),
),
# second split, skipping the previous output
Dense(
nn.Sequential(
nn.Linear(1,3),
nn.ReLU(),
nn.Linear(3,5),
nn.ReLU(),
),
None
),
# combine the two split networks
Dense(nn.Linear(5,10), nn.Linear(5,10)),
nn.ReLU(),
nn.Linear(10,10),
nn.ReLU(),
nn.Linear(10,20)
)
which I think is the better approach. This modification actually makes a lot of sense: a dual network should really just be another dual layer; there's no reason for it to be a separate class, since it ends up implementing the same functionality. I'll include it in the current refactor.
EDIT: the second example needed a Dense layer to skip the previous network.
Hi @riceric22,
Thanks a lot for your help, first of all! I've tried both your first and your second proposed approach, but couldn't get either of them to run (yet).
I've uploaded a Jupyter notebook showing the exceptions raised by the second approach to:
The network learns well for the non-robust version, but then aborts adversarial learning with the error message "TypeError: object of type 'NoneType' has no len()" in the transpose_all function.
The code also refers to the "toAffineTranspose" function, which seems to have been removed during your refactoring.
Thanks!
Hey @progirep ,
So I realized the problem you had was slightly different: it is not just that there are two different paths from the input, but that the input itself is also split in half. It's not the prettiest solution, but you can achieve the split with existing functionality by applying fixed linear operators to the input.
I've pushed an update to the master branch which fixes the error you ran into; see e747af7e4e4be4c0b6ea52fce7e43dd6683f1f48. To make the split network example work, I also had to add some None cases for layers that are not applicable (e.g. applying a layer from one split to the other split is meaningless, and is now indicated by returning None; this happens when a Dense layer's non-None operations are applied to only None inputs).
The split example now works on master branch, using the following:
eye = torch.eye(2)
split1 = nn.Linear(2,1, bias=False)
split1.weight = nn.Parameter(eye[:1])
split1.weight.requires_grad = False
split2 = nn.Linear(2,1, bias=False)
split2.weight = nn.Parameter(eye[1:])
split2.weight.requires_grad = False
netSplit = convex_adversarial.DenseSequential(
# add a no-op mapping for Dense support, since Dense operators don't currently work directly on input layers
Dense(nn.Sequential()),
# split the input into two different layers
Dense(split1),
Dense(split2, None),
# operate on split2
nn.Linear(1,20),
nn.ReLU(),
nn.Linear(20,5),
nn.ReLU(),
# operate on split1, skipping the previous 5 layers
Dense(nn.Linear(1,3), None, None, None, None, None),
nn.ReLU(),
nn.Linear(3,5),
nn.ReLU(),
# combine the two split networks
Dense(nn.Linear(5,10), None, None, None, nn.Linear(5,10)),
nn.ReLU(),
nn.Linear(10,10),
nn.ReLU(),
nn.Linear(10,20)
)
opt = optim.Adam(filter(lambda p: p.requires_grad, netSplit.parameters()), lr=1e-3)
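As a quick sanity check (illustrative only, not needed for training), the two frozen operators simply pick out the two halves of the input; continuing from the snippet above:

x = torch.randn(4, 2)
assert torch.allclose(split1(x), x[:, :1])   # split1 selects the first input dimension
assert torch.allclose(split2(x), x[:, 1:])   # split2 selects the second input dimension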
This should work for now until I implement the recursive case for the sequential operator.
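For completeness, a minimal robust training step with this network could look roughly like the following. This is a sketch only: eps and the data X, y are placeholders for your own settings, and it assumes the robust_loss helper exported by convex_adversarial:

from convex_adversarial import robust_loss

eps = 0.1                          # placeholder robustness radius
X = torch.randn(16, 2)             # placeholder batch of 2-dimensional inputs
y = torch.randint(0, 20, (16,))    # placeholder labels for the 20 output classes

robust_ce, robust_err = robust_loss(netSplit, eps, X, y)
opt.zero_grad()
robust_ce.backward()
opt.step()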
Whoops, accidentally closed; I'm going to leave this open until the recursive case is implemented
Hi Eric. Again, thanks for your support. I will try out your latest version very soon.
Hi Eric. Your example worked very well! I am still a bit unsure whether I am using the Dense layers correctly, as I frequently get errors such as "AttributeError: 'int' object has no attribute 'dim'" in dual_layers.py.
I've stripped all details from my non-working examples down to one in which only linear layers are used, which you can find at:
The architecture learns correctly in the non-robust case, but the notebook shows the runtime error for the robust case.
The example does not make sense semantically, since it only consists of linear layers, but it exemplifies the problems I run into when trying to modify your example.
Thanks!
Thanks for the detailed examples! There was a bug in the Dense layer: it would return 0 for a forward pass when there are no applicable connections, when it should have returned None. I've patched this in 797ddbbee26acefd0bce98cbc7688422da070481 on the master branch, and the network in your example now runs properly.
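Conceptually (this is only a sketch of the intended semantics, not the actual implementation), the forward pass of a Dense layer should now behave along these lines:

def dense_forward(ops, xs):
    # ops[i] pairs with xs[i], the outputs of the preceding layers (oldest first);
    # the last entry pairs with the most recent output, and None means "no connection".
    terms = [op(x) for op, x in zip(ops, xs) if op is not None and x is not None]
    # If nothing is applicable, return None (not 0) so downstream layers know
    # that this path carries no value.
    return sum(terms) if terms else None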
Dear LocusLab members,
how difficult would it be to add support for network shapes that are not a simple chain? There are a few applications in which they make sense (at least for experimentation). I'm pretty sure that your adversarial learning method would be able to support this, but from a software engineering point of view, adding this feature looks non-trivial, as your code structure does not seem to be designed to support nesting of "nn.Sequential" layers.
As an example, I've uploaded a corresponding Jupyter Notebook to: https://github.com/progirep/convex_adversarial/blob/master/examples/SplitExample.ipynb
The example network splits into two separate parts, which are then joined again. I've added a "SplitNetwork" class to my fork of your project.
Such network architectures could make sense for applications in which you expect many dependencies within some groups of input variables, but limited interaction between these groups for the overall classification.
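To make the kind of architecture I mean concrete, here is a rough plain-PyTorch sketch (illustrative only, not the exact network from the notebook): two groups of input features are processed by separate branches and then joined again.

import torch
import torch.nn as nn

class SplitNet(nn.Module):
    def __init__(self):
        super().__init__()
        # each branch only sees "its" group of input features
        self.branch_a = nn.Sequential(nn.Linear(1, 20), nn.ReLU(), nn.Linear(20, 5), nn.ReLU())
        self.branch_b = nn.Sequential(nn.Linear(1, 3), nn.ReLU(), nn.Linear(3, 5), nn.ReLU())
        # the joined representation is processed by a common head
        self.head = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 20))

    def forward(self, x):
        a = self.branch_a(x[:, :1])               # first half of the input
        b = self.branch_b(x[:, 1:])               # second half of the input
        return self.head(torch.cat([a, b], dim=1))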