skorch-dev / skorch

A scikit-learn compatible neural network library that wraps PyTorch
BSD 3-Clause "New" or "Revised" License

Dictionary Input and Custom Collate Function #1050

Open meepd opened 9 months ago

meepd commented 9 months ago

It's not clear to me how to combine dictionary input (which is the suggested solution for models with multiple inputs, and for RNNs) with a custom collate function for per-batch padding of sequences. I know that by default the dictionary is unpacked when it is passed to the forward function, but I can't imagine that's true for the collate_fn. So does it assume that the collate_fn takes in also (X,y) as a tuple, where X is a SliceDict? How do we unpack that SliceDict?

Also, I'm not sure how to actually use SliceDict for variable-length sequence data. It doesn't seem to accept a list of tensors.

I would prefer to use a Dataset I defined, but not clear how to handle multiple inputs in that case.

BenjaminBossan commented 9 months ago

but I can't imagine that's true for the collate_fn

The default collate_fn from PyTorch correctly deals with dictionary inputs: it collates each key separately and returns a dict of batched tensors. If you want to pass a custom collate_fn, you'd have to ensure that it does too. Ideally, share your code and the error so that we can take a look.
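A minimal sketch of a custom collate_fn that keeps dictionary inputs intact while padding variable-length sequences per batch. It assumes each sample is an (X_dict, y) tuple whose values are 1-d tensors of varying length and whose targets are scalars; the function name `pad_dict_collate` and the extra `<key>_lengths` entries are hypothetical choices, not part of skorch or PyTorch.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_dict_collate(batch):
    """Turn a list of (X_dict, y) samples into (X_batch_dict, y_batch).

    Hypothetical helper: every value in X_dict is assumed to be a 1-d tensor
    whose length may differ between samples.
    """
    # Split the list of (X, y) tuples into a tuple of X dicts and a tuple of ys.
    X_samples, y_samples = zip(*batch)

    X_batch = {}
    for key in X_samples[0]:
        seqs = [x[key] for x in X_samples]
        # Pad to shape (batch_size, longest_sequence), using 0 as padding value.
        X_batch[key] = pad_sequence(seqs, batch_first=True, padding_value=0)
        # Keep the true lengths (hypothetical "<key>_lengths" naming), e.g. for
        # torch.nn.utils.rnn.pack_padded_sequence inside the module.
        X_batch[key + "_lengths"] = torch.tensor([len(s) for s in seqs])

    # Targets are assumed to be scalars (or 0-d tensors) and are simply stacked.
    y_batch = torch.stack([torch.as_tensor(y) for y in y_samples])
    return X_batch, y_batch


# Two samples whose sequences have different lengths.
batch = [
    ({"tokens": torch.tensor([1, 2, 3])}, 0),
    ({"tokens": torch.tensor([4, 5])}, 1),
]
X, y = pad_dict_collate(batch)
print(X["tokens"].tolist())          # [[1, 2, 3], [4, 5, 0]]
print(X["tokens_lengths"].tolist())  # [3, 2]
print(y.tolist())                    # [0, 1]
```

In skorch, such a function would typically be handed to the DataLoader via `iterator_train__collate_fn=pad_dict_collate` and `iterator_valid__collate_fn=pad_dict_collate`. Because the returned X is a dict, skorch unpacks it into `forward`, so the module's `forward` would need matching parameter names (here `tokens` and `tokens_lengths`).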

So does it assume that the collate_fn takes in also (X,y) as a tuple, where X is a SliceDict? How do we unpack that SliceDict?

Just to be clear, SliceDict is not involved when using dictionary inputs. Its main purpose is basically to trick sklearn into accepting dictionary inputs, e.g. when you want to use GridSearchCV and pass a dict as X.

I would prefer to use a Dataset I defined, but not clear how to handle multiple inputs in that case.

Again, if you could provide some code and (dummy) data, it would help us to figure out your issue.
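As an illustration of handling multiple inputs in a self-defined Dataset, here is a minimal sketch that assumes each sample is returned as an (X_dict, y) tuple. The class name `MultiInputDataset` and the keys `x_num` / `x_cat` are hypothetical; the point is that PyTorch's default collate_fn then batches each dict key separately.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class MultiInputDataset(Dataset):
    """Hypothetical dataset whose samples are (dict_of_inputs, target) tuples."""

    def __init__(self, X_num, X_cat, y):
        assert len(X_num) == len(X_cat) == len(y), "inputs must have equal length"
        self.X_num = X_num
        self.X_cat = X_cat
        self.y = y

    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        # Returning a dict as the first element keeps the inputs separated by
        # name through collation (and, in skorch, up to forward()).
        Xi = {"x_num": self.X_num[i], "x_cat": self.X_cat[i]}
        return Xi, self.y[i]


ds = MultiInputDataset(
    X_num=torch.randn(6, 4),              # 6 samples, 4 numeric features each
    X_cat=torch.randint(0, 10, (6,)),     # 6 samples, one categorical id each
    y=torch.arange(6),
)
loader = DataLoader(ds, batch_size=4)     # default collate_fn, no shuffling
X_batch, y_batch = next(iter(loader))
print(X_batch["x_num"].shape)  # torch.Size([4, 4])
print(X_batch["x_cat"].shape)  # torch.Size([4])
print(y_batch.tolist())        # [0, 1, 2, 3]
```

skorch generally accepts a torch Dataset like this directly as X (with y=None), and since each sample's X is a dict, the module would be called as `forward(x_num=..., x_cat=...)`. With NeuralNetClassifier, the default stratified validation split needs an explicit y, so `train_split` may have to be adjusted. For variable-length inputs, the default collate_fn would need to be replaced by one that pads each key.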