pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

Data augmentation function #4980

Open dokato opened 2 years ago

dokato commented 2 years ago

🚀 The feature, motivation and pitch

Let's say that I create a new dataset in a for loop where, apart from my true examples, I want to add Naug augmented examples after some transformations (e.g. jitter, rotation etc.). I tried to do it in the following way:

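import torch_geometric.transforms as T
from torch_geometric.data import Data

transformation = T.RandomJitter(0.05)

for i in range(N):
    example = Data(x=x_, pos=pos_, y=y_)
    tr_dataset.append(example)
    for _ in range(Naug):
        transformation(example)  # applied to `example` itself, not to a copy
        tr_dataset.append(example)  # appends the same object again
    break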

But it looks like the list ends up containing references to the same Data object. I haven't found any easy way to do data augmentation with torch_geometric (unless I'm mistaken?).
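The aliasing described above can be reproduced without torch_geometric. The following is only a minimal sketch in plain Python: `Sample` and `jitter_` are hypothetical stand-ins for a data object and a transform that works on the object it is given, not part of any library.

```python
class Sample:
    """Hypothetical stand-in for a data object with a `pos` attribute."""

    def __init__(self, pos):
        self.pos = pos


def jitter_(sample, offset):
    # Changes the object it receives and returns that same object.
    sample.pos = [p + offset for p in sample.pos]
    return sample


example = Sample([0.0, 1.0])
dataset = [example]
for _ in range(3):
    jitter_(example, 0.1)
    dataset.append(example)  # no copy is made: the same object is appended again

# Every list entry is the very same object, so all entries show the final state.
all_same = all(item is example for item in dataset)
print(all_same, len(dataset))
```

Because every entry refers to one object, the list grows but never holds distinct augmented examples.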

Alternatives

What would be handy, I think, is a data_augmenter function/object that returns a copy of a Data object with the transformations applied.

Additional context

I know that I could do it with .clone(), but it looks like a workaround rather than an intuitive syntax for what I need, i.e.

rusty1s commented 2 years ago

Yes, transforms do not operate on a copy. I suggest simply doing:

import copy

for i in range(Naug):
    dataset.append(transform(copy.copy(data)))

dokato commented 2 years ago

Thanks, this does help. As I said, I can also clone it, but maybe such a feature would be nice to have in pytorch_geometric.

rusty1s commented 2 years ago

I think copy.copy is better here, since there is no need to clone untouched tensors, which are safe to share.
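To illustrate the point about sharing, here is a rough sketch in plain Python of what a shallow copy does and does not duplicate. `Sample`, `shift` and `augmented_copies` are hypothetical names chosen for this example and are not torch_geometric APIs. The sketch assumes the transform assigns a new value to the attribute it changes rather than editing the existing value in place.

```python
import copy


class Sample:
    """Hypothetical stand-in for a graph data object."""

    def __init__(self, x, pos):
        self.x = x      # features, never touched by the transform
        self.pos = pos  # positions, replaced by the transform


def shift(sample, offset):
    # Rebinds `pos` to a new list instead of editing the old list in place.
    sample.pos = [p + offset for p in sample.pos]
    return sample


def augmented_copies(sample, transform, offsets):
    # One shallow copy per augmentation, so the original stays unchanged.
    return [transform(copy.copy(sample), off) for off in offsets]


original = Sample(x=[1, 2, 3], pos=[0.0, 1.0])
augmented = augmented_copies(original, shift, [0.5, 1.0])

# The untouched attribute is shared between the original and every copy.
shares_x = all(a.x is original.x for a in augmented)
print(original.pos, [a.pos for a in augmented], shares_x)
```

Each copy gets its own `pos`, while `x` is stored once and shared. If a transform edited an attribute in place instead of rebinding it, shallow copies would still see that change, and a deep copy or clone would be the safer choice.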