You should add a `dataset_transform` (e.g. `ClassSplitter`) to get a `train` and `test` dataset in `batch`. You can use `torchmeta.toy.helpers.sinusoid`, which comes with a default `dataset_transform`.
I cannot reproduce the tensor having size 16, I get tensors of size `(5, 5, 1)` as expected. Here is the modified script:

```python
from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

dataset = sinusoid(shots=5, test_shots=5)
dataloader = BatchMetaDataLoader(dataset, batch_size=5, num_workers=4)

print(f'len(dataset) = {len(dataset)}')        # len(dataset) = 1000000
print(f'len(dataloader) = {len(dataloader)}')  # len(dataloader) = 200000

for batch in dataloader:
    train_inputs, train_targets = batch["train"]
    print(f'train_inputs.shape = {train_inputs.shape}')    # torch.Size([5, 5, 1])
    print(f'train_targets.shape = {train_targets.shape}')  # torch.Size([5, 5, 1])
    break
```
Thanks! :D
How does the `ClassSplitter` know not to form data-sets/tasks that are N-way, K-shot in the regression case? i.e. how does it guarantee that it only gets 1 function for each data-set/task D_i?
(btw the 16 was a bug on my end, with Jupyter remembering old state)
> You should add a `dataset_transform` (e.g. `ClassSplitter`) to get a `train` and `test` dataset in `batch`. You can use `torchmeta.toy.helpers.sinusoid`, which comes with a default `dataset_transform`.
I find this comment confusing. I see in the helper that the `Sinusoid` dataset is passed to the `ClassSplitter`, and not as an argument to `dataset_transform` (nor via any of the other options that are set to none: `transform=None, target_transform=None, dataset_transform=None`).

Is that what you meant, or was that a typo, since the helper never passes `dataset_transform` to the `Sinusoid` task?
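For reference, this is roughly what the helper does, as far as I can tell from the source (a simplified sketch, not the exact code; defaults and `seed` handling are omitted):

```python
from torchmeta.toy import Sinusoid
from torchmeta.transforms import ClassSplitter

def sinusoid_sketch(shots=5, test_shots=None, shuffle=True, **kwargs):
    """Simplified sketch of torchmeta.toy.helpers.sinusoid."""
    if test_shots is None:
        test_shots = shots
    # Each task is one sinusoid with shots + test_shots generated samples.
    dataset = Sinusoid(num_samples_per_task=shots + test_shots, **kwargs)
    # ClassSplitter is used as a wrapper here, not via the dataset_transform
    # argument: it splits every sampled task into 'train'/'test' subsets.
    return ClassSplitter(dataset, shuffle=shuffle,
                         num_train_per_class=shots,
                         num_test_per_class=test_shots)
```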
I think I made progress understanding your code. `ClassSplitter` is to get the train and test set, while the actual data-set/task (whether per function or per N-way, K-shot) has already been made by the actual meta-set class.

My doubt about the dataloader remains though.
Data transforms (like `ClassSplitter`) can either be used as a `dataset_transform`, or as a wrapper (the wrapper is here just syntactic sugar). The following two are equivalent:

- `ClassSplitter` as a `dataset_transform` argument

```python
from torchmeta.toy import Sinusoid
from torchmeta.transforms import ClassSplitter

dataset = Sinusoid(num_samples_per_task=15,
                   dataset_transform=ClassSplitter(num_train_per_class=5,
                                                   num_test_per_class=10))
task = dataset.sample_task()
print(task)
# OrderedDict([('train', <torchmeta.utils.data.task.SubsetTask object at 0x11ba07dd8>),
#              ('test', <torchmeta.utils.data.task.SubsetTask object at 0x11ba10240>)])
```
- `ClassSplitter` as a wrapper
```python
from torchmeta.toy import Sinusoid
from torchmeta.transforms import ClassSplitter

dataset = Sinusoid(num_samples_per_task=15)
dataset = ClassSplitter(dataset, num_train_per_class=5, num_test_per_class=10)
task = dataset.sample_task()
print(task)
# OrderedDict([('train', <torchmeta.utils.data.task.SubsetTask object at 0x12078eda0>),
#              ('test', <torchmeta.utils.data.task.SubsetTask object at 0x120797208>)])
```
Quick clarification: is the input to sinusoid, `num_samples_per_task`, the same as the 600 images used for mini-imagenet per class label? e.g. does `num_samples_per_task` get split by the class splitter into the usual `5+15` support/query set sizes?
`MiniImagenet` does not have a `num_samples_per_task` argument (this is specific to the toy regression datasets). But you can indeed see this as being similar to the 600 images per class: it corresponds to the number of possible examples to sample from for this task. In the case of toy regression tasks, this is simply the number of support + number of query examples (`5 + 10` here).
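To make the split concrete, a small check along the lines of the earlier snippets (the printed sizes are what the explanation above implies, assuming the task subsets support `len()`):

```python
from torchmeta.toy import Sinusoid
from torchmeta.transforms import ClassSplitter

# 15 generated samples per task, split into 5 support + 10 query examples.
dataset = Sinusoid(num_samples_per_task=15)
dataset = ClassSplitter(dataset, num_train_per_class=5, num_test_per_class=10)

task = dataset.sample_task()
print(len(task['train']), len(task['test']))  # expected: 5 10
```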
If I put 600 in `num_samples_per_task` and give the class splitter `5+15`, I can sample more than 20 points, I hope?
I don't understand what you mean. `Sinusoid` generates samples (there is no pool of samples/images to sample from, as opposed to datasets like `MiniImagenet`), so `num_samples_per_task` specifies the number of samples to generate per task. If you have 5 samples in your training set and 15 in the test set of your task, then you only need to generate `5 + 15` samples for this task. If you need more samples for the training/test set of the task (e.g. you have a larger number of shots), then you can specify a larger `num_samples_per_task`.
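For example (a sketch building on the script from the top of the thread; the shapes are what the `[meta_batch, shots, 1]` convention implies, not re-verified here):

```python
from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

# The helper generates shots + test_shots samples per task, so larger shot
# counts simply mean a larger num_samples_per_task under the hood.
dataset = sinusoid(shots=100, test_shots=500)
dataloader = BatchMetaDataLoader(dataset, batch_size=5)

batch = next(iter(dataloader))
train_inputs, train_targets = batch['train']
test_inputs, test_targets = batch['test']
print(train_inputs.shape)  # expected: torch.Size([5, 100, 1])
print(test_inputs.shape)   # expected: torch.Size([5, 500, 1])
```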
> ```python
> dataset = sinusoid(shots=5, test_shots=5)
> dataloader = BatchMetaDataLoader(dataset, batch_size=5, num_workers=4)
> ```
Just to make this example complete, I believe you need to create another data loader from scratch to create the meta-test and meta-val data loaders. Since the parameters are generated from scratch for each, I believe they'd be disjoint, and then you'd have proper evaluation sets to evaluate your meta-learning algorithm.

see: https://github.com/tristandeleu/pytorch-meta/blob/master/torchmeta/toy/sinusoid.py

There are no explicit instructions on how to do that in the docs: https://tristandeleu.github.io/pytorch-meta/api_reference/toy/ so I assume what I said above is correct, based on the code I read.

Is this correct, Tristan? @tristandeleu
FYI, with the script above it seems you need this:

```python
spt_x, spt_y, qry_x, qry_y = spt_x.float(), spt_y.float(), qry_x.float(), qry_y.float()
```

I tried putting it in the dataloader, but couldn't do it nicely without getting a lambda-function pickle error or other errors:
```python
args.criterion = nn.MSELoss()
# tran = transforms.Compose([torch.tensor])
# dataset = sinusoid(shots=args.k_eval, test_shots=args.k_shots, transform=tran)
dataset = sinusoid(shots=args.k_eval, test_shots=args.k_shots)
meta_train_dataloader = BatchMetaDataLoader(dataset, batch_size=args.meta_batch_size_train, num_workers=args.num_workers)
meta_val_dataloader = BatchMetaDataLoader(dataset, batch_size=args.meta_batch_size_eval, num_workers=args.num_workers)
meta_test_dataloader = BatchMetaDataLoader(dataset, batch_size=args.meta_batch_size_eval, num_workers=args.num_workers)
```
> Just to make this example complete, I believe you need to create another data loader from scratch to create the meta-test and meta-val data loaders. Since the parameters are generated from scratch for each, I believe they'd be disjoint, and then you'd have proper evaluation sets to evaluate your meta-learning algorithm.
The distribution of the parameters for generating the tasks is fixed (amplitude sampled uniformly from `U(0.1, 5)`, phase sampled uniformly from `U(0, 2π)`), so these are proper sets for evaluation. The meta-validation/meta-test sets will contain tasks which come from this same distribution over tasks.
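If you want the three splits to be reproducible and sampled independently, one possible pattern is to build a separate, differently seeded dataset per split (a sketch; whether the helper forwards a `seed` argument should be checked against your torchmeta version):

```python
from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

# One dataset object per split; the seed argument is assumed to be accepted
# by the helper (it seeds the task-sampling RNG). Tasks from all three splits
# come from the same distribution over amplitudes/phases.
meta_train_dataset = sinusoid(shots=5, test_shots=15, seed=0)
meta_val_dataset = sinusoid(shots=5, test_shots=15, seed=1)
meta_test_dataset = sinusoid(shots=5, test_shots=15, seed=2)

meta_train_dataloader = BatchMetaDataLoader(meta_train_dataset, batch_size=5)
meta_val_dataloader = BatchMetaDataLoader(meta_val_dataset, batch_size=5)
meta_test_dataloader = BatchMetaDataLoader(meta_test_dataset, batch_size=5)
```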
> FYI it seems you need this:
>
> ```python
> spt_x, spt_y, qry_x, qry_y = spt_x.float(), spt_y.float(), qry_x.float(), qry_y.float()
> ```
If I understand correctly, and based on the snippet in https://github.com/tristandeleu/pytorch-meta/issues/74#issuecomment-656905769, this is

```python
spt_x, spt_y = batch['train']
qry_x, qry_y = batch['test']
```
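Putting the unpacking and the casts together, a minimal end-to-end loop (a sketch; the `.float()` casts assume the toy tasks collate to float64 tensors, per the FYI above):

```python
from torchmeta.toy.helpers import sinusoid
from torchmeta.utils.data import BatchMetaDataLoader

dataset = sinusoid(shots=5, test_shots=15)
dataloader = BatchMetaDataLoader(dataset, batch_size=5, num_workers=4)

for batch in dataloader:
    spt_x, spt_y = batch['train']
    qry_x, qry_y = batch['test']
    # Cast in the loop rather than via a transform in the dataloader, which
    # avoids the lambda-pickling errors mentioned above.
    spt_x, spt_y, qry_x, qry_y = (spt_x.float(), spt_y.float(),
                                  qry_x.float(), qry_y.float())
    # ... meta-training step on (spt_x, spt_y) / (qry_x, qry_y) ...
    break
```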
I was trying to use the toy data-sets, but I got errors like `train` doesn't exist when trying to loop through the batches. Can we have a tiny minimal example of looping through the data for the toy data sets? My attempt:

Another weird thing was the tensors being of size 16 while my meta-batch size was 5...