Closed snokh closed 3 years ago
There is no line 63 in finetuning.py (see code), and there is no final_target variable in the corresponding criterion() function. I would need more insight into which modifications you have added to be able to guess where the error comes from.
The two lines and 'final_target' were added for debugging purposes only.
I have set up my virtual environment and cloned the repo again, and the error still persists. Please find the attached JPG.
I just cloned the repository from scratch on a fresh machine and didn't get that error (output pasted below), so I'm not sure how to reproduce your error. Some possibilities that come to mind:
- Is the dataset empty? It seems that the vector of targets contains a single value instead of the LongTensor that is expected by the CE loss. You could check that by setting a breakpoint on line 170 of incremental_learning.py. Check if the targets variable has a list of labels for the batch.
- Could this be related to your path system? I noticed that your download of CIFAR-100 goes to ../data\cifar100\cifar-100-python.tar.gz, which uses both / and \. You can change the data path in dataset_config.py, and the results path in main_incremental.py with the --results-path argument.
- Have you tried forcing targets to be of the correct type by using targets.long()?

(base) mmasana@XXX:~/libraries$ git clone https://github.com/mmasana/FACIL.git
Cloning into 'FACIL'...
remote: Enumerating objects: 101, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 101 (delta 0), reused 0 (delta 0), pack-reused 98
Receiving objects: 100% (101/101), 7.62 MiB | 14.58 MiB/s, done.
Resolving deltas: 100% (29/29), done.
(base) mmasana@XXX:~/libraries$ cd FACIL/
(base) mmasana@XXX:~/libraries/FACIL$ ls
docs environment.yml LICENSE README.md requirements.txt scripts src
(base) mmasana@XXX:~/libraries/FACIL$ python3 -u src/main_incremental.py
=========================================================
Arguments =
approach: finetuning
batch_size: 64
clipping: 10000
datasets: ['cifar100']
eval_on_train: False
exp_name: None
fix_bn: False
gpu: 0
gridsearch_tasks: -1
keep_existing_head: False
last_layer_analysis: False
log: ['disk']
lr: 0.1
lr_factor: 3
lr_min: 0.0001
lr_patience: 5
momentum: 0.0
multi_softmax: False
nc_first_task: None
nepochs: 200
network: resnet32
no_cudnn_deterministic: False
num_tasks: 4
num_workers: 4
pin_memory: False
pretrained: False
results_path: ../results
save_models: False
seed: 0
stop_at_task: 0
use_valid_only: False
warmup_lr_factor: 1.0
warmup_nepochs: 0
weight_decay: 0.0
==========================================================
Approach arguments =
all_outputs: False
==========================================================
Exemplars dataset arguments =
exemplar_selection: random
num_exemplars: 0
num_exemplars_per_class: 0
==========================================================
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ../data/cifar100/cifar-100-python.tar.gz
100%|#########9| 168468480/169001437 [00:16<00:00, 6700375.69it/s]Extracting ../data/cifar100/cifar-100-python.tar.gz to ../data/cifar100
Files already downloaded and verified
[(0, 25), (1, 25), (2, 25), (3, 25)]
************************************************************************************************************
Task 0
************************************************************************************************************
| Epoch 1, time= 6.3s | Train: skip eval | Valid: time= 0.5s loss=2.688, TAw acc= 19.4% | *
_(program continues on until completion)_
I am getting the same error. All I did was:
python -u src/main_incremental.py --approach finetuning --network resnet34
I am running this in PowerShell on Windows 10.
Output:
(LIGN_test) PS F:\dev\Projects\LIGN\.rug\FACIL> python -u src/main_incremental.py --approach finetuning --network resnet34
============================================================================================================
Arguments =
approach: finetuning
batch_size: 64
clipping: 10000
datasets: ['cifar100']
eval_on_train: False
exp_name: None
fix_bn: False
gpu: 0
gridsearch_tasks: -1
keep_existing_head: False
last_layer_analysis: False
log: ['disk']
lr: 0.1
lr_factor: 3
lr_min: 0.0001
lr_patience: 5
momentum: 0.0
multi_softmax: False
nc_first_task: None
nepochs: 200
network: resnet34
no_cudnn_deterministic: False
num_tasks: 4
num_workers: 4
pin_memory: False
pretrained: False
results_path: ../results
save_models: False
seed: 0
stop_at_task: 0
use_valid_only: False
warmup_lr_factor: 1.0
warmup_nepochs: 0
weight_decay: 0.0
============================================================================================================
Approach arguments =
all_outputs: False
============================================================================================================
Exemplars dataset arguments =
exemplar_selection: random
num_exemplars: 0
num_exemplars_per_class: 0
============================================================================================================
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ../data\cifar100\cifar-100-python.tar.gz
100.0%
Extracting ../data\cifar100\cifar-100-python.tar.gz to ../data\cifar100
Files already downloaded and verified
[(0, 25), (1, 25), (2, 25), (3, 25)]
************************************************************************************************************
Task 0
************************************************************************************************************
C:\Users\josue\anaconda3\envs\LIGN_test\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
File "src/main_incremental.py", line 316, in <module>
main()
File "src/main_incremental.py", line 264, in main
appr.train(t, trn_loader[t], val_loader[t])
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\incremental_learning.py", line 56, in train
self.train_loop(t, trn_loader, val_loader)
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\finetuning.py", line 52, in train_loop
super().train_loop(t, trn_loader, val_loader)
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\incremental_learning.py", line 111, in train_loop
self.train_epoch(t, trn_loader)
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\incremental_learning.py", line 171, in train_epoch
loss = self.criterion(t, outputs, targets.to(self.device))
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\finetuning.py", line 61, in criterion
return torch.nn.functional.cross_entropy(outputs[t], targets - self.model.task_offset[t])
File "C:\Users\josue\anaconda3\envs\LIGN_test\lib\site-packages\torch\nn\functional.py", line 2824, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_loss_forward
One difference I noticed between your output and mine is the following:
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ../data/cifar100/cifar-100-python.tar.gz
vs
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ../data\cifar100\cifar-100-python.tar.gz
File path format is different.
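For anyone comparing the two logs: a mixed path like this is what Python's Windows path rules produce when a base written with forward slashes is joined with further components. The sketch below assumes the download path is assembled with an os.path.join-style call (not confirmed in this thread) and uses the standard-library ntpath and posixpath modules so that it behaves the same on any OS. Windows file APIs generally accept both separators, so the mixed form by itself is usually harmless.

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # POSIX path rules, importable on any OS

# Path components taken from the logs in this thread.
parts = ("../data", "cifar100", "cifar-100-python.tar.gz")

# Joining under Windows rules keeps the "/" already present in the base
# and inserts "\" between the remaining components.
windows_joined = ntpath.join(*parts)

# Joining under POSIX rules uses "/" throughout.
posix_joined = posixpath.join(*parts)

# normpath converts every separator to the Windows-native "\".
windows_normalized = ntpath.normpath(windows_joined)

print(windows_joined)
print(posix_joined)
print(windows_normalized)
```

The first two printed paths match the Windows and Linux logs above; the third shows how the mixed path could be normalized if a uniform form is preferred.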
Yes, that's what I commented earlier:
could this be related to your path system? I noticed that your downloading of CIFAR-100 goes to ../data\cifar100\cifar-100-python.tar.gz, which uses both / and \. You can change the data path in dataset_config.py, and the results path on main_incremental.py with the --results-path argument.
It seems like it might be an issue with Windows. The path looks wrong, but the system doesn't complain, so maybe it loads an empty dataset? You could check that with this:
the dataset is empty? It seems that the vector of targets contains a single value instead of the LongTensor that is expected by the CE loss. You could check that by setting a breakpoint on line 170 of incremental_learning.py. Check if the ´targets´ variable has a list of labels for the batch.
Could you try it and let me know what you get? Also, --network resnet34 should be --network resnet32 if you use small input datasets such as CIFAR-100.
Got it working on Linux with no issues. I will check on Windows in the future and let you know.
Btw I had a few questions about your implementations:
Feel free to let me know if you would like me to elaborate on any of the questions
Yeah, in Linux it should work fine. Answering your questions:
Are the exemplars defined as the number of data used to retrain the model during rehearsal?
Yes, the exemplars are the number of data/images that will be used during rehearsal.
Are the exemplar size enforced during initial training for the encountered labels?
I'm not sure if I understand the question. The exemplars are selected from the training data of that task at the end of its training session.
For fixed memory, will the number of data per class depend on how many labels have been encountered or all the labels that will ever be encountered?
It depends on the labels encountered. The framework's main comparison strength is to enforce that the incremental learning is done without knowledge of or access to future tasks/labels, as in a realistic setting. In the case of fixed memory, you have a buffer of X images that is updated after learning each task. Since it is fixed, as more classes are learned, fewer exemplars per class are available.
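The difference between the two memory modes can be illustrated with a little arithmetic. This is a minimal sketch, not FACIL's actual code: the helper name exemplars_per_class is hypothetical, and sharing a fixed buffer equally across seen classes with floor division is an assumption made for illustration.

```python
def exemplars_per_class(classes_seen, num_exemplars=0, num_exemplars_per_class=0):
    """Exemplars available per class after `classes_seen` classes.

    Hypothetical helper. Mirrors the two options discussed in the thread:
    a fixed total buffer (num_exemplars) or a growing memory with a fixed
    budget per class (num_exemplars_per_class).
    """
    if num_exemplars and num_exemplars_per_class:
        raise ValueError("choose either fixed or growing memory, not both")
    if num_exemplars_per_class:
        # Growing memory: each class keeps the same number of exemplars.
        return num_exemplars_per_class
    if num_exemplars:
        # Fixed memory: the buffer is shared by all classes seen so far
        # (equal sharing with floor division is an assumption).
        return num_exemplars // classes_seen
    return 0


# Classes seen after each task in a 50-10-10-10-10-10 scenario.
seen = [50, 60, 70, 80, 90, 100]
fixed = [exemplars_per_class(n, num_exemplars=2000) for n in seen]
growing_total = [n * exemplars_per_class(n, num_exemplars_per_class=20) for n in seen]

print(fixed)          # per-class share shrinks as classes accumulate
print(growing_total)  # total memory grows with the number of classes
```

With a fixed buffer the per-class share drops after every task, while with growing memory the per-class share is constant and the total stored images increase.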
How do you set the initial number of classes and step size between tasks?
Short answer: Since most scenarios divide the number of classes equally among tasks, that is the default setting.
Longer answer: The --nc-first-task argument allows you to define a larger first task, and providing a list of datasets allows each of them to be learned one after the other. If you want another partition of a dataset, you could either define the parts as separate datasets of the desired length, or modify the corresponding dataset code. I recommend the first option, since it can be defined entirely in dataset_config.py by making use of the class_order entry.
Could you elaborate on the role of grid search?
GridSearch was the name we gave it at the beginning, and later we adapted it to the Continual Hyperparameter Search defined in "Class-incremental learning: survey and performance evaluation on image classification" and in "A continual learning survey: Defying forgetting in classification tasks". We plan on changing the naming since I agree that it is confusing. In short, it allows choosing the main hyperparameter related to stability-plasticity (aka intransigence-forgetting) at each task without knowledge of future tasks.
How can one set the scenario in which for cifar100 we start with 50 classes, have a step size of 10 and there is either fixed or growing memory, or there is access to the full data set during rehearsal (i.e. retraining)?
You would use --datasets cifar100 with --nc-first-task 50 --num-tasks 6 (instead of steps, you define the number of tasks: 50-10-10-10-10-10). For fixed memory you would use --num-exemplars X, and for growing memory --num-exemplars-per-class X. To have access to all data, you can check the joint.py baseline.
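The way these two arguments translate into task sizes can be sketched as follows. The helper split_classes is hypothetical and is not taken from the FACIL code base; rejecting splits that do not divide evenly is a simplification chosen for this illustration.

```python
def split_classes(num_classes, num_tasks, nc_first_task=None):
    """Return [(task_id, classes_in_task), ...] for an even split.

    Hypothetical helper: if nc_first_task is given, the first task gets
    that many classes and the rest are divided equally among the
    remaining tasks; otherwise all tasks get the same number of classes.
    """
    if nc_first_task is None:
        sizes = [num_classes // num_tasks] * num_tasks
    else:
        if num_tasks < 2:
            raise ValueError("a larger first task needs at least two tasks")
        remaining_tasks = num_tasks - 1
        remaining_classes = num_classes - nc_first_task
        sizes = [nc_first_task] + [remaining_classes // remaining_tasks] * remaining_tasks
    if sum(sizes) != num_classes:
        # Uneven splits are rejected here; this is an assumption of the sketch.
        raise ValueError("classes do not divide evenly among tasks")
    return list(enumerate(sizes))


default_split = split_classes(100, 4)
large_first = split_classes(100, 6, nc_first_task=50)

print(default_split)
print(large_first)
```

The first call reproduces the [(0, 25), (1, 25), (2, 25), (3, 25)] line printed in the logs above for the default settings; the second gives the 50-10-10-10-10-10 scenario.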
Thanks for your answers
I am successfully able to run your code on WSL2 on Windows 11. Thanks!
In case this is useful to anyone, I found the same error on Windows, and it was fixed by casting the targets with .long().
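A minimal sketch of that cast, assuming PyTorch is installed. The function task_loss is a hypothetical stand-in for the criterion() call shown in the traceback, the shapes and offset are made up for illustration, and the int32 tensor simply mimics the "Int" dtype named in the RuntimeError.

```python
import torch
import torch.nn.functional as F


def task_loss(task_outputs, targets, task_offset):
    """Cross-entropy on one task head, casting targets to int64 first.

    Hypothetical stand-in for criterion(); `task_offset` plays the role
    of self.model.task_offset[t] from the traceback.
    """
    # cross_entropy expects class indices as int64 (Long) tensors.
    local_targets = targets.long() - task_offset
    return F.cross_entropy(task_outputs, local_targets)


torch.manual_seed(0)
outputs = torch.randn(4, 25)  # batch of 4 samples, 25 classes in this task
# int32 labels, mimicking the dtype reported in the error message.
targets = torch.tensor([25, 30, 49, 26], dtype=torch.int32)

loss = task_loss(outputs, targets, task_offset=25)
print(targets.dtype, "->", targets.long().dtype, "loss:", loss.item())
```

Calling .long() on a tensor that is already int64 returns it unchanged, so the cast should be safe to leave in place on Linux as well.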
While executing the python3 -u src/main_incremental.py script, the code is giving the below error:-