Closed: danielm1405 closed this issue 10 months ago
Hi,
Thank you for your interest in our work!
For Question 1: I am providing this log file, output.log, from our previous experiments with NNCSL on CIFAR100 with 0.8% labeled data and buffer size 500. Could you please compare your runs against it to verify whether the model performs correctly during all continual stages? In parallel, I will also try to reproduce the results.
For Question 2: I will also try to run with your setting to verify. I will keep you updated :-)
Given that the CVPR deadline is approaching, please understand that I won't be able to debug (if necessary) very quickly before then. Sorry for that!
Q1: Thanks for the log. This is a comparison of our runs. They start to differ significantly after task 3.

| Task id | Your results | My results |
|---|---|---|
| 1 | 76.1 | 76.4 |
| 2 | 59.85 | 59.45 |
| 3 | 51.8 | 48.7 |
| 4 | 42.77 | 37.07 |
| 5 | 37.38 | 33.72 |
| 6 | 35.06 | 31.81 |
| 7 | 32.75 | 30.17 |
| 8 | 30.31 | 26.73 |
| 9 | 28.15 | 25.51 |
| 10 | 27.25 | 24.2 |
Q2: Thanks!
Do you use this exact repo to produce your results? Or do you have some internal version of the repo that may differ and produce different results? I suspect that you use a different repo, because this repo cannot be run out of the box (there is a missing `import numpy as np` in `src/utils.py`).
Hi,
Thanks for your feedback!
Ok, I see: indeed, the results start to differ from Task 3. And yes, we have an internal version that includes all our changes/variants, so there might be some inconsistency between the internal version and this public, clean version. I am launching experiments on my side to verify.
But please be assured that the results are reproducible :-)
Sorry for the bugs that still exist in the repo. I also noticed them and corrected them in my last commit.
Zhiqi
And how exactly do you control the fraction of labeled samples? With the `data.unlabeled_frac` parameter?
Hi! I am also not able to reproduce the result for CIFAR10 at 0.8% labels with buffer 500. The paper reports 73.2%, but I am getting only 68.02% when running it. Could you please share the internal version?
Hi,
First of all, with my reproduction, I confirm that there must be something wrong with this version. I will work on this today, please stay tuned!
> And how exactly do you control the fraction of labeled samples? With the `data.unlabeled_frac` parameter?
No, it is based on the files in `/subsets`, where we hardcode the indices of the selected labeled samples. We then also need to source these files via the parameters `data.subset_path` and `data.subset_path_cls` in the config file.
I noticed that I didn't upload the code for generating the indices for different proportions of labeled data. I will also update our code to make it easier to use.
Zhiqi
Hi!
As you might have seen from my discussion with @danielm1405 , there must be something wrong with this version. I will try to fix this today, please stay tuned! Sorry for the inconvenience.
@danielm1405 @rokmr
Hi,
A quick update: I have figured out why this version is underperforming. I am running experiments on my side to validate before uploading the changes, which might take some time. In the meantime, I have also updated the buffer to a standard reservoir to make it easier to use.
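For readers unfamiliar with the term, a "standard reservoir" buffer refers to reservoir sampling, which keeps a uniform random sample over all items seen in the stream so far. A minimal sketch (the class name and `add` API are illustrative, not the repo's actual buffer implementation):

```python
import random

class ReservoirBuffer:
    """Minimal reservoir-sampling replay buffer (standard algorithm).

    After n items have been streamed in, each of them is stored with
    equal probability capacity / n.
    """
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            # Buffer not full yet: always store.
            self.data.append(item)
        else:
            # Replace a random slot with probability capacity / n_seen.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = item

buf = ReservoirBuffer(capacity=500)
for x in range(10_000):
    buf.add(x)
# The buffer now holds a uniform sample of 500 of the 10,000 items.
```

The appeal for continual learning is that the buffer needs no task boundaries: it maintains an (approximately) class-balanced sample of everything seen so far with O(1) work per item.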
I will ping you once the new version is uploaded. Thanks for your patience!
Best, Zhiqi
Hey @danielm1405 @rokmr ,
I just updated the repository for better reproducibility.
I recommend activating the deterministic mode by uncommenting these lines: https://github.com/kangzhiq/NNCSL/blob/a38078aaa911ee43f5f5b03998a53bb4399c33b6/src/nncsl_train.py#L75-L82
With the deterministic mode activated, you will reproduce exactly the results in my logs.
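The effect of such a deterministic block is simply that every random number generator is re-seeded, so two runs draw identical random numbers. A stripped-down illustration using only the standard library; `seed_everything` is a hypothetical helper, and the torch/cudnn lines are shown as comments because their exact form in the linked file is an assumption:

```python
import random

def seed_everything(seed: int) -> None:
    # Hypothetical helper. A typical PyTorch deterministic block would also do:
    #   torch.manual_seed(seed)
    #   torch.cuda.manual_seed_all(seed)
    #   torch.backends.cudnn.deterministic = True
    #   torch.backends.cudnn.benchmark = False
    random.seed(seed)

seed_everything(0)
run_a = [random.random() for _ in range(5)]
seed_everything(0)
run_b = [random.random() for _ in range(5)]
assert run_a == run_b  # identical draws across runs => reproducible results
```

Note that the cudnn flags can slow training down, which is the usual reason such a block ships commented out.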
Besides, the buffer size == 0 case is also worth testing. I am sharing my log for this as well: cifar10_0.8%_buffer0.log
Another small section of instructions has been added to our main page, in case you want to test with different proportions of labeled data.
Please let me know if you have any further questions. Thanks again for your interest in our work! :-)
Best, Zhiqi
Nice, thanks a lot for the quick answers and fixes. I will try to reproduce some results on my own. If they match the logs you posted, I will let you know and close this issue.
@kangzhiq Thank you for your update. I will run and update you : )
@kangzhiq Hey! I am facing dependency issues; could you please share the dependencies with their versions? To uncomment the reproducibility code as you suggested, I had to downgrade torch to 1.7.1, which caused a lot of further dependency conflicts. Please share the list of dependencies along with their versions.
@rokmr Hi! Sorry to hear that. There are two solutions: check `requirements.txt`, where you can find the necessary dependencies for this project. Hope it helps.
Zhiqi
Hi,
I am closing this thread as it has been inactive for a long time. Please feel free to open another one if you have any further questions.
Thanks again for your interest in our work!
Best, Zhiqi
Hi, I was trying to reproduce the results from your paper, but I stumbled into several issues.

1. I ran NNCSL on CIFAR100 split into 10 tasks with `buffer_size=500` using this config file. I get 24.2% acc1, but according to Tab. 1 I should get 27.4+/-0.5%. Do you have an idea why I cannot reproduce the results?
2. I tried to run NNCSL without exemplars by setting `buffer_size=0`. Surprisingly, I get acc1=25.38%, 1.2% better than with `buffer_size=500`. This is very weird, as you report that the results should drop dramatically in the absence of exemplars. By looking at the code, I suspect that you use the data from the previous tasks for the support set even when `buffer_size=0`, which should not be the case.

Could you help me with these issues?