kangzhiq / NNCSL

Official implementation of NNCSL
MIT License

Reproduction problem #1

Closed · danielm1405 closed this issue 10 months ago

danielm1405 commented 1 year ago

Hi, I was trying to reproduce the results from your paper but I ran into several issues.

  1. I ran NNCSL on CIFAR100 split into 10 tasks with buffer_size=500 using this config file. I get 24.2% acc1, but according to Tab. 1 I should get 27.4±0.5%. Do you have an idea why I cannot reproduce the results?

  2. I tried to run NNCSL without exemplars by setting buffer_size=0. Surprisingly, I get acc1=25.38%, which is 1.2% better than with buffer_size=500. This is very strange, as you report that the results should drop dramatically in the absence of exemplars. Looking at the code, I suspect that you use data from the previous tasks for the support set even when buffer_size=0, which should not be the case.


Could you help me with these issues?

kangzhiq commented 1 year ago

Hi,

Thank you for being interested in our work!

For Question 1: I am providing this log file, output.log, from our previous experiments with NNCSL, 0.8% labeled data, and buffer size 500 on CIFAR100. Could you please compare it with your runs to verify whether the model performs correctly during all continual stages? In parallel, I will also try to reproduce the results.

For Question 2: I will also try to run your setting to verify. I will keep you updated :-)

Given that the CVPR deadline is approaching, please understand that I won't be able to debug (if necessary) very quickly before then. Sorry about that!

danielm1405 commented 1 year ago
Q1: Thanks for the log. Here is a comparison of our runs. They start to differ significantly after task 3.

| Task id | Your results | My results |
| ------- | ------------ | ---------- |
| 1       | 76.1         | 76.4       |
| 2       | 59.85        | 59.45      |
| 3       | 51.8         | 48.7       |
| 4       | 42.77        | 37.07      |
| 5       | 37.38        | 33.72      |
| 6       | 35.06        | 31.81      |
| 7       | 32.75        | 30.17      |
| 8       | 30.31        | 26.73      |
| 9       | 28.15        | 25.51      |
| 10      | 27.25        | 24.2       |

Q2: Thanks!

Do you use this exact repo to produce your results? Or do you have some internal version of the repo that may differ and produce different results? I suspect that you use a different repo, because this one cannot be run out of the box (it is missing `import numpy as np` in src/utils.py).

kangzhiq commented 1 year ago

Hi,

Thanks for your feedback!

Ok, I see; indeed, the runs start to differ from task 3. And yes, we have an internal version that includes all our changes/variants, so there might be some inconsistency between it and this public, cleaned-up version. I am launching experiments on my side to verify.

But please rest assured that the results are reproducible :-)

Sorry for the bugs that still existed in the repo. I also noticed them and corrected them in my last commit.

Zhiqi

danielm1405 commented 1 year ago

And how exactly do you control the fraction of labeled samples? With the data.unlabeled_frac parameter?

rokmr commented 1 year ago


Hi! I am also unable to reproduce the result for CIFAR10 with 0.8% labeled data and buffer size 500. The paper reports 73.2%, but I am getting only 68.02% when running it. Could you please share the internal version?

kangzhiq commented 1 year ago

Hi,

First of all, my own reproduction confirms that there must be something wrong with this version. I will work on it today, please stay tuned!

> And how exactly do you control the fraction of labeled samples? With the data.unlabeled_frac parameter?

No, it is based on the files in /subsets, where we hardcode the indices of the selected labeled samples. These files then need to be referenced via the data.subset_path and data.subset_path_cls parameters in the config file. I noticed that I didn't upload the code for generating indices for different proportions of labeled data; I will update our code to make it easier to use.
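
For illustration, here is a minimal sketch of how such an index file could be generated. Since the generation script was not uploaded at the time of this discussion, the output file name, the one-index-per-line format, and the class-balanced selection below are assumptions:

```python
# Sketch: generate a /subsets index file for a given labeled fraction.
# File name, format, and class-balanced selection are assumptions here;
# the actual generation script may differ.
import numpy as np
from torchvision.datasets import CIFAR100

def make_subset_file(frac=0.008, num_classes=100, seed=0,
                     out_path="subsets/cifar100_labeled_0.8%.txt"):
    # Labels of the CIFAR100 training set, in dataset order
    targets = np.array(CIFAR100(root="data", train=True, download=True).targets)
    rng = np.random.default_rng(seed)
    # Per-class budget, e.g. 0.8% of 50000 images -> 4 images per class
    per_class = max(1, round(len(targets) * frac / num_classes))
    chosen = []
    for c in range(num_classes):
        idx = np.flatnonzero(targets == c)
        chosen.extend(rng.choice(idx, size=per_class, replace=False))
    # One selected training-set index per line
    np.savetxt(out_path, np.sort(chosen), fmt="%d")
```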

Zhiqi

kangzhiq commented 1 year ago


> Hi! I am also unable to reproduce the result for CIFAR10 with 0.8% labeled data and buffer size 500. The paper reports 73.2%, but I am getting only 68.02% when running it. Could you please share the internal version?

Hi!

As you might have seen from my discussion with @danielm1405, there must be something wrong with this version. I will try to fix it today, please stay tuned! Sorry for the inconvenience.

kangzhiq commented 1 year ago

@danielm1405 @rokmr

Hi,

A quick update: I have figured out why this version is underperforming. I am running experiments on my side to validate the fix before uploading the changes, which might take some time. In the meantime, I have also updated the buffer to a standard reservoir buffer to make it easier to use.
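
For reference, a standard reservoir buffer keeps a fixed-capacity, uniformly random subset of the stream seen so far. The sketch below illustrates the generic technique, not this repository's exact implementation:

```python
# Sketch: standard reservoir sampling for a replay buffer. Every example
# seen so far is retained with equal probability capacity / num_seen.
import random

class ReservoirBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []      # stored examples
        self.num_seen = 0   # total examples observed across the stream

    def add(self, example):
        if len(self.data) < self.capacity:
            self.data.append(example)             # fill phase: keep everything
        else:
            # Replace a stored example with probability capacity / (num_seen + 1)
            j = random.randint(0, self.num_seen)  # uniform over 0..num_seen
            if j < self.capacity:
                self.data[j] = example
        self.num_seen += 1

    def sample(self, k):
        # Draw a replay mini-batch without replacement
        return random.sample(self.data, min(k, len(self.data)))
```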

I will ping you once the new version is uploaded. Thanks for your patience!

Best, Zhiqi

kangzhiq commented 1 year ago

Hey @danielm1405 @rokmr ,

I just updated the repository for better reproducibility.

I recommend activating the deterministic mode by uncommenting these lines: https://github.com/kangzhiq/NNCSL/blob/a38078aaa911ee43f5f5b03998a53bb4399c33b6/src/nncsl_train.py#L75-L82
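
For reference, such a deterministic-mode block typically looks like the following sketch; the authoritative lines are at the permalink above, and the exact contents there may differ:

```python
# Sketch: typical PyTorch deterministic-mode settings of the kind the
# commented-out block enables; see the permalink for the exact lines.
import random
import numpy as np
import torch

def set_deterministic(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                    # seeds CPU (and current CUDA) RNGs
    torch.cuda.manual_seed_all(seed)           # seeds all CUDA devices
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable the nondeterministic autotuner
```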

If you activate the deterministic mode, you should reproduce 100% of the exact results in my logs.

Besides, the buffer_size == 0 case now also behaves as expected when tested. I am sharing my log for it as well: cifar10_0.8%_buffer0.log

A small section of instructions has also been added to the main page in case you want to test with different proportions of labeled data.

Please let me know if you have any further questions. Thanks again for your interest in our work! :-)

Best, Zhiqi

danielm1405 commented 1 year ago

Nice, thanks a lot for the quick answers and fixes. I will try to reproduce some results on my own. If they match the logs you posted, I will let you know and close this issue.

rokmr commented 1 year ago

@kangzhiq Thank you for your update. I will run it and keep you updated : )

rokmr commented 1 year ago

@kangzhiq Hey! I am facing dependency issues; could you please share the dependencies with their versions? To uncomment the reproducibility code as you suggested, I had to downgrade torch to 1.7.1, and that caused a lot of further dependency issues. Please share the list of dependencies along with their versions.

kangzhiq commented 1 year ago

@rokmr Hi! Sorry to hear that. There are two solutions:

  1. I have updated the repository with a requirements.txt, where you can find the necessary dependencies for this project (see the sketch after this list for what such a pin list can look like).
  2. You can also just run the method without activating the deterministic mode; the performance will not be exactly the same as in my log, but it should not be far from it.
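
For illustration, a pinned environment consistent with the torch version mentioned above might look roughly like this. Only torch==1.7.1 and numpy come from this thread; the other entries are assumptions, and the repository's requirements.txt is the authoritative list:

```
# Hypothetical pin list; defer to the repository's requirements.txt.
torch==1.7.1        # the version needed for the deterministic-mode block
torchvision==0.8.2  # the torchvision release paired with torch 1.7.1
numpy               # used in the repo, e.g. in src/utils.py
pyyaml              # assumed, for parsing the YAML config files
```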

Hope it helps.

Zhiqi

kangzhiq commented 10 months ago

Hi,

I am closing this thread as it has been inactive for a long time. Please feel free to open another one if you have any further questions.

Thanks again for your interest in our work!

Best, Zhiqi