haeusser / learning_by_association

This repository contains code for the paper Learning by Association - A versatile semi-supervised training method for neural networks (CVPR 2017) and the follow-up work Associative Domain Adaptation (ICCV 2017).
https://vision.in.tum.de/members/haeusser
Apache License 2.0

failed to reproduce the result on the paper. #3

Closed hongdoki closed 7 years ago

hongdoki commented 7 years ago

Hello,

I am trying to reproduce the results from the paper, specifically SVHN to MNIST. At first I tried to find the "{stl10,svhn,synth}_tools.py" files or the "package" flag on {train, eval}.py as written in the README, but I couldn't find them. Therefore, I made and executed the .sh files below for training and evaluation, using the hyperparameters given in the paper.

Do you know what I am missing?

Thank you!

haeusser commented 7 years ago

Hi @hongdoki

I corrected the pointers in the README, sorry about that.

Regarding the hyperparameters for the SVHN -> MNIST setup: the architecture we used in the paper is svhn_model. Can you please try that one?

The other hyperparameters look correct to me. For completeness, here is the list of hyperparameters that produced the state-of-the-art result:

"target_dataset": "mnist3", "walker_weight_envelope_delay": "500", "new_size": 32, "dataset": "svhn", "sup_per_batch": 100, "decay_steps": 9000, "unsup_batch_size": 1000, "sup_per_class": -1, "walker_weight_envelope_steps": 1, "walker_weight_envelope": "linear", "visit_weight_envelope": "linear", "architecture": "svhn_model", "visit_weight": 0.2, "max_steps": "12000"

Cheers, Philip

hongdoki commented 7 years ago

Thank you for the quick reply!

Now I can reproduce the result from the paper with your hyperparameters. :+1:

Actually, the reason I tried "mnist_model" is that the paper says all experiments used the network architecture below:

C(32; 3) -> C(32; 3) -> P(2) -> C(64; 3) -> C(64; 3) -> P(2) -> C(128; 3) -> C(128; 3) -> P(2) -> FC(128)

and "svhn_model" is

C(32; 3) -> C(32; 3) -> C(32; 3) -> P(2) -> C(64; 3) -> C(64; 3) -> C(64; 3) -> P(2) -> C(128; 3) -> C(128; 3) -> C(128; 3) -> P(2) -> FC(128)

I think the paper needs a small correction there, or I misunderstood something.

Anyway, thank you again for sharing your hyperparameters!

haeusser commented 7 years ago

Well spotted, thank you. Happy that it is working for you.

nlml commented 7 years ago

Hi @hongdoki and @haeusser

Sorry to be annoying, but per my other issue, I'm still having trouble replicating any of the results reported in the paper.

It's strange: I clone the repo and run with the exact parameters mentioned in @haeusser's reply above (first command below), yet I only get an accuracy of about 0.89. Is there something I'm missing here? Maybe I'm not running the eval script correctly (second command below)?

Any thoughts, or exact instructions on how to replicate any of the results from the paper, would be greatly appreciated.

Liam


# For SVHN to MNIST
CUDA_VISIBLE_DEVICES=3 python semisup/train.py \
 --target_dataset="mnist3" \
 --walker_weight_envelope_delay=500 \
 --new_size=32 \
 --dataset="svhn" \
 --sup_per_batch=100 \
 --decay_steps=9000 \
 --unsup_batch_size=1000 \
 --sup_per_class=-1 \
 --walker_weight_envelope_steps=1 \
 --walker_weight_envelope="linear" \
 --visit_weight_envelope="linear" \
 --architecture="svhn_model" \
 --visit_weight=0.2 \
 --max_steps=12000 \
 --logdir=./log/svhn_to_mnist/reproduce

CUDA_VISIBLE_DEVICES=3 python semisup/eval.py \
 --target_dataset="mnist3" \
 --walker_weight_envelope_delay=500 \
 --new_size=32 \
 --dataset="svhn" \
 --sup_per_batch=100 \
 --decay_steps=9000 \
 --unsup_batch_size=1000 \
 --sup_per_class=-1 \
 --walker_weight_envelope_steps=1 \
 --walker_weight_envelope="linear" \
 --visit_weight_envelope="linear" \
 --architecture="svhn_model" \
 --visit_weight=0.2 \
 --max_steps=12000 \
 --logdir=./log/svhn_to_mnist/reproduce

haeusser commented 7 years ago

Which versions of Tensorflow, CUDA and CUDNN are you using?

haeusser commented 7 years ago

... and does the eval job really evaluate the latest checkpoints? Have you tried to run the same experiment a few times? Usually the random initialization should not have a big effect.

nlml commented 7 years ago

I was using Tensorflow 1.2, but also tried 1.1. CUDA 7.5 and cuDNN 5.0.

Yep I have tried running it a few times, always with the same lackluster results. Maybe I'll try it on an AWS instance...

haeusser commented 7 years ago

@nlml Hm, that doesn't sound right. It is weird that it works for everyone else. Can you double-check that the datasets are loaded correctly? Does your system have any special settings for floats? And which OS are you using?

nlml commented 7 years ago

Unfortunately I don't have access to the GPU I was using any more. Can't remember exactly which OS, but it was a fairly standard Linux server setup, so I suppose a recent Ubuntu? Wasn't running any special settings for floats.

When I have some time I will try it on an AWS instance. If you get a chance, could you confirm the exact commands/instructions and the result I should get after X iterations for any of the notable results you reported? Or, if what I've posted above is fine, could you just confirm that?

haeusser commented 7 years ago

This is the set of flags for the run that produced the result in the paper:

{ "target_dataset": "mnist3", "walker_weight_envelope_delay": "500", "max_checkpoints": 5, "new_size": 32, "dataset": "svhn", "sup_per_batch": 100, "decay_steps": 9000, "unsup_batch_size": 1000, "sup_per_class": -1, "walker_weight_envelope_steps": 1, "walker_weight_envelope": "linear", "visit_weight_envelope": "linear", "architecture": "svhn_model", "visit_weight": 0.2, "max_steps": "12000" }

nlml commented 7 years ago

Great, thanks again :+1:

haeusser commented 7 years ago

Of course. I hope we can track down the problem!

nlml commented 7 years ago

Hi again @haeusser

So to test this in a different environment, I instantiated this AWS instance image with TensorFlow etc. on a p2.xlarge instance (which has one Tesla K80 GPU).

I then SSH to the instance, clone the repo and alter data dirs:

cd /home/ubuntu;
git clone https://github.com/haeusser/learning_by_association.git;
cd learning_by_association/;
perl -i -pe 's/work\/haeusser\/data/home\/ubuntu\/datasets/g' semisup/tools/data_dirs.py;
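
To double-check that the substitution took effect (a quick sanity check of my own, not part of the repo instructions), something like this should show the rewritten data paths in data_dirs.py:

grep -n "ubuntu/datasets" semisup/tools/data_dirs.py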

Make the required changes to ~/.bashrc:

echo -e "\n\nexport PYTHONPATH=/home/ubuntu/learning_by_association:\$PYTHONPATH" >> ~/.bashrc;
source ~/.bashrc;

Download datasets:

mkdir /home/ubuntu/datasets/;
mkdir /home/ubuntu/datasets/svhn/;
mkdir /home/ubuntu/datasets/mnist/;
wget http://ufldl.stanford.edu/housenumbers/test_32x32.mat -O /home/ubuntu/datasets/svhn/test_32x32.mat;
wget http://ufldl.stanford.edu/housenumbers/train_32x32.mat -O /home/ubuntu/datasets/svhn/train_32x32.mat;
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz -O /home/ubuntu/datasets/mnist/train-images-idx3-ubyte.gz;
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz -O /home/ubuntu/datasets/mnist/train-labels-idx1-ubyte.gz;
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz -O /home/ubuntu/datasets/mnist/t10k-images-idx3-ubyte.gz;
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz -O /home/ubuntu/datasets/mnist/t10k-labels-idx1-ubyte.gz;

Then run:

python semisup/train.py \
 --target_dataset="mnist3" \
 --walker_weight_envelope_delay=500 \
 --new_size=32 \
 --dataset="svhn" \
 --sup_per_batch=100 \
 --decay_steps=9000 \
 --unsup_batch_size=1000 \
 --sup_per_class=-1 \
 --walker_weight_envelope_steps=1 \
 --walker_weight_envelope="linear" \
 --visit_weight_envelope="linear" \
 --architecture="svhn_model" \
 --visit_weight=0.2 \
 --max_steps=12000 \
 --logdir=./log/svhn_to_mnist/reproduce

And eval script:

python semisup/eval.py \
 --target_dataset="mnist3" \
 --walker_weight_envelope_delay=500 \
 --new_size=32 \
 --dataset="svhn" \
 --sup_per_batch=100 \
 --decay_steps=9000 \
 --unsup_batch_size=1000 \
 --sup_per_class=-1 \
 --walker_weight_envelope_steps=1 \
 --walker_weight_envelope="linear" \
 --visit_weight_envelope="linear" \
 --architecture="svhn_model" \
 --visit_weight=0.2 \
 --max_steps=12000 \
 --logdir=./log/svhn_to_mnist/reproduce

...And so far I'm getting results in TensorBoard pretty similar to before -- an accuracy of around 92% (to be fair, I'm only 3k iterations in so far, but it still seems a fair way off...)

So this should be quite reproducible now I think. Any ideas why my results are so far off? Do I need to be using python3 maybe?

Cheers, Liam

haeusser commented 7 years ago

Hi @nlml

Alright, so I re-ran the training myself again and everything seems fine. I uploaded the logs for you, including hyperparameters and TFEvents, so you can visualize the graph with TensorBoard: https://vision.in.tum.de/~haeusser/da_svhn_mnist.zip

The TensorFlow version was https://github.com/haeusser/tensorflow

I hope this is helpful! Philip

nlml commented 7 years ago

Thanks again - sorry to be annoying! I'll take a look and possibly try again with your TensorFlow fork.

Liam

nlml commented 7 years ago

Hey again @haeusser

Thanks a lot again for all your help. I finally got it working :D My problem was that I was evaluating on SVHN, not MNIST (see my eval command above... doh).
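
For anyone else who hits the same thing: I believe the only change needed in my eval command above is the dataset flag, roughly as follows (a hedged guess on my part, I haven't re-checked whether any other flag needs to change):

# in the semisup/eval.py call above, replace the source dataset
 --dataset="svhn" \
# with the target dataset, so that eval runs on MNIST
 --dataset="mnist3" \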

Another question that's come up looking at the two papers: What is the difference in approach between Table 5 of the Learning by Association paper, and Table 2 of the Domain Adaptation paper, with regards to SVHN -> MNIST? In the former, you report an error of 0.5%, while in the latter it is 2.4%. My results (and yours) are in line with the latter. I can't seem to find what the difference in approach is in the 0.5% version, but presumably there is some major difference there.

Also, in the Domain Adaptation paper, you state that "The authors of [12] observed that higher order round trips do not improve performance." Where exactly is this stated in the Learning by Association paper? I can't seem to find this idea mentioned.

Thanks, Liam

deep0learning commented 6 years ago

I run this code:

python semisup/train.py \
 --target_dataset="mnist3" \
 --walker_weight_envelope_delay=500 \
 --new_size=32 \
 --dataset="svhn" \
 --sup_per_batch=100 \
 --decay_steps=9000 \
 --unsup_batch_size=1000 \
 --sup_per_class=-1 \
 --walker_weight_envelope_steps=1 \
 --walker_weight_envelope="linear" \
 --visit_weight_envelope="linear" \
 --architecture="svhn_model" \
 --visit_weight=0.2 \
 --max_steps=12000 \
 --logdir=./log/svhn_to_mnist/reproduce

And eval script:

python semisup/eval.py \
 --target_dataset="mnist3" \
 --walker_weight_envelope_delay=500 \
 --new_size=32 \
 --dataset="svhn" \
 --sup_per_batch=100 \
 --decay_steps=9000 \
 --unsup_batch_size=1000 \
 --sup_per_class=-1 \
 --walker_weight_envelope_steps=1 \
 --walker_weight_envelope="linear" \
 --visit_weight_envelope="linear" \
 --architecture="svhn_model" \
 --visit_weight=0.2 \
 --max_steps=12000 \
 --logdir=./log/svhn_to_mnist/reproduce

The training part runs fine, but the evaluation part is not working. Am I doing anything wrong here?

I am getting this error:

INFO:tensorflow:Waiting for new checkpoint at ./log/svhn_to_mnist/reproduce/train
INFO:tensorflow:Timed-out waiting for a checkpoint.

haeusser commented 6 years ago

Yes, something is off: the evaluation loop does not write any checkpoints itself, it only waits for the ones produced by the training job, and apparently none are showing up. There might be many reasons. Maybe your disk is full or you are running the script from a different directory. If the train job fills the entire GPU, the eval job might run on the CPU and hence be very slow. Can you paste the console output from the eval job?
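
To make the intended workflow explicit, the eval job is meant to run alongside the training job so that it picks up new checkpoints as they are written. A rough sketch, with "..." standing for the same flags as in the commands above:

# terminal 1: training job, writes checkpoints under <logdir>/train
python semisup/train.py ... --logdir=./log/svhn_to_mnist/reproduce

# terminal 2, at the same time: eval loop, polls <logdir>/train for new checkpoints
python semisup/eval.py ... --logdir=./log/svhn_to_mnist/reproduce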

Cheers, Philip

deep0learning commented 6 years ago

@haeusser Thank you so much for your kind reply. At first I ran the semisup/train.py script, and after it finished, I ran semisup/eval.py.

Should I run these two scripts together?

deep0learning commented 6 years ago

My disk is not full, and I run the script from the same directory. Should I change any directory path in the semisup/eval.py script?

haeusser commented 6 years ago

Since you apparently solved the issue, as I infer from the other thread, could you quickly post what the problem was?

deep0learning commented 6 years ago

@haeusser Thanks. When I ran the two scripts train.py and eval.py together, it worked.

4Statistics commented 6 years ago

"run the two scripts train.py and eval.py together,"what's this mean?How to run the py scripts together? Thank You ! This is the result.

INFO:tensorflow:Waiting for new checkpoint at ./log/svhn_to_mnist/reproduce/train
INFO:tensorflow:Timed-out waiting for a checkpoint.

I can't find the accuracy.

4Statistics commented 6 years ago

Very sorry to be annoying! But I really don't know how to run your code normally.

Brownchen commented 5 years ago

Very sorry to be annoying! But I really don't know how to run your code normally.

Did you solve the problem?