rizar / systematic-generalization-sqoop

Code for "Systematic Generalization: What Is Required and Can It Be Learned"

Low accuracy when training your MAC model on the CLEVR dataset #4

Open xiaohythu opened 4 years ago

xiaohythu commented 4 years ago

[Image attachment: 123]

As the image shows, the training accuracy is 0.7 and the validation accuracy is 0.549. I think both are much lower than those of the MAC network in https://github.com/stanfordnlp/mac-network. Any suggestions?

xiaohythu commented 4 years ago

I just followed your training command, scripts/train/mac_flatqa.sh --data_dir $DATA/sqoop-variety_1-repeats_30000 --checkpoint_path model.pt --num_iterations 100000, and changed only the feature dimension to [1024, 14, 14].

rizar commented 4 years ago

How long have you been training the model?

xiaohythu commented 4 years ago

How long have you been training the model?

As my running command shows, num_iterations is 100000

xiaohythu commented 4 years ago

How long have you been training the model?

The training procedure lasts about 10 hours

rizar commented 4 years ago

OK, I will run this experiment later today myself.

xiaohythu commented 4 years ago

OK, I will run this experiment later today myself.

Thank you for your reply, waiting for your results

rizar commented 4 years ago

I am working on it. I presume I broke the model at some point, or maybe a PyTorch change is to blame. If I don't find the issue today, this will have to wait until January though.

On Sun, 22 Dec 2019 at 23:37, songyy14 notifications@github.com wrote:

OK, I will run this experiment later today myself.

Any updates? I have the same problem.


rizar commented 4 years ago

While I am tinkering with my setup, could one of you try to run this experiment multiple (like 5) times, please?

rizar commented 4 years ago

I could not reproduce your issue. I have just trained 10 models, and they all worked fine. Can you please try running the experiment many times and tell me if the issue occurs all the time, or some of the time?
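For anyone who wants to repeat the run several times, here is a minimal sketch that launches the training script in a loop. It reuses only the flags from the command quoted earlier in this thread (--data_dir, --checkpoint_path, --num_iterations). The checkpoint file names (model_run0.pt and so on) and the helper names are hypothetical, and the sketch does not set a random seed because the thread does not say how the script handles seeding.

```python
import subprocess


def build_commands(data_dir, num_runs=5, num_iterations=100000,
                   script="scripts/train/mac_flatqa.sh"):
    """Build one training command per run, each with its own checkpoint path."""
    commands = []
    for run in range(num_runs):
        # Hypothetical naming scheme so that runs do not overwrite each other.
        checkpoint = "model_run{}.pt".format(run)
        commands.append([
            script,
            "--data_dir", data_dir,
            "--checkpoint_path", checkpoint,
            "--num_iterations", str(num_iterations),
        ])
    return commands


def run_all(commands):
    """Run the commands one after another and return their exit codes."""
    return [subprocess.run(cmd).returncode for cmd in commands]
```

Calling run_all(build_commands("/path/to/data")) from the repository root would start the runs sequentially; the final accuracies then have to be read from each run's own log.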

xiaohythu commented 4 years ago

I could not reproduce your issue. I have just trained 10 models, and they all worked fine. Can you please try running the experiment many times and tell me if the issue occurs all the time, or some of the time?

Did you change your setup, code, or run command?

xiaohythu commented 4 years ago

I still get the lower performance that I described in the question. Maybe I need more detailed information about your training. My setup is CUDA 10.1 and torch 1.3.1.

xiaohythu commented 4 years ago

Before running your MAC model, I use ResNet-101 to extract features from the CLEVR dataset and convert them to an .h5 file. I also preprocess the questions. Is this the correct procedure?

xiaohythu commented 4 years ago

I could not reproduce your issue. I have just trained 10 models, and they all worked fine. Can you please try running the experiment many times and tell me if the issue occurs all the time, or some of the time?

Hi rizar! When I try to reproduce your MAC model on the CLEVR dataset, the following error occurs:

Traceback (most recent call last):
  File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 1271, in <module>
    main(args)
  File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 393, in main
    train_loop(args, train_loader, val_loader)
  File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 530, in train_loop
    for batch in train_loader:
  File "/home/xhy/anaconda3/envs/sqoop1/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 264, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/xhy/anaconda3/envs/sqoop1/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 264, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/xhy/systematic-generalization-sqoop-master/vr/data.py", line 130, in __getitem__
    program_json = self.program_converter.prefix_to_list(program_json_seq)
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 109, in prefix_to_list
    return self.tree_to_list(self.prefix_to_tree(program_prefix))
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 105, in prefix_to_tree
    return helper()
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 103, in helper
    'inputs': [helper() for _ in range(self.get_num_inputs(cur))],
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 137, in get_num_inputs
    return self._vocab['program_token_arity'][f]
KeyError: 'program_token_arity'

It seems that the clevr dataset is different from your sqoop dataset. Can you give me some instructions?

xiaohythu commented 4 years ago

I have trained the MAC model on the CLEVR dataset more than 10 times. All the results are similar to what I reported in my question. I believe you changed something in the training setup that I did not. I need help.

rizar commented 4 years ago

I am sorry to hear the code doesn't work for you. For now all I can do is give some extra information about the environment. I run the code in a Docker image that is based on "nvidia/cuda:9.1-cudnn7-devel-ubuntu16.04". I build the conda environment in the image. Here is the output of conda list:

(sysgen) dzmitry@a574659fd138:/workspace$ conda list
# packages in environment at /home/dzmitry/miniconda2/envs/sysgen:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
blas                      1.0                         mkl  
ca-certificates           2019.11.27                    0  
certifi                   2019.11.28               py36_0  
cffi                      1.13.2           py36h2e261b9_0  
cuda90                    1.0                  h6433d27_0    pytorch
cudatoolkit               10.1.243             h6bb024c_0  
freetype                  2.9.1                h8a8886c_1  
h5py                      2.9.0            py36h7918eee_0  
hdf5                      1.10.4               hb1b8bf9_0  
intel-openmp              2019.4                      243  
jpeg                      9b                   h024ee3a_2  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_0  
mkl                       2019.4                      243  
mkl-service               2.3.0            py36he904b0f_0  
mkl_fft                   1.0.15           py36ha843d7b_0  
mkl_random                1.1.0            py36hd6b4f25_0  
ncurses                   6.1                  he6710b0_1  
ninja                     1.9.0            py36hfd86e86_0  
nmn-iwp                   0.1                       <pip>
numpy                     1.17.4           py36hc1035e2_0  
numpy-base                1.17.4           py36hde5b4d6_0  
olefile                   0.46                       py_0  
openssl                   1.1.1d               h7b6447c_3  
pillow                    6.2.1            py36h34e0f95_0  
pip                       19.3.1                   py36_0  
pycparser                 2.19                       py_0  
python                    3.6.9                h265db76_0  
pytorch                   1.3.1           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
readline                  7.0                  h7b6447c_5  
scipy                     1.3.2            py36h7c811a0_0  
setuptools                42.0.2                   py36_0  
six                       1.13.0                   py36_0  
sqlite                    3.30.1               h7b6447c_0  
termcolor                 1.1.0                    py36_1  
tk                        8.6.8                hbc83047_0  
torchvision               0.4.2                py36_cu101    pytorch
tqdm                      4.40.2                     py_0  
wheel                     0.33.6                   py36_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.3.7                h0b5b093_0  

I can give you more info on Monday.
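To compare a local environment against the list above, something like the following sketch could help. It assumes both environments are available as the plain text printed by conda list, without the shell prompt line; the function names are made up for this example. Differences it reports are only candidates to look at, not a diagnosis. The traceback earlier in the thread, for instance, shows a python3.5 path, while the list above shows Python 3.6.9.

```python
def parse_conda_list(text):
    """Map package name -> version from the text printed by `conda list`."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the commented header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def diff_environments(mine, theirs):
    """Return {name: (my_version, their_version)} for packages that differ.

    None marks a package that is missing on one side.
    """
    diffs = {}
    for name in sorted(set(mine) | set(theirs)):
        a, b = mine.get(name), theirs.get(name)
        if a != b:
            diffs[name] = (a, b)
    return diffs
```

Saving each conda list output to a text file, reading both files, and passing the parsed dictionaries to diff_environments gives a short list of version mismatches to check first.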

xiaohythu commented 4 years ago

As I mentioned in this issue, an error occurred: File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 137, in get_num_inputs return self._vocab['program_token_arity'][f] KeyError: 'program_token_arity'. I guess the vocab.json for CLEVR is different from the one for your SQOOP dataset. How should I solve this?

rizar commented 4 years ago

I have looked at both vocab.json files, and they both seem to have program_token_arity keys in them. Can you please tell me what keys you have in your vocab.json file and also where you got it from?
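A quick way to answer the question about keys is to list the top-level entries of the file. The sketch below assumes vocab.json is a JSON object at the top level; summarize_vocab is a hypothetical helper written for this thread, not part of the repository.

```python
import json


def summarize_vocab(path, required_key="program_token_arity"):
    """Return the sorted top-level keys of a vocab JSON file and whether
    required_key is among them."""
    with open(path, "r", encoding="utf-8") as f:
        vocab = json.load(f)
    # Assumes the top level of the file is a JSON object (a dict in Python).
    keys = sorted(vocab.keys())
    return keys, required_key in vocab
```

Calling summarize_vocab("vocab.json") returns the key list and a boolean; if the boolean is False, the KeyError in the traceback is explained by the file itself rather than by the training code.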