Open xiaohythu opened 4 years ago
I just followed your training command, changing only the feature dimension to [1024, 14, 14]: scripts/train/mac_flatqa.sh --data_dir $DATA/sqoop-variety_1-repeats_30000 --checkpoint_path model.pt --num_iterations 100000
How long have you been training the model?
As my running command shows, num_iterations is 100000
The training procedure lasts about 10 hours
OK, I will run this experiment later today myself.
Thank you for your reply. I am waiting for your results.
I am working on it. I presume I broke the model at some point, or maybe a PyTorch change is to blame. If I don't find the issue today, this will have to wait until January though.
On Sun, 22 Dec 2019 at 23:37, songyy14 notifications@github.com wrote:
Any updates? I have run into the same problem as well.
While I am tinkering with my setup, could one of you try to run this experiment multiple (like 5) times, please?
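For anyone who wants to repeat the run several times, here is a minimal sketch that only builds the command lines. The script name and flags are the ones quoted earlier in this thread; the per-run checkpoint naming (model_run1.pt and so on) is a hypothetical choice made so that runs do not overwrite each other, and the helper name build_run_commands is invented for illustration.

```python
def build_run_commands(data_dir, num_runs=5, num_iterations=100000):
    """Return one training command (as an argument list) per run."""
    commands = []
    for run in range(1, num_runs + 1):
        commands.append([
            "scripts/train/mac_flatqa.sh",
            "--data_dir", data_dir,
            # Hypothetical naming scheme: one checkpoint file per run.
            "--checkpoint_path", f"model_run{run}.pt",
            "--num_iterations", str(num_iterations),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be pasted into a shell,
    # where $DATA is expanded as in the original command.
    for cmd in build_run_commands("$DATA/sqoop-variety_1-repeats_30000"):
        print(" ".join(cmd))
```

Printing the commands instead of launching them keeps the sketch independent of any particular machine; they could equally be passed to a job scheduler.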
I could not reproduce your issue. I have just trained 10 models, and they all worked fine. Can you please try running the experiment many times and tell me if the issue occurs all the time, or some of the time?
Did you change your setup, code, or running command?
I still obtain the lower performance that I described in the question. Maybe I need more detailed information about your training setup. My setup is CUDA 10.1 and torch 1.3.1.
Before running your MAC model, I use ResNet-101 to extract features from the CLEVR dataset and convert them to an .h5 file. I also preprocess the questions. Is this the correct procedure?
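As a sanity check on the [1024, 14, 14] feature dimension, the arithmetic below shows where that shape would come from. It is only a sketch under two assumptions that this thread does not confirm: that the images are resized to 224x224, and that the features are taken after the third residual stage of ResNet-101. The function names are invented for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet101_stage3_shape(image_size=224):
    """Expected (channels, height, width) after the third residual stage."""
    s = conv_out(image_size, kernel=7, stride=2, padding=3)  # stem convolution
    s = conv_out(s, kernel=3, stride=2, padding=1)           # max pooling
    # Stage 1 keeps the resolution; stages 2 and 3 each halve it.
    s = conv_out(s, kernel=3, stride=2, padding=1)           # stage 2
    s = conv_out(s, kernel=3, stride=2, padding=1)           # stage 3
    channels = 256 * 4  # bottleneck width 256 times expansion factor 4
    return (channels, s, s)


print(resnet101_stage3_shape(224))  # (1024, 14, 14)
```

If the extracted .h5 features have a different shape, the image size or the layer the features were taken from probably differs from these assumptions.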
Hi rizar!
When I try to reproduce your MAC model on the CLEVR dataset, the following error occurs:
Traceback (most recent call last):
File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 1271, in
It seems that the CLEVR dataset is different from your SQOOP dataset. Can you give me some instructions?
I have trained the MAC model on the CLEVR dataset more than 10 times. All the results are similar to what I mentioned in my question. I believe that you changed something in training but I did not! I need help.
I am sorry to hear the code doesn't work for you. For now, all I can do is give some extra information about the environment. I run the code in a Docker image that is based on "nvidia/cuda:9.1-cudnn7-devel-ubuntu16.04". I build the conda environment in the image. Here is the output of conda list:
(sysgen) dzmitry@a574659fd138:/workspace$ conda list
# packages in environment at /home/dzmitry/miniconda2/envs/sysgen:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
blas 1.0 mkl
ca-certificates 2019.11.27 0
certifi 2019.11.28 py36_0
cffi 1.13.2 py36h2e261b9_0
cuda90 1.0 h6433d27_0 pytorch
cudatoolkit 10.1.243 h6bb024c_0
freetype 2.9.1 h8a8886c_1
h5py 2.9.0 py36h7918eee_0
hdf5 1.10.4 hb1b8bf9_0
intel-openmp 2019.4 243
jpeg 9b h024ee3a_2
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_0
mkl 2019.4 243
mkl-service 2.3.0 py36he904b0f_0
mkl_fft 1.0.15 py36ha843d7b_0
mkl_random 1.1.0 py36hd6b4f25_0
ncurses 6.1 he6710b0_1
ninja 1.9.0 py36hfd86e86_0
nmn-iwp 0.1 <pip>
numpy 1.17.4 py36hc1035e2_0
numpy-base 1.17.4 py36hde5b4d6_0
olefile 0.46 py_0
openssl 1.1.1d h7b6447c_3
pillow 6.2.1 py36h34e0f95_0
pip 19.3.1 py36_0
pycparser 2.19 py_0
python 3.6.9 h265db76_0
pytorch 1.3.1 py3.6_cuda10.1.243_cudnn7.6.3_0 pytorch
readline 7.0 h7b6447c_5
scipy 1.3.2 py36h7c811a0_0
setuptools 42.0.2 py36_0
six 1.13.0 py36_0
sqlite 3.30.1 h7b6447c_0
termcolor 1.1.0 py36_1
tk 8.6.8 hbc83047_0
torchvision 0.4.2 py36_cu101 pytorch
tqdm 4.40.2 py_0
wheel 0.33.6 py36_0
xz 5.2.4 h14c3975_4
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0
I can give you more info on Monday.
As I mentioned earlier in this issue, an error occurred:
File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 137, in get_num_inputs
return self._vocab['program_token_arity'][f]
KeyError: 'program_token_arity'
I guess the vocab.json for CLEVR is different from the one for your SQOOP dataset. How should I solve this?
I have looked at both vocab.json files, and they both seem to have program_token_arity keys in them. Can you please tell me what keys you have in your vocab.json file and also where you got it from?
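To answer the question about keys, something like the following sketch could be used to list the top-level keys of a vocab.json file and report whether program_token_arity is present. The helper name missing_vocab_keys is invented for illustration, and the only key checked by default is the one named in the KeyError above; nothing else about the file layout is assumed.

```python
import json
import sys


def missing_vocab_keys(vocab_path, required_keys=("program_token_arity",)):
    """Return the required top-level keys that are absent from the JSON file."""
    with open(vocab_path) as f:
        vocab = json.load(f)
    return [key for key in required_keys if key not in vocab]


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # for example: path/to/vocab.json
    with open(path) as f:
        print("top-level keys:", sorted(json.load(f).keys()))
    print("missing keys:", missing_vocab_keys(path))
```

An empty "missing keys" list would mean the file itself is fine and the KeyError comes from somewhere else, for example from a different vocab.json being loaded at training time.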
As the image shows, the training accuracy is 0.7 and the validation accuracy is 0.549. I think both accuracies are much lower than those of the MAC network in https://github.com/stanfordnlp/mac-network. Any instructions?