uclnlp / jack

Jack the Reader
MIT License

Pretrained FastQA model changed? #361

Closed minttusofia closed 6 years ago

minttusofia commented 6 years ago

It seems that the FastQA model has been updated since I last downloaded it (November 21). The new model achieves much lower performance (9%) on my end task compared to the previous version (16%). What changed, and is this intentional?

dirkweissenborn commented 6 years ago

Hi! Sorry for that. There might be a couple of reasons.

Are you running on the latest code base?

Can you run the following and report the results on the SQuAD dev set, just to be sure that the model gives the right performance on SQuAD?

$ data/SQuAD/download.sh
$ bin/jack-eval.py --dataset data/SQuAD/dev-v1.1.json --loader squad --save_dir path/to/model

What is your end task and how much data are you testing on? If the end task is very different from SQuAD, the results can be quite sensitive to the choice of model, even when the models perform similarly on SQuAD.

Did you try using another model? There are a couple of different models here. Maybe another will work better.

minttusofia commented 6 years ago

Yes, I'm using the latest code base. The earlier code base isn't compatible with the newer model, and vice versa. With the current version, I get:

Exact: 0.23453169347209082
F1: 0.4494948675517222

(Not sure what the performance should be.)

My end task is a multi-hop QA setting on WikiHop data, so admittedly somewhat different from SQuAD, but I didn't expect the trained model to change. The 9% vs. 16% accuracy gap isn't a huge deal in this case, since FastQA serves as a baseline rather than the main model, but I need to stick to one version of FastQA to keep results comparable. I haven't tried other models because speed is a priority.

I suppose I might be able to revert to an earlier version of Jack and keep using the former version of the model?

dirkweissenborn commented 6 years ago

Those results are terrible; there is something wrong with the model. Let me look into it. In the meantime, you can simply try out the other models.


pminervini commented 6 years ago

@dirkweissenborn @minttusofia can we use tests and CI to make sure the models keep getting decent results? I'll look into that tomorrow.
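
For illustration, a regression test along these lines could run in CI. This is only a minimal sketch, not existing Jack code: the model directory, dataset path, F1 threshold, and the pytest "slow" marker below are placeholders/assumptions; the command and the "F1: ..." log line it parses are taken from this thread.

# Minimal regression-test sketch (not existing Jack code): shells out to
# bin/jack-eval.py as in this thread and fails if the SQuAD dev F1 drops
# below a threshold. Paths and the threshold are placeholders.
import re
import subprocess

import pytest

MODEL_DIR = "fastqa"                    # assumed location of the downloaded model
DEV_SET = "data/SQuAD/dev-v1.1.json"    # produced by data/SQuAD/download.sh
F1_THRESHOLD = 0.75                     # current FastQA model reaches ~0.77 F1

@pytest.mark.slow
def test_fastqa_squad_dev_f1():
    result = subprocess.run(
        ["python3", "bin/jack-eval.py",
         "--dataset", DEV_SET, "--loader", "squad", "--save_dir", MODEL_DIR],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True, check=True,
    )
    # jack-eval.py logs lines such as "F1: 0.7737..."; grab the number.
    match = re.search(r"F1:\s*([0-9.]+)", result.stdout)
    assert match is not None, "no F1 score found in jack-eval.py output"
    assert float(match.group(1)) >= F1_THRESHOLD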

dirkweissenborn commented 6 years ago

@minttusofia did you use the latest FastQA model (from after 6 December)? The current model and code base give the following results:

Exact: 0.6737937559129612
F1: 0.7737367559753092

The code might have changed slightly, such that the old model doesn't work with it anymore. Can you please download the latest model and check again?

pminervini commented 6 years ago

This is what I get - @dirkweissenborn I can give you access to this machine/jack installation if you want:

jack@hetzner:~/workspace/jack$ python3 ./bin/jack-eval.py --dataset data/SQuAD/dev-v1.1.json --loader squad --save_dir ./fastqa                                                                              
INFO:jack-eval.py:Creating and loading reader from ./fastqa...
2018-02-07 17:39:43.876724: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
INFO:tensorflow:Restoring parameters from ./fastqa/model_module
INFO:tensorflow:Restoring parameters from ./fastqa/model_module
INFO:jack-eval.py:Start!
INFO:jack.core.input_module:OnlineInputModule pre-processes data on-the-fly in first epoch and caches results for subsequent epochs! That means, first epoch might be slower.
INFO:jack.core.reader:Start answering...
 [Elapsed Time: 0:03:01] |################################################################################################################################################################| (Time: 0:03:01)
INFO:jack-eval.py:############### RESULTS ##############
Exact: 0.23453169347209082
F1: 0.4494948675517222
dirkweissenborn commented 6 years ago

Hmm, that is very strange. Did you download the latest model? If yes, then it must be the embeddings. In that case, I will recreate the embeddings memory map tomorrow and see if that is really the problem...
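
(For reference, rebuilding a memory-mapped embedding file from a GloVe-style text file can be done roughly as sketched below. This is a generic sketch only; the file names, layout, and dimensionality are assumptions for illustration, not necessarily the format Jack actually uses.)

# Generic sketch: convert GloVe-style text embeddings into a float32 memory map
# plus a vocab file. File names, layout, and dim are illustrative assumptions.
import numpy as np

def build_embedding_memmap(glove_txt, out_vectors="vectors.float32.mmap",
                           out_vocab="vocab.txt", dim=300):
    words, rows = [], []
    with open(glove_txt, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue                  # skip malformed lines
            words.append(parts[0])
            rows.append(np.asarray(parts[1:], dtype=np.float32))

    mmap = np.memmap(out_vectors, dtype=np.float32, mode="w+",
                     shape=(len(rows), dim))
    mmap[:] = np.stack(rows)              # write all vectors
    mmap.flush()                          # make sure they hit the disk

    with open(out_vocab, "w", encoding="utf-8") as f:
        f.write("\n".join(words))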


pminervini commented 6 years ago

Did you download the latest model?

Yep

If yes, then it must be the embeddings.

This is one reason why I was stressing that the embeddings should be part of the model, at least at this early stage.

dirkweissenborn commented 6 years ago

Well, I recreated the embeddings and still get the same results. I also downloaded the memory-mapped embeddings from neuralnoise and got the same results, so that is not the problem. The only explanation left is that you have the wrong model. Can you please try again, downloading the model with the following commands? The README instructions were not entirely correct.

wget -O fastqa.zip https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip\?dl\=1
unzip fastqa.zip
python3 bin/jack-eval.py --dataset data/SQuAD/dev-v1.1.json --loader squad --save_dir fastqa
pminervini commented 6 years ago
jack@hetzner:~/workspace/jack$ wget -O fastqa.zip -c "https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip?dl=1"
--2018-02-07 18:55:02--  https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 2620:100:6022:1::a27d:4201, 162.125.66.1
Connecting to www.dropbox.com (www.dropbox.com)|2620:100:6022:1::a27d:4201|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://dl.dropboxusercontent.com/content_link/UJwuVKVqSw2hLf8ZvOcSBfLP8XkH7c8zulJ4RhsF2ro5qpP0JwtdI5KxMiSBkaM3/file?dl=1 [following]
--2018-02-07 18:55:02--  https://dl.dropboxusercontent.com/content_link/UJwuVKVqSw2hLf8ZvOcSBfLP8XkH7c8zulJ4RhsF2ro5qpP0JwtdI5KxMiSBkaM3/file?dl=1
Resolving dl.dropboxusercontent.com (dl.dropboxusercontent.com)... 2620:100:6022:6::a27d:4206, 162.125.66.6
Connecting to dl.dropboxusercontent.com (dl.dropboxusercontent.com)|2620:100:6022:6::a27d:4206|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50074980 (48M) [application/binary]
Saving to: ‘fastqa.zip’

fastqa.zip                        100%[============================================================>]  47.75M  14.2MB/s    in 4.1s

2018-02-07 18:55:07 (11.6 MB/s) - ‘fastqa.zip’ saved [50074980/50074980]

jack@hetzner:~/workspace/jack$ unzip fastqa.zip
Archive:  fastqa.zip
   creating: fastqa/
   creating: fastqa/shared_resources_vocab/
  inflating: fastqa/shared_resources_vocab/conf.yaml
  inflating: fastqa/shared_resources_vocab/remainder.pkl
  inflating: fastqa/model_module.data-00000-of-00001
  inflating: fastqa/model_module.meta
  inflating: fastqa/model_module.index
  inflating: fastqa/checkpoint
  inflating: fastqa/shared_resources
jack@hetzner:~/workspace/jack$ python3 bin/jack-eval.py --dataset data/SQuAD/dev-v1.1.json --loader squad --save_dir fastqa
INFO:jack-eval.py:Creating and loading reader from fastqa...
2018-02-07 18:55:41.462609: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
INFO:tensorflow:Restoring parameters from fastqa/model_module
INFO:tensorflow:Restoring parameters from fastqa/model_module
INFO:jack-eval.py:Start!
INFO:jack.core.input_module:OnlineInputModule pre-processes data on-the-fly in first epoch and caches results for subsequent epochs! That means, first epoch might be slower.
INFO:jack.core.reader:Start answering...
 [Elapsed Time: 0:02:00] |##########################################################################################| (Time: 0:02:00)
INFO:jack-eval.py:############### RESULTS ##############
Exact: 0.23453169347209082
F1: 0.4494948675517222
dirkweissenborn commented 6 years ago

@minttusofia @pminervini It is TensorFlow. With TF 1.5 I get the same bad results; on 1.4 it works properly... I think we have to pin the TF version.

pminervini commented 6 years ago

Confirmed, it's TF - I would never have expected such a drastic change. What's the reason?

jack@hetzner:~/workspace/jack$ PYTHONPATH=. python3 bin/jack-eval.py --dataset data/SQuAD/dev-v1.1.json --loader squad --save_dir fastqa
INFO:jack-eval.py:Creating and loading reader from fastqa...
2018-02-07 19:06:55.469739: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
INFO:tensorflow:Restoring parameters from fastqa/model_module
INFO:tensorflow:Restoring parameters from fastqa/model_module
INFO:jack-eval.py:Start!
INFO:jack.core.input_module:OnlineInputModule pre-processes data on-the-fly in first epoch and caches results for subsequent epochs! That means, first epoch might be slower.
INFO:jack.core.reader:Start answering...
 [Elapsed Time: 0:01:53] |##########################################################################################| (Time: 0:01:53)
INFO:jack-eval.py:############### RESULTS ##############
Exact: 0.23453169347209082
F1: 0.4494948675517222
jack@hetzner:~/workspace/jack$ python3 -m pip install tensorflow==1.4
Collecting tensorflow==1.4
  Using cached tensorflow-1.4.0-cp36-cp36m-manylinux1_x86_64.whl
Requirement already satisfied: protobuf>=3.3.0 in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from tensorflow==1.4)
Collecting tensorflow-tensorboard<0.5.0,>=0.4.0rc1 (from tensorflow==1.4)
  Downloading tensorflow_tensorboard-0.4.0-py3-none-any.whl (1.7MB)
    100% |████████████████████████████████| 1.7MB 1.3MB/s
Requirement already satisfied: six>=1.10.0 in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from tensorflow==1.4)
Requirement already satisfied: wheel>=0.26 in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from tensorflow==1.4)
Requirement already satisfied: numpy>=1.12.1 in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from tensorflow==1.4)
Requirement already satisfied: enum34>=1.1.6 in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from tensorflow==1.4)
Requirement already satisfied: setuptools in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from protobuf>=3.3.0->tensorflow==1.4)
Requirement already satisfied: bleach==1.5.0 in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow==1.4)
Requirement already satisfied: markdown>=2.6.8 in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow==1.4)
Requirement already satisfied: werkzeug>=0.11.10 in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow==1.4)
Requirement already satisfied: html5lib==0.9999999 in /home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow==1.4)
Installing collected packages: tensorflow-tensorboard, tensorflow
  Found existing installation: tensorflow-tensorboard 1.5.1
    Uninstalling tensorflow-tensorboard-1.5.1:
      Successfully uninstalled tensorflow-tensorboard-1.5.1
  Found existing installation: tensorflow 1.5.0
    Uninstalling tensorflow-1.5.0:
      Successfully uninstalled tensorflow-1.5.0
Successfully installed tensorflow-1.4.0 tensorflow-tensorboard-0.4.0
jack@hetzner:~/workspace/jack$ PYTHONPATH=. python3 bin/jack-eval.py --dataset data/SQuAD/dev-v1.1.json --loader squad --save_dir fastqa
/home/jack/.pyenv/versions/3.6.3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
INFO:jack-eval.py:Creating and loading reader from fastqa...
2018-02-07 19:09:37.057327: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
INFO:tensorflow:Restoring parameters from fastqa/model_module
INFO:tensorflow:Restoring parameters from fastqa/model_module
INFO:jack-eval.py:Start!
INFO:jack.core.input_module:OnlineInputModule pre-processes data on-the-fly in first epoch and caches results for subsequent epochs! That means, first epoch might be slower.
INFO:jack.core.reader:Start answering...
 [Elapsed Time: 0:01:37] |##########################################################################################| (Time: 0:01:37)
INFO:jack-eval.py:############### RESULTS ##############
Exact: 0.6737937559129612
F1: 0.7737367559753092
dirkweissenborn commented 6 years ago

No idea. I thought TF was stable from 1.0 onwards, pfff. We need to figure out what to do about it. I think we have to pin the TF version, and whenever there is a new version and performance drops, we need to retrain the models before upgrading.
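
Concretely, something like pinning tensorflow>=1.4,<1.5 in setup.py/requirements, plus optionally a runtime guard so a silently degrading TF version fails loudly. The sketch below is not a patch; where the check would live and the exact version bounds are open, and the bounds shown are just what this issue suggests.

# Sketch of a runtime guard (not existing Jack code): refuse to load pretrained
# checkpoints when the installed TensorFlow version is outside the range they
# were verified on. Bounds are assumptions based on this issue.
from distutils.version import LooseVersion

import tensorflow as tf

TESTED_TF_MIN = "1.4.0"   # verified working in this thread
TESTED_TF_MAX = "1.5.0"   # exclusive; 1.5 gives silently degraded results

def check_tf_version():
    v = LooseVersion(tf.__version__)
    if not (LooseVersion(TESTED_TF_MIN) <= v < LooseVersion(TESTED_TF_MAX)):
        raise RuntimeError(
            "Pretrained models were verified on TensorFlow >=%s,<%s; found %s. "
            "Results may silently degrade (see issue #361)."
            % (TESTED_TF_MIN, TESTED_TF_MAX, tf.__version__)
        )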

pminervini commented 6 years ago

BTW it's really weird - I've seen fluctuations in results after switching TF versions, but this one is really huge.

What could be the cause? Can we check where in the computation graph the results change?

JohannesMaxWel commented 6 years ago

I observed something similar with the Jack BiDAF model, where performance drops drastically when switching from tf 1.3 to tf 1.5. Working with the older tf version for now.

pminervini commented 6 years ago

It's very suspicious, btw - it may be a symptom of a bug. It would be useful to check at which step in the forward pass the outputs start to differ between 1.3 and 1.5; I'll be on this (and the paper) after ACL.
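
As a first step, a dump-and-diff script along these lines could narrow it down. This is a rough sketch, not Jack code: it only restores the checkpoint and dumps variable values under each TF version; if the two dumps already differ, the problem is at restore time, otherwise it must be in the forward-pass ops. Script name and output file names are placeholders.

# Debugging sketch (not part of Jack): restore the FastQA checkpoint, dump all
# variable values to an .npz file, run once under TF 1.3/1.4 and once under 1.5,
# then diff the two dumps. Identical dumps point the finger at the forward pass.
import sys

import numpy as np
import tensorflow as tf

def dump_variables(checkpoint_prefix, out_file):
    tf.reset_default_graph()
    saver = tf.train.import_meta_graph(checkpoint_prefix + ".meta")
    with tf.Session() as sess:
        saver.restore(sess, checkpoint_prefix)
        # sanitize names like "fastqa/w:0" so they are safe .npz keys
        values = {v.name.replace("/", "__").replace(":", "_"): sess.run(v)
                  for v in tf.global_variables()}
    np.savez(out_file, **values)

def compare(dump_a, dump_b, atol=1e-6):
    a, b = np.load(dump_a), np.load(dump_b)
    for name in sorted(set(a.files) & set(b.files)):
        if not np.allclose(a[name], b[name], atol=atol):
            print("MISMATCH:", name)

if __name__ == "__main__":
    # usage: python3 dump_vars.py fastqa/model_module vars_tf14.npz
    #        (repeat under the other TF version, then call compare() on both dumps)
    dump_variables(sys.argv[1], sys.argv[2])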