Superb! @cstorm125 let me add you to this repo so you can write directly.
@cstorm125 can you describe how you want to have this implemented?
Thanks for opening this issue. I think setting up the data processing and general evaluation on XNLI will be necessary for submitting the paper and should be largely independent of the other things we're doing.
I've started setting up basic data processing for XNLI at a new branch: https://github.com/n-waves/ulmfit-multilingual/tree/xnli
e.g. XNLI download script here
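For context, a rough sketch of what such a download step does (the zip URL below is the one published with the XNLI release and may have moved, so treat it as an assumption rather than what the repo's script actually uses):

```python
import urllib.request, zipfile
from pathlib import Path

# Assumed location of the official XNLI 1.0 archive; verify before relying on it.
XNLI_URL = 'https://dl.fbaipublicfiles.com/XNLI/XNLI-1.0.zip'

def download_xnli(data_dir='data/xnli'):
    """Download and unpack XNLI (xnli.dev.tsv / xnli.test.tsv)."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    zip_path = data_dir / 'XNLI-1.0.zip'
    if not zip_path.exists():
        urllib.request.urlretrieve(XNLI_URL, zip_path)  # fetch the archive once
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(data_dir)  # unpacks the XNLI-1.0/ folder with the tsv files

download_xnli()
```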
If I understand correctly, the goal is to get it to the point where we can use `pretrain_lm` with XNLI in the same way we use it with the Wikipedia dumps?
Thanks, Nirant! The goal would be to fine-tune the pretrained language model and train the classifier on the XNLI data, and then evaluate on it so that we can compare to multilingual BERT. Does that make sense?
@sebastianruder, how do you want to differentiate between the text and the hypothesis? Using different `xxfld` fields, or using separate RNNs, one to read the text and another to read the hypothesis?
I think the best thing to try is:
Ok, if we want to use fields, we should most likely fix the order in which fields appear in the backward pass: currently they appear as `xxfld 1 text` in the forward LM but as `text 1 xxfld` in the backward LM.
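To make the problem concrete, here is a minimal illustration (plain Python, not the repo's actual preprocessing) of how naive reversal mangles the field marker:

```python
# Naively reversing the token stream for the backward LM also reverses
# the field marker, so the backward model sees 'text 1 xxfld' instead of
# 'xxfld 1 text'. Token names follow the fastai convention.
tokens = ['xxfld', '1', 'the', 'premise', 'text']
print(tokens[::-1])  # ['text', 'premise', 'the', '1', 'xxfld']

# One possible fix (an assumption, not the repo's current behaviour):
# keep the marker as a prefix and reverse only the content tokens.
fixed = tokens[:2] + tokens[:1:-1]
print(fixed)  # ['xxfld', '1', 'text', 'premise', 'the']
```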
I think it'd be good if we could get rid of fields in favour of `<bos>` (beginning of sentence/example) and `<eos>` (end of sentence/example) tokens, which are more standard in the literature. We're already adding `<eos>` tokens to the vocabulary to replace the newlines in the Wikipedia data, so it seems we just need to add the `<bos>` tokens.
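A minimal sketch of what that marking could look like (the `mark_example` helper is illustrative, not an existing function in the repo):

```python
# Wrap each example in <bos>/<eos> tokens instead of xxfld field markers.
BOS, EOS = '<bos>', '<eos>'

def mark_example(tokens):
    """Add beginning- and end-of-example tokens around a tokenized text."""
    return [BOS] + tokens + [EOS]

print(mark_example(['the', 'cat', 'sat']))
# ['<bos>', 'the', 'cat', 'sat', '<eos>']
```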
So you would like to have `<bos> premise text <eos> <bos> hypothesis <eos>`? How would that work in the backward LM? BERT adds an additional trainable vector to all words in the second sentence; I thought they do that because it works better than what OpenAI suggested. But I think we can experiment with a few different markups.
Basically, except we'd add another special token for entailment, so that the input would be `<bos> premise text <eos> <sep> <bos> hypothesis <eos>`. Representations for `<bos>` and `<eos>` should already be learned during language modeling. During fine-tuning, we then only need to learn an embedding for `<sep>`.
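For illustration, a minimal sketch of assembling that input for a premise/hypothesis pair (the `pair_tokens` helper is hypothetical, not part of the repo):

```python
# Build the proposed classifier input from two tokenized sentences.
BOS, EOS, SEP = '<bos>', '<eos>', '<sep>'

def pair_tokens(premise, hypothesis):
    """<bos> premise <eos> <sep> <bos> hypothesis <eos>"""
    return [BOS] + premise + [EOS, SEP, BOS] + hypothesis + [EOS]

print(pair_tokens(['a', 'man', 'is', 'walking'], ['a', 'person', 'moves']))
# ['<bos>', 'a', 'man', 'is', 'walking', '<eos>', '<sep>',
#  '<bos>', 'a', 'person', 'moves', '<eos>']
```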
Yeah, in their case they also pretrain the segment embeddings for sentence A and sentence B with their next sentence prediction task. As we don't do that, I'm not sure if this will be better than the first approach, but it's worth a try.
> Yeah, in their case they also pre-train the segment embeddings for sentence A and sentence B with their next sentence prediction task

Nothing stops us from doing this as an intermediate step; that way `<sep>` would also be learned. But that's for later.
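If we ever try it, the data side would be simple enough; here is a rough sketch of building next-sentence-prediction pairs (my own simplification of BERT's recipe, nothing from the repo):

```python
import random

def make_nsp_pairs(sentences):
    """Pair each sentence with its true successor or a random sentence.

    Returns (sentence_a, sentence_b, is_next) triples. A real
    implementation would avoid sampling the true successor as the
    negative example and balance the two classes more carefully.
    """
    pairs = []
    for i in range(len(sentences) - 1):
        if random.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], 1))  # true next
        else:
            pairs.append((sentences[i], random.choice(sentences), 0))  # random
    return pairs
```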
Agreed. It'd be cool if we could try this later. That way, we could also position our approach as allowing faster experimentation with such things compared to BERT.
Context: I am trying to use `train_clas` to fine-tune and classify for XNLI-English. I am using `read_xnli` from `fastai_contrib`.
I am able to fine-tune the WikiText-103 LM from fastai for 2 epochs with the following:

```
$ python ulmfit/train_clas.py --data_dir=./data --dataset=xnli --bs=20
...
2      4.168341    4.187915    0.326962
Starting classifier training
epoch  train_loss  valid_loss  accuracy
...    ...         ...         ...
```
At the classifier step (when trying to validate?), at the end of the first epoch, I get:
```
Traceback (most recent call last):
  File "ulmfit/train_clas.py", line 162, in <module>
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "ulmfit/train_clas.py", line 145, in new_train_clas
  File "/home/nirant/fastai/fastai/train.py", line 20, in fit_one_cycle
    learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
  File "/home/nirant/fastai/fastai/basic_train.py", line 162, in fit
    callbacks=self.callbacks+callbacks)
  File "/home/nirant/fastai/fastai/basic_train.py", line 94, in fit
    raise e
  File "/home/nirant/fastai/fastai/basic_train.py", line 89, in fit
    cb_handler=cb_handler, pbar=pbar)
  File "/home/nirant/fastai/fastai/basic_train.py", line 49, in validate
    for xb,yb in progress_bar(dl, parent=pbar, leave=(pbar is not None)):
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 65, in __iter__
    for i,o in enumerate(self._gen):
  File "/home/nirant/fastai/fastai/basic_data.py", line 47, in __iter__
    for b in self.dl:
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/nirant/fastai/fastai/text/data.py", line 92, in pad_collate
    return res, tensor([s[1] for s in samples])
  File "/home/nirant/fastai/fastai/torch_core.py", line 68, in tensor
    return torch.tensor(x) if is_listy(x) else as_tensor(x)
RuntimeError: Could not infer dtype of NoneType
```
This happens at the `fit_one_cycle` for classification. Is this because of a vocabulary mismatch? Or because we do not automatically update from 2 predicted classes as in IMDb to 3 classes as in XNLI? What are other possible points of error that I should check?
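One quick check, sketched below under the assumption that `data_clas` is the DataBunch built for XNLI classification: the `Could not infer dtype of NoneType` error usually means some samples carry a `None` label, so scan for those before calling `fit_one_cycle`.

```python
# Find validation samples whose label is None; iterating a fastai v1
# dataset yields (x, y) pairs.
bad = [i for i, (_, y) in enumerate(data_clas.valid_ds) if y is None]
print(f'{len(bad)} validation samples with a None label; first few: {bad[:10]}')
```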
Another source of error might be that we're not mapping the XNLI labels to `int` at the moment.
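For reference, a minimal sketch of that mapping (the file path is illustrative; the `language` and `gold_label` column names follow the XNLI tsv release):

```python
import pandas as pd

# Map the three XNLI string labels to ints before building the DataBunch.
LABEL2INT = {'entailment': 0, 'neutral': 1, 'contradiction': 2}

df = pd.read_csv('data/xnli/XNLI-1.0/xnli.dev.tsv', sep='\t')
df = df[df.language == 'en']
df['label'] = df.gold_label.map(LABEL2INT)  # unmapped labels become NaN
assert df.label.notna().all(), 'found gold_label values outside the mapping'
```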
Wouldn't that fail on the first iteration of the classifier if our mapping is not accepted by `fit_one_cycle`?
I tried to test on the Arabic model and got poor results, then tried English using the WT103_1 model and it didn't work well either. Not sure what's wrong. Code for En here. I used 2 cols (premise and hypo) with `mark_fields=True`.
Guys, can anyone have a look at the refactoring branch and try to implement a working XNLI that respects the recent changes in fastai, so we get a baseline that we can later try to improve upon?
Let me close this for the time being, as it is unlikely we will play with XNLI anytime soon.
Working on this for Thai with a new network trained with QRNN. I got worse perplexity on Wiki, but I think it might not matter much.