n-waves / multifit

The code to reproduce results from the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" (https://arxiv.org/abs/1909.04761)
MIT License

Basic support for testing ULMFiT against XNLI #5

Closed PiotrCzapla closed 4 years ago

cstorm125 commented 5 years ago

Working on this for Thai with a new QRNN-based network. I got worse perplexity on Wikipedia, but I think it might not matter much.

PiotrCzapla commented 5 years ago

Superb! @cstorm125 let me add you to this repo so you can write directly.

PiotrCzapla commented 5 years ago

@cstorm125 can you describe how you want to have this implemented?

sebastianruder commented 5 years ago

Thanks for opening this issue. I think setting up the data processing and general evaluation on XNLI will be necessary for submitting the paper and should be largely independent of the other things we're doing.

NirantK commented 5 years ago

I've started setting up basic data processing for XNLI at a new branch: https://github.com/n-waves/ulmfit-multilingual/tree/xnli

e.g. XNLI download script here

If I understand correctly, the goal is to get it to the point where we can use pretrain_lm with XNLI in the same way we use it with Wikidumps?

sebastianruder commented 5 years ago

Thanks, Nirant! The goal would be to fine-tune the pretrained language model and train the classifier on the XNLI data and to then evaluate on it, so that we can compare to multilingual BERT. Does that make sense?
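
For reference, a minimal sketch of that pipeline in fastai v1 terms (a sketch only: the paths, file names, and column names are assumptions, and the repo's ulmfit scripts wrap these steps differently):

from fastai.text import *

# Fine-tune the pretrained LM on the XNLI text (premises and hypotheses).
data_lm = TextLMDataBunch.from_csv('data/xnli', 'train.csv', text_cols=['premise', 'hypo'])
lm_learner = language_model_learner(data_lm, AWD_LSTM, pretrained=True)
lm_learner.fit_one_cycle(2)
lm_learner.save_encoder('xnli_ft_enc')

# Train a 3-way classifier (entailment / neutral / contradiction) on top,
# reusing the fine-tuned encoder and the LM vocabulary.
data_clas = TextClasDataBunch.from_csv('data/xnli', 'train.csv',
                                       text_cols=['premise', 'hypo'], label_cols='label',
                                       vocab=data_lm.vocab)
clas_learner = text_classifier_learner(data_clas, AWD_LSTM)
clas_learner.load_encoder('xnli_ft_enc')
clas_learner.fit_one_cycle(2)
clas_learner.validate()  # compare accuracy against multilingual BERT's XNLI numbers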

PiotrCzapla commented 5 years ago

@sebastianruder how do you want to differentiate between the text and the hypothesis? Using different xxfld markers, or using separate RNNs, one to read the text and another to read the hypothesis?

sebastianruder commented 5 years ago

I think the best thing to try is:

PiotrCzapla commented 5 years ago

OK, if we want to use fields we should most likely fix the order in which they appear in the backward pass: currently an example appears as xxfld 1 text in the forward LM but as text 1 xxfld in the backward LM.
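
A toy illustration of what goes wrong (token values are made up, and the fix shown is just one option):

# The backward LM consumes the reversed token stream, so the field
# marker ends up after the text instead of in front of it.
tokens = ['xxfld', '1', 'the', 'premise', 'text']
forward_input = tokens                    # xxfld 1 the premise text
backward_input = list(reversed(tokens))   # text premise the 1 xxfld

# One possible fix: keep the marker prefix in place and reverse only
# the text, so both directions see 'xxfld 1' first.
fixed_backward = tokens[:2] + list(reversed(tokens[2:]))
# ['xxfld', '1', 'text', 'premise', 'the']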

sebastianruder commented 5 years ago

I think it'd be good if we could get rid of fields in favour of <bos> (beginning of sentence/example) and <eos> (end of sentence/example) tokens, which are more standard in the literature. We're already adding <eos> tokens to the vocabulary to replace the newlines in the Wikipedia data, so it seems we just need to add the <bos> tokens.

PiotrCzapla commented 5 years ago

So you would like to have <bos> premise text <eos> <bos> hypothesis <eos>? How would that work in the backward LM? BERT adds an additional trainable vector to all words in the second sentence; I thought they do that because it works better than what OpenAI suggested.

But I think we can experiment with a few different markups.

sebastianruder commented 5 years ago

Basically, except we'd add another special separator token, so that the input would be: <bos> premise text <eos> <sep> <bos> hypothesis <eos>. Representations for <bos> and <eos> should already be learned during language modeling; during fine-tuning, we then only need to learn an embedding for <sep>. Yeah, in their case they also pretrain the segment embeddings for sentence A and sentence B with their next sentence prediction task. As we don't do that, I'm not sure if this will be better than the first approach, but it's worth a try.
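
A minimal sketch of that markup (the helper function is hypothetical; the special tokens are the ones named above):

BOS, EOS, SEP = '<bos>', '<eos>', '<sep>'

def serialize_pair(premise, hypothesis):
    # <bos>/<eos> embeddings come from LM pretraining; only the <sep>
    # embedding is new and has to be learned during fine-tuning.
    return f'{BOS} {premise} {EOS} {SEP} {BOS} {hypothesis} {EOS}'

serialize_pair('A man inspects the uniform.', 'The man is sleeping.')
# '<bos> A man inspects the uniform. <eos> <sep> <bos> The man is sleeping. <eos>'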

PiotrCzapla commented 5 years ago

> Yeah, in their case they also pre-train the segment embeddings for sentence A and sentence B with their next sentence prediction task

Nothing stops us from doing this as an intermediate step; that way <sep> would also be learned. But that's for later.

sebastianruder commented 5 years ago

Agreed. It'd be cool if we could try this later. That way, we could also position our approach as allowing faster experimentation with such things compared to BERT.

NirantK commented 5 years ago

Context: I am trying to use train_clas to fine-tune and classify for XNLI-English. I am using read_xnli from fastai_contrib.

I am able to fine-tune the WikiText-103 LM from fastai for 2 epochs with the following:

$python ulmfit/train_clas.py --data_dir=./data --dataset=xnli --bs=20
...
2      4.168341    4.187915    0.326962
Starting classifier training
epoch  train_loss  valid_loss  accuracy
... ... ... ...

At the classifier step (when trying to validate?), at the end of the first epoch:

Traceback (most recent call last):
  File "ulmfit/train_clas.py", line 162, in <module>
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "ulmfit/train_clas.py", line 145, in new_train_clas

  File "/home/nirant/fastai/fastai/train.py", line 20, in fit_one_cycle
    learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
  File "/home/nirant/fastai/fastai/basic_train.py", line 162, in fit
    callbacks=self.callbacks+callbacks)
  File "/home/nirant/fastai/fastai/basic_train.py", line 94, in fit
    raise e
  File "/home/nirant/fastai/fastai/basic_train.py", line 89, in fit
    cb_handler=cb_handler, pbar=pbar)
  File "/home/nirant/fastai/fastai/basic_train.py", line 49, in validate
    for xb,yb in progress_bar(dl, parent=pbar, leave=(pbar is not None)):
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 65, in __iter__
    for i,o in enumerate(self._gen):
  File "/home/nirant/fastai/fastai/basic_data.py", line 47, in __iter__
    for b in self.dl:
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/nirant/fastai/fastai/text/data.py", line 92, in pad_collate
    return res, tensor([s[1] for s in samples])
  File "/home/nirant/fastai/fastai/torch_core.py", line 68, in tensor
    return torch.tensor(x) if is_listy(x) else as_tensor(x)
RuntimeError: Could not infer dtype of NoneType

This happens at fit_one_cycle for classification. Is this because of a vocabulary mismatch? Or because we do not automatically update from the 2 predicted classes in IMDb to the 3 classes in XNLI? What other possible points of error should I check?

sebastianruder commented 5 years ago

Another source of error might be that we're not mapping the XNLI labels to int at the moment.
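
If that is the cause, a sketch of the fix (the column and file names are assumptions; the label strings are the ones XNLI ships with):

import pandas as pd

# Unmapped string labels come through as None, which is exactly what
# torch.tensor() refuses with "Could not infer dtype of NoneType".
LABEL_MAP = {'entailment': 0, 'neutral': 1, 'contradiction': 2}

df = pd.read_csv('data/xnli/train.csv')
df['label'] = df['gold_label'].map(LABEL_MAP)
assert df['label'].notna().all(), 'unexpected label value in the data'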

NirantK commented 5 years ago

Wouldn't that fail on the first iteration of the classifier if our mapping is not accepted by fit_one_cycle?

abedkhooli commented 5 years ago

I tried to test on an Arabic model and got poor results, then tried English using the WT103_1 model and it didn't work well either. Not sure what's wrong. Code for En here. I used 2 columns (premise and hypothesis) with mark_fields=True.

PiotrCzapla commented 5 years ago

Guys, can anyone have a look at the refactoring branch and try to implement a working XNLI setup that respects the recent changes in fastai, so that we get a baseline we can later try to improve upon?

PiotrCzapla commented 4 years ago

Let me close this for the time being, as it is unlikely we will play with XNLI anytime soon.