utterworks / fast-bert

Super easy library for BERT based NLP models
Apache License 2.0

notebook not working out of the box #8

Closed mschmill closed 5 years ago

mschmill commented 5 years ago

I'm trying to just get the included toxicity notebook to work from a fresh clone and am having some issues:

  1. Out of the box, the data and labels directories point to the wrong place, and the DataBunch uses filenames that are not part of the repo. These are fixed easily enough.

  2. It would help if there was a pointer to where to get the PyTorch pretrained model uncased_L-12_H-768_A-12. There is a Google download which will not work with the from_pretrained_model cell:

    FileNotFoundError: [Errno 2] No such file or directory: '../../bert/bert-models/uncased_L-12_H-768_A-12/pytorch_model.bin'

    I have been able to get past this step by using 'bert-base-uncased' instead of BERT_PRETRAINED_PATH as the model spec in the tokenizer and from_pretrained_model steps (see the sketch at the end of this comment).

  3. Once I get everything loaded, I hit RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 7.43 GiB total capacity; 6.91 GiB already allocated; 10.94 MiB free; 24.36 MiB cached)

This is a standard Compute Engine instance on GCP with an 8 GB GPU. Advice on how not to run out of memory would help the tutorial a lot.
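
For reference, a minimal sketch of the workaround from point 2 (import paths and the learner signature follow the sample notebook of that era, so treat them as assumptions rather than the library's documented API):

from pytorch_pretrained_bert import BertTokenizer
from fast_bert.learner import BertLearner

# Let pytorch_pretrained_bert download and cache the weights itself
# instead of pointing at a local uncased_L-12_H-768_A-12 folder.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

# databunch, metrics, device, multi_gpu and args come from earlier notebook cells.
learner = BertLearner.from_pretrained_model(databunch, 'bert-base-uncased', metrics, device,
                                            logger=None, finetuned_wgts_path=None,
                                            is_fp16=args['fp16'], loss_scale=args['loss_scale'],
                                            multi_gpu=multi_gpu, multi_label=True)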

mschmill commented 5 years ago

Also, I seem to be getting different predictions from the learned model and from the same model reloaded as a predictor. The following code cell, added to the bottom of the sample notebook,

from fast_bert.prediction import BertClassificationPredictor

texts = [ "this is strange." ]

# the learned model
print(learner.predict_batch(texts))
learner.save_and_reload(MODEL_PATH, "toxic-example-classifier")

# the learned model restored as a predictor
predictor = BertClassificationPredictor(model_path=MODEL_PATH/"toxic-example-classifier.bin", pretrained_path='bert-base-uncased', 
                                        label_path=LABEL_PATH, multi_label=False)
print(predictor.predict_batch(texts))

produces the following output:

[[('toxic', 0.05682373046875), ('insult', 0.033843994140625), ('obscene', 0.0301055908203125), ('threat', 0.026397705078125), ('severe_toxic', 0.022064208984375), ('identity_hate', 0.0198822021484375)]]

05/24/2019 01:00:03 - INFO - pytorch_pretrained_bert.modeling -   loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at /home/schmillm_ae_com/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
05/24/2019 01:00:03 - INFO - pytorch_pretrained_bert.modeling -   extracting archive file /home/schmillm_ae_com/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir /tmp/tmpgw4vq6hi
05/24/2019 01:00:07 - INFO - pytorch_pretrained_bert.modeling -   Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

05/24/2019 01:00:12 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/schmillm_ae_com/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
05/24/2019 01:00:12 - INFO - pytorch_pretrained_bert.modeling -   loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at /home/schmillm_ae_com/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
05/24/2019 01:00:12 - INFO - pytorch_pretrained_bert.modeling -   extracting archive file /home/schmillm_ae_com/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir /tmp/tmprltaxjan
05/24/2019 01:00:16 - INFO - pytorch_pretrained_bert.modeling -   Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

[[('toxic', 0.30697527527809143), ('insult', 0.17855168879032135), ('obscene', 0.15811719000339508), ('threat', 0.13821621239185333), ('severe_toxic', 0.11494892090559006), ('identity_hate', 0.10319076478481293)]]
kaushaltrivedi commented 5 years ago

You will have to pass multi_label=True while instantiating the predictor, as this is a multilabel problem. Looking at the output of the learner, you have trained a multilabel classifier.

On the CUDA out of memory error, just reduce the batch size.
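
For example, a sketch of the corrected predictor call, reusing the names from the code cell above:

from fast_bert.prediction import BertClassificationPredictor

# multi_label=True matches how the learner was trained, so the reloaded
# predictor scores the labels the same way learner.predict_batch did above.
predictor = BertClassificationPredictor(model_path=MODEL_PATH/"toxic-example-classifier.bin",
                                        pretrained_path='bert-base-uncased',
                                        label_path=LABEL_PATH, multi_label=True)
print(predictor.predict_batch(["this is strange."]))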

Pawel-Kranzberg commented 5 years ago

Regarding "out of memory": I set max_seq_length = 64 and successfully ran it on an 8 GB GTX 1070.
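
A sketch of where those savings go in the sample notebook's args dict (max_seq_length comes from the comment above; the batch-size key name is an assumption about the notebook, not documented API):

# Shorter sequences and smaller batches both lower peak GPU memory.
args['max_seq_length'] = 64   # down from the notebook's default
args['train_batch_size'] = 8  # reduce further if the OOM persists

The databunch and learner then need to be recreated so the new values take effect.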

kuruparan commented 5 years ago

FileNotFoundError: [Errno 2] No such file or directory: '/home/kuru/Desktop/uncased_L-12_H-768_A-12/pytorch_model.bin'

How did you overcome this step?

Mandark27 commented 5 years ago

learner = BertLearner.from_pretrained_model(databunch,
                                            'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz',
                                            metrics, device, logger=None, finetuned_wgts_path=None,
                                            is_fp16=args['fp16'], loss_scale=args['loss_scale'],
                                            multi_gpu=multi_gpu, multi_label=True)

This is what I did to load the pre-trained model.

kuruparan commented 5 years ago

Thanks. Has anyone had issues with the apex installation? It would be helpful if you could kindly share links for an Anaconda apex build that supports fast-bert.

Mandark27 commented 5 years ago

!git clone https://github.com/NVIDIA/apex.git
%cd apex
!python setup.py install --cuda_ext --cpp_ext

Pawel-Kranzberg commented 5 years ago

@kuruparan - You can also just set 'bert-base-uncased' as the value of BERT_PRETRAINED_PATH (or put 'bert-base-uncased' directly in your learner definition instead of the AWS link). The pytorch_pretrained_bert library handles download and caching of the pretrained model - see https://github.com/huggingface/pytorch-pretrained-BERT#doc

Pawel-Kranzberg commented 5 years ago

@kuruparan - https://github.com/NVIDIA/apex#quick-start

Mandark27 commented 5 years ago

I am unable to save the databunch. It throws this error: unsupported operand type(s) for /: 'str' and 'str' at this line: tmp_path = self.data_dir/'tmp'.
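
That error suggests the data directory was passed in as a plain string. A minimal sketch of a likely fix (the directory names here are placeholders):

from pathlib import Path

# fast-bert joins paths with the / operator (self.data_dir / 'tmp'),
# which works on Path objects but not on plain strings.
DATA_PATH = Path('./data/')
LABEL_PATH = Path('./labels/')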

oltip commented 5 years ago

First of all, really a great job. Thank you!

I am having an issue with the DistributedSampler on my local machine, as it was raising an error: RuntimeError("Requires distributed package to be available").

What I was trying to do was the following: train_sampler = DistributedSampler(train_data, num_replicas = 1, rank=1)
Does such a change cause any problems when training? Should I change anything in the batch size?

To overcome this issue, would it be a better idea to specify shuffle = True instead of the sampler argument when defining the DataLoader object?
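
For reference, a sketch of that alternative: a plain, non-distributed DataLoader (train_data and the batch size are assumed to come from earlier notebook cells):

from torch.utils.data import DataLoader

# shuffle=True makes DataLoader build a RandomSampler internally,
# so no torch.distributed support is needed on a single machine.
train_dataloader = DataLoader(train_data, batch_size=args['train_batch_size'], shuffle=True)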

Pawel-Kranzberg commented 5 years ago

Do you have Apex installed? What GPU are you using?

oltip commented 5 years ago

Anyway, I solved it; maybe others will encounter the same issue. The point was that PyTorch v1.0.1 on Windows doesn't include the distributed package. I moved to Ubuntu 18.04 and it works.

DanyalAndriano commented 5 years ago

I encountered the apex error on Windows; this solved it: https://github.com/NVIDIA/apex/issues/429.

However, I am now running into RuntimeError: CUDA out of memory even after changing max_seq_length = 64. Also odd is that I now get this error when creating the learner object; previously it happened when fitting the learner.

Pawel-Kranzberg commented 4 years ago

How much memory does your GPU have?

DanyalAndriano commented 4 years ago

I managed to solve this issue by restarting my Jupyter kernel before running the model. I also used gradient_accumulation = 8 together with batch_size = 8 to get an effective batch size of 64 while still fitting into my GPU memory. I was able to keep the sequence length at 256 with these settings.
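
A sketch of those settings in the notebook's args dict (key names are assumptions about the sample notebook):

args['train_batch_size'] = 8             # micro-batch that actually fits on the GPU
args['gradient_accumulation_steps'] = 8  # gradients accumulated over 8 steps: effective batch of 64
args['max_seq_length'] = 256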

DanyalAndriano commented 4 years ago

If I understand correctly, I'm using Bert base uncased and it worked for both multilabel and multiclass.

On Thu, 21 Nov 2019 at 04:54, Enzo Lebrun notifications@github.com wrote:

Which method is using those arguments?
