hpcaitech / ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI
Apache License 2.0

Memory leakage in BERT example #50

Closed: ExtremeViscent closed this issue 2 years ago

ExtremeViscent commented 2 years ago

🐛 Describe the bug

I attempted to run the BERT example on two GPUs in a single node using the following command:

`torchrun --nproc_per_node 1 --master_addr localhost --master_port 29500 train.py`

(Note that `--nproc_per_node 1` launches a single process, which matches the world size of 1 reported in the logs below.)

However, the allocated device memory grows steadily as training proceeds.

After a brief check, I found that roughly 500 new tensors are created every 10 iterations.
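For reference, a tensor count like this can be taken by walking the garbage collector's tracked objects between iterations. This is a minimal diagnostic sketch, not part of the training script:

```python
import gc
import torch

def count_cuda_tensors():
    """Count live CUDA tensors currently tracked by the garbage collector."""
    n = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                n += 1
        except Exception:
            # Some tracked objects raise on attribute access; skip them.
            continue
    return n

# Call this every N iterations; a steadily rising count indicates a leak.
```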

The logs are shown below:

colossalai - apex.transformer.tensor_parallel - 2022-03-20 13:54:59,556 WARNING: `fused_weight_gradient_mlp_cuda` module not found. gradient accumulation fusion with weight gradient computation disabled.
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,631 INFO: Added key: store_based_barrier_key:1 to store for rank: 0
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,631 INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,631 INFO: Added key: store_based_barrier_key:2 to store for rank: 0
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,631 INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,631 INFO: Added key: store_based_barrier_key:3 to store for rank: 0
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,632 INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 1 nodes.
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,632 INFO: Added key: store_based_barrier_key:4 to store for rank: 0
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,632 INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 1 nodes.
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,632 INFO: Added key: store_based_barrier_key:5 to store for rank: 0
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,632 INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 1 nodes.
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,632 INFO: Added key: store_based_barrier_key:6 to store for rank: 0
colossalai - torch.distributed.distributed_c10d - 2022-03-20 13:54:59,632 INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 1 nodes.
colossalai - colossalai - 2022-03-20 13:54:59,634 INFO: process rank 0 is bound to device 0
colossalai - colossalai - 2022-03-20 13:54:59,635 INFO: initialized seed on rank 0, numpy: 1234, python random: 1234, ParallelMode.DATA: 1234, ParallelMode.TENSOR: 1234,the default parallel seed is ParallelMode.DATA.
colossalai - colossalai - 2022-03-20 13:54:59,635 INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 1
> building BertWordPieceLowerCase tokenizer ...
 > padded vocab (size: 30524) with 68 dummy tokens (new size: 30592)
colossalai - colossalai - 2022-03-20 13:54:59,658 INFO: > building train, validation, and test datasets ...
colossalai - colossalai - 2022-03-20 13:54:59,658 INFO:  > datasets target sizes (minimum size):
colossalai - colossalai - 2022-03-20 13:54:59,658 INFO:     train:      32000000
colossalai - colossalai - 2022-03-20 13:54:59,658 INFO:     validation: 32000320
colossalai - colossalai - 2022-03-20 13:54:59,658 INFO:     test:       320
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
colossalai - colossalai - 2022-03-20 13:54:59,665 INFO: 
 > building dataset index ...
colossalai - colossalai - 2022-03-20 13:54:59,665 INFO: 
 > finished creating indexed dataset in 0.006665 seconds
colossalai - colossalai - 2022-03-20 13:54:59,665 INFO: 
 > indexed dataset stats:
    number of documents: 6409572
    number of sentences: 128198975
colossalai - colossalai - 2022-03-20 13:54:59,665 INFO: 
 > dataset split:
colossalai - colossalai - 2022-03-20 13:54:59,665 INFO: 
    train:
     document indices in [0, 6082683) total of 6082683 documents
     sentence indices in [0, 123690635) total of 123690635 sentences
colossalai - colossalai - 2022-03-20 13:54:59,665 INFO: 
    validation:
     document indices in [6082683, 6403162) total of 320479 documents
     sentence indices in [123690635, 128115537) total of 4424902 sentences
colossalai - colossalai - 2022-03-20 13:54:59,667 INFO: 
    test:
     document indices in [6403162, 6409572) total of 6410 documents
     sentence indices in [128115537, 128198975) total of 83438 sentences
colossalai - colossalai - 2022-03-20 13:55:03,351 INFO: 
 > loading indexed mapping from /work/workspace/MOE-ColossalAI/Megatron-LM/my-bert_text_sentence_train_indexmap_32000000mns_125msl_0.10ssp_1234s.npy
    loaded indexed file in 0.014 seconds
    total number of samples: 50551630
colossalai - colossalai - 2022-03-20 13:55:03,363 INFO: 
 > loading indexed mapping from /work/workspace/MOE-ColossalAI/Megatron-LM/my-bert_text_sentence_valid_indexmap_32000320mns_125msl_0.10ssp_1234s.npy
    loaded indexed file in 0.011 seconds
    total number of samples: 32197223
colossalai - colossalai - 2022-03-20 13:55:03,365 INFO: 
 > loading indexed mapping from /work/workspace/MOE-ColossalAI/Megatron-LM/my-bert_text_sentence_test_indexmap_320mns_125msl_0.10ssp_1234s.npy
    loaded indexed file in 0.001 seconds
    total number of samples: 17447
colossalai - colossalai - 2022-03-20 13:55:03,742 INFO: Dataloaders are built
colossalai - colossalai - 2022-03-20 13:55:07,958 INFO: Model is built with softmax in fp32 = True
colossalai - colossalai - 2022-03-20 13:55:07,958 INFO: This model has 38392960 parameters
colossalai - colossalai - 2022-03-20 13:55:07,958 INFO: Criterion is built
colossalai - colossalai - 2022-03-20 13:55:07,958 INFO: without weight decay param: 22, with weight decay param: 11
colossalai - colossalai - 2022-03-20 13:55:07,960 INFO: Optimizer is built
colossalai - colossalai - 2022-03-20 13:55:07,960 INFO: LR Scheduler is built with 9900 warmup steps and 990000 decay steps
colossalai - colossalai - 2022-03-20 13:55:07,962 INFO: 
========== Your Config ========
{'ADD_BINARY_HEAD': False,
 'DATA_PATH': '/work/workspace/MOE-ColossalAI/Megatron-LM/my-bert_text_sentence',
 'DECAY_ITERS': 990000,
 'DEPTH': 2,
 'EVAL_INTERVAL': 10,
 'EVAL_ITERS': 10,
 'GLOBAL_BATCH_SIZE': 32,
 'HIDDEN_SIZE': 768,
 'LR': 0.0001,
 'MIN_LR': 1e-05,
 'NUM_ATTENTION_HEADS': 2,
 'NUM_MICRO_BATCHES': 4,
 'SEED': 1234,
 'SEQ_LENGTH': 128,
 'TRAIN_ITERS': 1000000,
 'VOCAB_FILE_PATH': '/work/workspace/MOE-ColossalAI/vocab/bert-large-uncased-vocab.txt',
 'WARMUP_FRACTION': 0.01,
 'WEIGHT_DECAY': 0.01,
 'clip_grad_norm': 1.0,
 'fp16': {'log_num_zeros_in_grad': True,
          'mode': <AMP_TYPE.NAIVE: 'naive'>,
          'verbose': True},
 'gradient_handler': [{'type': 'SequenceParallelGradientHandler'}],
 'parallel': {'pipeline': 1, 'tensor': {'mode': 'sequence', 'size': 1}}}
================================

colossalai - colossalai - 2022-03-20 13:55:07,962 INFO: cuDNN benchmark = True, deterministic = False
colossalai - colossalai - 2022-03-20 13:55:07,985 INFO: 
=========  FP16 Optimizer Config =========
Optimizer: FusedAdam
clip_grad = 1.0
log_num_zeros_in_grad = True
initial_scale = 4294967296
min_scale = 1
growth_factor = 2
backoff_factor = 0.5
growth_interval = 1000
hysteresis = 2
==========================================
colossalai - colossalai - 2022-03-20 13:55:09,886 INFO: overflow occurs, loss scale is adjusted to tensor([4.2950e+09], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:09,915 INFO: overflow occurs, loss scale is adjusted to tensor([2.1475e+09], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:09,941 INFO: overflow occurs, loss scale is adjusted to tensor([1.0737e+09], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:09,966 INFO: overflow occurs, loss scale is adjusted to tensor([5.3687e+08], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:09,991 INFO: overflow occurs, loss scale is adjusted to tensor([2.6844e+08], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:10,184 INFO: overflow occurs, loss scale is adjusted to tensor([1.3422e+08], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:10,339 INFO: overflow occurs, loss scale is adjusted to tensor([67108864.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:10,363 INFO: overflow occurs, loss scale is adjusted to tensor([33554432.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:10,476 INFO: overflow occurs, loss scale is adjusted to tensor([16777216.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:10,504 INFO: overflow occurs, loss scale is adjusted to tensor([8388608.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:11,092 INFO: Step 10 / 1000000 | Train Loss: 10.504 | Eval Loss: 10.486 | Grad Norm: None | Skipped Iterations: 10 | Loss Scale: 8388608.0| Learning rate: 0.0 | Num Zero in Grad: None | train-iterations: 251.09575
colossalai - colossalai - 2022-03-20 13:55:11,117 INFO: overflow occurs, loss scale is adjusted to tensor([4194304.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:11,141 INFO: overflow occurs, loss scale is adjusted to tensor([2097152.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:11,166 INFO: overflow occurs, loss scale is adjusted to tensor([1048576.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:11,190 INFO: overflow occurs, loss scale is adjusted to tensor([524288.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:11,434 INFO: overflow occurs, loss scale is adjusted to tensor([262144.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:12,321 INFO: Step 20 / 1000000 | Train Loss: 10.501 | Eval Loss: 10.491 | Grad Norm: 8.805602073669434 | Skipped Iterations: 5 | Loss Scale: 262144.0| Learning rate: 5.0505050505050506e-08 | Num Zero in Grad: 843 | train-iterations: 66.22717
colossalai - colossalai - 2022-03-20 13:55:12,664 INFO: overflow occurs, loss scale is adjusted to tensor([131072.], device='cuda:0')
colossalai - colossalai - 2022-03-20 13:55:13,665 INFO: Step 30 / 1000000 | Train Loss: 10.484 | Eval Loss: 10.491 | Grad Norm: 9.763409614562988 | Skipped Iterations: 1 | Loss Scale: 131072.0| Learning rate: 1.4141414141414141e-07 | Num Zero in Grad: 822 | train-iterations: 66.48097
colossalai - colossalai - 2022-03-20 13:55:15,080 INFO: Step 40 / 1000000 | Train Loss: 10.494 | Eval Loss: 10.481 | Grad Norm: 8.260552406311035 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 2.4242424242424244e-07 | Num Zero in Grad: 845 | train-iterations: 68.68193
colossalai - colossalai - 2022-03-20 13:55:16,482 INFO: Step 50 / 1000000 | Train Loss: 10.471 | Eval Loss: 10.438 | Grad Norm: 8.306716918945312 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 3.4343434343434344e-07 | Num Zero in Grad: 840 | train-iterations: 71.22791
colossalai - colossalai - 2022-03-20 13:55:17,696 INFO: Step 60 / 1000000 | Train Loss: 10.427 | Eval Loss: 10.367 | Grad Norm: 9.02975845336914 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 4.444444444444445e-07 | Num Zero in Grad: 833 | train-iterations: 66.13533
colossalai - colossalai - 2022-03-20 13:55:18,963 INFO: Step 70 / 1000000 | Train Loss: 10.387 | Eval Loss: 10.293 | Grad Norm: 9.127448081970215 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 5.454545454545455e-07 | Num Zero in Grad: 839 | train-iterations: 63.39867
colossalai - colossalai - 2022-03-20 13:55:20,199 INFO: Step 80 / 1000000 | Train Loss: 10.304 | Eval Loss: 10.242 | Grad Norm: 9.513465881347656 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 6.464646464646465e-07 | Num Zero in Grad: 824 | train-iterations: 64.48858
colossalai - colossalai - 2022-03-20 13:55:21,562 INFO: Step 90 / 1000000 | Train Loss: 10.26 | Eval Loss: 10.16 | Grad Norm: 8.393514633178711 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 7.474747474747475e-07 | Num Zero in Grad: 834 | train-iterations: 71.18421
colossalai - colossalai - 2022-03-20 13:55:22,871 INFO: Step 100 / 1000000 | Train Loss: 10.176 | Eval Loss: 10.093 | Grad Norm: 8.5743408203125 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 8.484848484848486e-07 | Num Zero in Grad: 838 | train-iterations: 64.65802
colossalai - colossalai - 2022-03-20 13:55:24,213 INFO: Step 110 / 1000000 | Train Loss: 10.094 | Eval Loss: 9.9949 | Grad Norm: 8.43807315826416 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 9.494949494949495e-07 | Num Zero in Grad: 835 | train-iterations: 64.54813
colossalai - colossalai - 2022-03-20 13:55:25,504 INFO: Step 120 / 1000000 | Train Loss: 10.004 | Eval Loss: 9.9296 | Grad Norm: 7.786318778991699 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 1.0505050505050506e-06 | Num Zero in Grad: 844 | train-iterations: 63.54368
colossalai - colossalai - 2022-03-20 13:55:26,700 INFO: Step 130 / 1000000 | Train Loss: 9.9541 | Eval Loss: 9.8156 | Grad Norm: 7.1489057540893555 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 1.1515151515151516e-06 | Num Zero in Grad: 841 | train-iterations: 62.47182
colossalai - colossalai - 2022-03-20 13:55:28,004 INFO: Step 140 / 1000000 | Train Loss: 9.868 | Eval Loss: 9.783 | Grad Norm: 6.379231929779053 | Skipped Iterations: 0 | Loss Scale: 131072.0| Learning rate: 1.2525252525252527e-06 | Num Zero in Grad: 844 | train-iterations: 64.96000
Traceback (most recent call last):
  File "/work/workspace/MOE-ColossalAI/sequene_parallel/train.py", line 267, in <module>
    main()
  File "/work/workspace/MOE-ColossalAI/sequene_parallel/train.py", line 197, in main
    lm_loss, sop_output = engine(tokens, padding_mask, types, lm_labels)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/colossalai/engine/_base_engine.py", line 127, in __call__
    return self.model(*args, **kwargs)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 74, in forward
    out = self.model(*args, **kwargs)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/work/workspace/MOE-ColossalAI/sequene_parallel/model/bert.py", line 117, in forward
    return self.head(output, self.embedding.word_embedding_weight, lm_labels)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/work/workspace/MOE-ColossalAI/sequene_parallel/model/layers/head.py", line 77, in forward
    lm_loss = self.lm_head(hidden_states, word_embeddings_weight, lm_labels)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/work/workspace/MOE-ColossalAI/sequene_parallel/model/layers/head.py", line 39, in forward
    output = F.linear(hidden_states, word_embeddings_weight, self.bias)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA out of memory. Tried to allocate 240.00 MiB (GPU 0; 39.59 GiB total capacity; 36.40 GiB already allocated; 204.19 MiB free; 37.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 84857) of binary: /work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/bin/python
Traceback (most recent call last):
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.10.1', 'console_scripts', 'torchrun')())
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/distributed/run.py", line 719, in main
    run(args)
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/work/workspace/intel/oneapi/intelpython/latest/envs/autoaug/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-03-20_13:55:33
  host      : inspur-4
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 84857)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
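The growth cadence lines up with the config above: `EVAL_INTERVAL` and `EVAL_ITERS` are both 10, so an evaluation pass runs every 10 training steps, exactly when the new tensors appear. One common cause of this signature is an evaluation loop that keeps autograd graphs alive, for example by accumulating loss tensors rather than Python floats. Below is a minimal sketch of the leak-free pattern, assuming a hypothetical `evaluate()` helper and assuming the engine forwards `train()`/`eval()` to the wrapped module; the example's actual eval loop may differ:

```python
import torch

def evaluate(engine, eval_dataloader, eval_iters):
    engine.eval()
    total_loss = 0.0
    with torch.no_grad():  # do not build autograd graphs during evaluation
        for _, batch in zip(range(eval_iters), eval_dataloader):
            tokens, padding_mask, types, lm_labels = batch
            lm_loss, _ = engine(tokens, padding_mask, types, lm_labels)
            # .item() converts to a Python float, so no graph or CUDA
            # tensor is retained across iterations.
            total_loss += lm_loss.item()
    engine.train()
    return total_loss / eval_iters
```

Separately, the OOM message's own suggestion, setting `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` (the value is illustrative) before launching, can reduce fragmentation and delay the crash, but it would not fix a genuine leak.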

Environment

Colossal-AI version: 0.0.2
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 7.5.0
Clang version: Could not collect
CMake version: version 3.19.6
Libc version: glibc-2.17

Python version: 3.9.6 (default, Aug 18 2021, 19:38:01) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-3.10.0-1062.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: A100-PCIE-40GB
GPU 1: A100-PCIE-40GB

Nvidia driver version: 460.27.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] efficientnet-pytorch==0.6.3
[pip3] numpy==1.20.3
[pip3] pytorch-lightning==1.1.4
[pip3] pytorch-nlp==0.5.0
[pip3] segmentation-models-pytorch==0.2.0
[pip3] torch==1.10.1
[pip3] torchaudio==0.10.1
[pip3] torchio==0.18.50
[pip3] torchmetrics==0.5.0
[pip3] torchtext==0.11.1
[pip3] torchvision==0.11.2
[conda] blas 1.0 mkl defaults
[conda] cudatoolkit 11.3.1 h2bc3f7f_2 defaults
[conda] efficientnet-pytorch 0.6.3 pypi_0 pypi
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.3.0 h06a4308_520 defaults
[conda] mkl-service 2.4.0 py39h7f8727e_0 defaults
[conda] mkl_fft 1.3.0 py39h42c9631_2 defaults
[conda] mkl_random 1.2.2 py39h51133e4_0 defaults
[conda] numpy 1.20.3 py39hf144106_0 defaults
[conda] numpy-base 1.20.3 py39h74d4b33_0 defaults
[conda] pytorch 1.10.1 py3.9_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-lightning 1.1.4 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pytorch-nlp 0.5.0 pypi_0 pypi
[conda] segmentation-models-pytorch 0.2.0 pypi_0 pypi
[conda] torch 1.9.0 pypi_0 pypi
[conda] torchaudio 0.10.1 py39_cu113 pytorch
[conda] torchio 0.18.50 pypi_0 pypi
[conda] torchmetrics 0.5.0 pypi_0 pypi
[conda] torchtext 0.11.1 pypi_0 pypi
[conda] torchvision 0.11.2 py39_cu113 pytorch

binmakeswell commented 2 years ago

Hi @ExtremeViscent We just updated the BERT example; you can try the new one. Documentation will come soon: https://github.com/hpcaitech/ColossalAI-Benchmark/tree/main/bert