sillsdev / machine.py

Machine is a natural language processing library for Python that is focused on providing tools for processing resource-poor languages.
MIT License

Out Of Memory - very long input lengths? #40

Closed: johnml1135 closed this issue 10 months ago

johnml1135 commented 11 months ago

https://app.sil.hosted.allegro.ai/projects/*/experiments/e0a9364cae5f4cb0b31a0237d5dc6440/info-output/log?columns=selected&columns=type&columns=name&columns=tags&columns=status&columns=project.name&columns=users&columns=started&columns=last_update&columns=last_iteration&columns=parent.name&order=-started&filter=&deep=true

[WARNING|text2text_generation.py:307] 2023-10-10 15:56:27,604 >> Your input_length: 1024 is bigger than 0.9 * max_length: 200. You might consider increasing your max_length manually, e.g. translator('...', max_length=400)
[INFO|configuration_utils.py:577] 2023-10-10 15:56:27,605 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 0,
  "decoder_start_token_id": 2,
  "eos_token_id": 2,
  "forced_bos_token_id": 256047,
  "max_length": 200,
  "pad_token_id": 1,
  "transformers_version": "4.29.1"
}
2023-10-10 15:56:34
2023-10-10 15:56:29,603 - machine.jobs.build_nmt_engine - ERROR - CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 23.68 GiB total capacity; 22.99 GiB already allocated; 6.69 MiB free; 23.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/machine/jobs/build_nmt_engine.py", line 54, in run
    job.run(check_canceled)
  File "/usr/local/lib/python3.8/dist-packages/machine/jobs/nmt_engine_build_job.py", line 66, in run
    _translate_batch(model, pi_batch, writer)
  File "/usr/local/lib/python3.8/dist-packages/machine/jobs/nmt_engine_build_job.py", line 75, in _translate_batch
    for i, result in enumerate(engine.translate_batch(source_segments)):
  File "/usr/local/lib/python3.8/dist-packages/machine/translation/huggingface/hugging_face_nmt_engine.py", line 55, in translate_batch
    return [results[0] for results in self.translate_n_batch(1, segments)]
  File "/usr/local/lib/python3.8/dist-packages/machine/translation/huggingface/hugging_face_nmt_engine.py", line 64, in translate_n_batch
    self._pipeline(segments, num_return_sequences=n),
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/text2text_generation.py", line 367, in __call__
    return super().__call__(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/text2text_generation.py", line 165, in __call__
    result = super().__call__(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1100, in __call__
    outputs = list(final_iterator)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py", line 1025, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/usr/local/lib/python3.8/dist-packages/machine/translation/huggingface/hugging_face_nmt_engine.py", line 141, in _forward
    scores = tuple(torch.nn.functional.log_softmax(logits, dim=-1) for logits in output.scores)
  File "/usr/local/lib/python3.8/dist-packages/machine/translation/huggingface/hugging_face_nmt_engine.py", line 141, in <genexpr>
    scores = tuple(torch.nn.functional.log_softmax(logits, dim=-1) for logits in output.scores)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 1932, in log_softmax
    ret = input.log_softmax(dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 23.68 GiB total capacity; 22.99 GiB already allocated; 6.69 MiB free; 23.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Stack (most recent call last):
  File "/root/.clearml/venvs-builds/3.8/code/untitled.py", line 12, in <module>
    run(args)
  File "/usr/local/lib/python3.8/dist-packages/machine/jobs/build_nmt_engine.py", line 57, in run
    logger.exception(e, stack_info=True)

Could this be related to the very long input length?
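For context, the tokenized length of a segment can be compared against the generation max_length like this. This is a minimal sketch only; it assumes the NLLB distilled-600M checkpoint suggested by the generation config above, which may not be the exact model the job loaded.

```python
from transformers import AutoTokenizer

# Assumed checkpoint for illustration; the job may load a different model.
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")

segment = "..."  # one of the source segments passed to translate_batch
num_tokens = len(tokenizer(segment)["input_ids"])

# The warning above fires when this count exceeds 0.9 * max_length (200).
print(num_tokens)
```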

ddaspit commented 11 months ago

The number of tokens should be truncated to 200, so the length isn't the core issue. This error is occurring because it is running on John's RTX 3090, which only has 24 GB of memory. num_beams should be dropped to 1; if that doesn't fix it, the batch_size can be decreased as well.
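For illustration, reducing beam search and the batch size at the transformers pipeline level looks roughly like the sketch below. This uses the raw translation pipeline rather than machine.py's own engine wrapper, and the model name, language codes, and batch size are assumptions, not the actual job configuration.

```python
from transformers import pipeline

# Sketch only: model, language codes, and batch size are assumed values.
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",
    tgt_lang="fra_Latn",
    device=0,       # GPU 0
    batch_size=4,   # decrease further if OOM persists
)

results = translator(
    ["An example source sentence."],
    num_beams=1,     # greedy decoding uses far less GPU memory than beam search
    max_length=200,  # matches the generation config in the log above
)
print(results[0]["translation_text"])
```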

johnml1135 commented 11 months ago

We can drop num_beams to 1 in https://github.com/sillsdev/serval/issues/178.

johnml1135 commented 10 months ago

num_beams was dropped to 1, and the 600_distilled model was chosen for ext-qa, based on the staging environment defaults.