[bug] CUDA OOM occurs in the retrieval task when using multiple GPUs

The following error occurs when evaluating Mr. TyDi. Conditions:

A100 x 8
embedder.max_seq_length 512
Using TransformerEmbedder

**[rank7]:     metrics = evaluator(text_embedder, cache_dir=cache_dir, overwrite_cache=overwrite_cache)
[rank7]:   File "/app/src/jmteb/evaluators/retrieval/evaluator.py", line 118, in __call__
[rank7]:     val_results[dist_name], _ = self._compute_metrics(
[rank7]:   File "/app/src/jmteb/evaluators/retrieval/evaluator.py", line 164, in _compute_metrics
[rank7]:     similarity = dist_func(query_embeddings, doc_embeddings_chunk)
[rank7]:   File "/app/src/jmteb/evaluators/retrieval/evaluator.py", line 301, in euclidean_distance
[rank7]:     return 100 / (torch.cdist(e1, e2) + 1e-4)
[rank7]:   File "/usr/local/lib/python3.10/dist-packages/torch/functional.py", line 1335, in cdist
[rank7]:     return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
[rank7]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 15.27 GiB. GPU 7 has a total capacity of 79.15 GiB of which 14.70 GiB is free. Including non-PyTorch memory, this process has 64.46 GiB memory in use. Of the allocated memory 31.60 GiB is allocated by PyTorch, and 30.52 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)**
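
For reference, a rough back-of-the-envelope sketch of the sizes involved. The failed allocation comes from `torch.cdist`, whose output is a dense (queries x docs in chunk) matrix, so its size grows with the product of the two counts. The sketch below assumes float32 embeddings (4 bytes per element); the helper names, the query count, and the 2 GiB budget are hypothetical values chosen for illustration, not taken from JMTEB.

```python
GIB = 1024 ** 3


def similarity_matrix_bytes(n_queries: int, n_docs: int, bytes_per_element: int = 4) -> int:
    """Bytes needed for one dense (n_queries x n_docs) matrix, e.g. a cdist output.

    bytes_per_element=4 assumes float32; this is an assumption, not read from the logs.
    """
    return n_queries * n_docs * bytes_per_element


def max_doc_chunk_size(n_queries: int, budget_bytes: int, bytes_per_element: int = 4) -> int:
    """Largest number of documents per chunk so that one distance matrix fits the budget."""
    if n_queries <= 0:
        raise ValueError("n_queries must be positive")
    return budget_bytes // (n_queries * bytes_per_element)


# The 15.27 GiB allocation from the traceback, expressed as a float32 element count
# (roughly 4.1e9 elements, i.e. n_queries * n_docs_in_chunk under the float32 assumption).
failed_alloc_elements = int(15.27 * GIB) // 4

# Hypothetical example: 1000 queries and a 2 GiB budget for one distance matrix.
chunk = max_doc_chunk_size(n_queries=1000, budget_bytes=2 * GIB)
print(f"elements in failed allocation: {failed_alloc_elements:.3e}")
print(f"doc chunk size for 1000 queries within 2 GiB: {chunk}")
```

Note that `100 / (torch.cdist(e1, e2) + 1e-4)` also creates intermediate tensors of the same shape as the `cdist` output, so the actual peak is likely a small multiple of this estimate. A smaller document chunk, or additionally chunking the queries, would bound it.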

sbintuitions / JMTEB

[bug] CUDA OOM occurs in the retrieval task when using multiple GPUs #50