paperswithcode / galai

Model API for GALACTICA
Apache License 2.0

Getting ZeroDivisionError when using `galai` module #44

Closed: phineas-pta closed this issue 1 year ago

phineas-pta commented 1 year ago

I keep getting a ZeroDivisionError when loading a model with the galai module:

import galai
model = galai.load_model("mini", num_gpus = 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "███/myconda/condaGA/lib/python3.7/site-packages/galai/__init__.py", line 39, in load_model
    model._load_checkpoint(checkpoint_path=get_checkpoint_path(name))
  File "███/myconda/condaGA/lib/python3.7/site-packages/galai/model.py", line 69, in _load_checkpoint
    offload_state_dict=True
  File "███/myconda/condaGA/lib/python3.7/site-packages/accelerate/big_modeling.py", line 358, in load_checkpoint_and_dispatch
    low_zero=(device_map == "balanced_low_0"),
  File "███/myconda/condaGA/lib/python3.7/site-packages/accelerate/utils/modeling.py", line 370, in get_balanced_memory
    per_gpu = module_sizes[""] // (num_devices - 1 if low_zero else num_devices)
ZeroDivisionError: integer division or modulo by zero

But if I use the transformers module, it works perfectly on GPU.
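
For readers following the traceback: the last frame divides by `(num_devices - 1 if low_zero else num_devices)`, so the error means that expression evaluated to zero. The sketch below only mirrors that one expression to show which inputs make the divisor zero. The function names `per_gpu_share` and `failing_cases` and the size `1000` are made up for illustration and are not part of accelerate or galai. The thread does not say which case applied here.

```python
def per_gpu_share(total_size: int, num_devices: int, low_zero: bool = False) -> int:
    """Mirror the divisor expression from the last frame of the traceback."""
    # Same shape as: module_sizes[""] // (num_devices - 1 if low_zero else num_devices)
    return total_size // (num_devices - 1 if low_zero else num_devices)


def failing_cases(total_size: int = 1000):
    """Return the (num_devices, low_zero) pairs that raise ZeroDivisionError."""
    failures = []
    for num_devices in (0, 1, 2):
        for low_zero in (False, True):
            try:
                per_gpu_share(total_size, num_devices, low_zero)
            except ZeroDivisionError:
                failures.append((num_devices, low_zero))
    return failures


print(failing_cases())  # [(0, False), (1, True)]
```

So the divisor is zero either when no devices are counted, or when exactly one device is counted and the "balanced_low_0" device map is in use.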

mkardas commented 1 year ago

Hi @phineas-pta, can you check with galai version 1.1.0?
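
One way to confirm which galai version is installed before retrying is sketched below. It uses the standard `importlib.metadata` module, which needs Python 3.8 or newer; on Python 3.7, as in the traceback, it assumes the separate `importlib_metadata` backport is installed. The helper name `installed_version` is hypothetical.

```python
from typing import Optional

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed
    from importlib_metadata import PackageNotFoundError, version


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed galai version, or None if galai is not installed
print("galai:", installed_version("galai"))
```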

phineas-pta commented 1 year ago

Thank you, the new version seems to work nicely, but I'm still sticking to transformers.