hitz-zentroa / GoLLIE

Guideline following Large Language Model for Information Extraction
https://hitz-zentroa.github.io/GoLLIE/
Apache License 2.0

OOM #3

Closed · jmanhype closed this issue 9 months ago

jmanhype commented 9 months ago

The checkpoint keeps getting killed while loading. It seems to need 33 GB of memory and is being loaded in fp32. Help?

jmanhype commented 9 months ago

I'm using the low-CPU-memory option; it loads about 66 percent and then gets killed. Not sure if I need to shard the checkpoint or what?

ikergarcia1996 commented 9 months ago

Hi @jmanhype!

Can you provide more information? Which configuration file and model are you attempting to run?

jmanhype commented 9 months ago

I'm loading the pre-trained 7B model.

jmanhype commented 9 months ago

Using the baseline 7B config file.

osainz59 commented 9 months ago

Hi @jmanhype !

The baseline in this repository refers to the baseline with which we compare in the paper. If you want to try the 7B model, you should use this config: `configs/model_configs/eval/CoLLIE-7B_CodeLLaMA.yaml`

ikergarcia1996 commented 9 months ago

@jmanhype How much RAM does your machine have? Are you using the latest HuggingFace transformers version? Are you using our `load_model` function or `AutoModelForCausalLM.from_pretrained`?

We uploaded the weights in float32 to ensure compatibility with devices that do not support bfloat16 (e.g., V100 GPUs). I will attempt to replicate the issue on a 32GB RAM machine and get back to you in a few hours. If necessary, we will upload the bf16 weights.
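For reference, a minimal sketch of loading the checkpoint with the plain `transformers` API while keeping host memory low (not the repo's `load_model` helper; `HiTZ/GoLLIE-7B` is the hub id mentioned later in this thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HiTZ/GoLLIE-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# torch_dtype casts the float32 checkpoint to bfloat16 as it is loaded,
# and low_cpu_mem_usage=True avoids materialising a second full copy in RAM.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
```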

jmanhype commented 9 months ago

Device name: DESKTOP-06OLKHE
Processor: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
Installed RAM: 32.0 GB (31.9 GB usable)
Device ID: 9B8CB989-464F-4937-B0E5-762B06509499
Product ID: 00325-81789-21676-AAOEM
System type: 64-bit operating system, x64-based processor
Pen and touch: No pen or touch input is available for this display

What if you did that and uploaded a sharded version, or would that not fix it?

Side note: I went from it getting killed straight away to loading 2/3 of the checkpoints and then getting killed, using the low-CPU-memory method.
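For context, roughly what a sharded re-save could look like with the standard `transformers` API (a sketch only; it still requires one machine with enough RAM to load the model once, and the output path is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Load once on a machine with enough RAM, then save in smaller shards.
model = AutoModelForCausalLM.from_pretrained(
    "HiTZ/GoLLIE-7B",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

# max_shard_size controls the size of each saved weight file.
model.save_pretrained("gollie-7b-sharded", max_shard_size="2GB")
```

With sharded files, loading only needs to materialise one shard at a time, which lowers the peak RAM during `from_pretrained`.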

jmanhype commented 9 months ago

I have a GTX 2080 Super with 8 GB of VRAM.

jmanhype commented 9 months ago

Let me know what you think.

ikergarcia1996 commented 9 months ago

@jmanhype We have updated the HiTZ/GoLLIE-7B weights to bfloat16: https://huggingface.co/HiTZ/GoLLIE-7B.

The larger models will be updated throughout the day. Since they were trained in bfloat16, it's logical to host them in that precision on the hub. I tested the 'Create Custom Task' notebook on a 32GB machine, and everything works as expected. Can you try again?
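A quick sanity check along these lines (a sketch; `device_map="auto"` requires the `accelerate` package and will split the model between an 8 GB GPU and CPU RAM, so adjust or drop it depending on the hardware):

```python
import torch
from transformers import AutoModelForCausalLM

# The updated hub weights should now load directly in bfloat16,
# roughly halving the host-memory footprint compared to float32.
model = AutoModelForCausalLM.from_pretrained(
    "HiTZ/GoLLIE-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(next(model.parameters()).dtype)  # expected: torch.bfloat16
```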