B0-B closed this issue 1 year ago.
I'm seeing the same issue.
I can't say for certain, but I have a feeling this happens when CPU RAM is low.
I see the "Loading checkpoint shards" message hang when I try to load anything beyond facebook_opt-2.7b.
I also only have 16GB of CPU RAM.
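A rough back-of-the-envelope check makes the 16GB limit plausible. The sketch below multiplies a nominal parameter count by the bytes per parameter for a few common dtypes. The parameter counts are taken from the model names, opt-6.7b is assumed to be the next model tried, and the dtype the loader actually uses is not stated in this thread, so treat the numbers as estimates of weight size only.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts are nominal (read off the model names), not measured.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # opt-6.7b is an assumed "next size up"; the thread does not name it.
    for name, params in [("opt-2.7b", 2.7e9), ("opt-6.7b", 6.7e9)]:
        for dtype in BYTES_PER_PARAM:
            print(f"{name:9s} {dtype:8s} ~{weight_gib(params, dtype):5.1f} GiB")
```

By this estimate a 2.7B-parameter model needs about 10 GiB in float32, while a 6.7B-parameter model needs about 12.5 GiB in float16 and about 25 GiB in float32. The operating system and any temporary copies made during loading come on top of that, which would leave little or no room on a 16GB machine.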
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
I'm having the same problem.
I also have the same problem.
I also have the same problem.
+1
Me too
I also have this problem. When I watch my CPU, RAM, and memory usage, it stays maxed out until the process gives up; the command prompt then asks me to press any key and closes.
I hope this helps someone. I think elbowdonkey is onto something with the RAM theory. I run Oobabooga in a Hyper-V VM. If Dynamic Memory is enabled in the VM's memory settings, loading checkpoint shards fails at 0%. With Dynamic Memory disabled, the models load fine.
@ebaker-github How do I disable dynamic memory when loading the model?
If you are asking how to disable dynamic memory in Hyper-V:
1. Select and right-click the VM in Hyper-V Manager Virtual Machines.
2. Navigate to the Memory section.
3. Uncheck the box labeled "Enable Dynamic Memory."
That disables and greys out the Dynamic Memory section. Also note that I am running completely in CPU RAM: I do not have a GPU.
Adding a swap file fixed the error for me, although the combined RAM and swap usage was larger than I had anticipated. In my case I REALLY didn't have sufficient RAM + swap when I hit this failure.
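To see how much RAM and swap are actually available before loading, something like the following may help. It is a minimal sketch for Linux only: it parses /proc/meminfo and compares MemAvailable plus SwapFree against a required size. The helper names, the 12 GiB checkpoint size, and the 1.5x safety factor are all assumptions made for illustration, not measured values.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in bytes}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Most fields are reported in kB; a few are plain counts.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def headroom_bytes(info: dict) -> int:
    """RAM the kernel reports as available plus free swap."""
    return info.get("MemAvailable", 0) + info.get("SwapFree", 0)


def fits(required_bytes: int, info: dict, safety_factor: float = 1.5) -> bool:
    """Guess whether a load of required_bytes fits (safety_factor is an assumption)."""
    return required_bytes * safety_factor <= headroom_bytes(info)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    need = 12 * 1024**3  # hypothetical checkpoint size in bytes
    print(f"headroom: {headroom_bytes(info) / 2**30:.1f} GiB, fits: {fits(need, info)}")
```

If this reports that the model does not fit, adding swap or freeing RAM before loading is probably worth trying first.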
I also have the same problem.
(env) ubuntu@raghava:~/$ python app.py
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Killed
After spending a bit of time on this, it definitely seems to be caused by insufficient RAM.
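On Linux, a bare "Killed" with no traceback usually means the process received SIGKILL, most often from the kernel's out-of-memory killer. One way to sanity-check the RAM theory is to add up the weight files on disk and compare the total with your free memory. The sketch below assumes the weights are stored as .bin or .safetensors files directly inside the model folder; the function name is hypothetical, and the on-disk total is only a rough lower bound on what loading needs.

```python
import sys
from pathlib import Path

# Assumed weight-file extensions; adjust if your checkpoint uses others.
WEIGHT_SUFFIXES = {".bin", ".safetensors"}


def checkpoint_bytes(model_dir: str) -> int:
    """Sum the sizes of the weight files directly inside model_dir."""
    return sum(
        p.stat().st_size
        for p in Path(model_dir).iterdir()
        if p.is_file() and p.suffix in WEIGHT_SUFFIXES
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    total = checkpoint_bytes(sys.argv[1])
    print(f"weight files: {total / 2**30:.2f} GiB")
```

Run it with the path to the model folder as the only argument. If the total is close to or above your free RAM plus swap, the load is likely to be killed.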
Same issue. How can I fix it?
Describe the bug
Description
I installed with the one-click installer on Ubuntu 20.04 and called the conda environment with
After I start the server with
checkpoint loading gets killed at 0%.
I have tested with both the GPU and the CPU-only install; in both cases the process gets killed. I'm not sure whether it is a memory problem, as there is no output.
Reproduction
Download the model (should take ~12GB of space)
Logs
System Info