zhaoxlpku / HKU-DASC7606-A2

GPU ran out of storage #5

Open lunachern opened 6 months ago

lunachern commented 6 months ago

When downloading the model "bigcode/starcoder" and the embedder "bert-nli-mean-tokens", the disk still ran out of storage, even though I had deleted the HuggingFace folder and everything from Assignment 1 on the device. Is there any other download I haven't found that takes up a lot of space?
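One way to answer "what else is taking up space?" is to total the size of every top-level entry in your home folder, hidden ones included. Hidden cache folders under ~/.cache (for example the default Hugging Face cache at ~/.cache/huggingface, unless HF_HOME points elsewhere) are easy to miss in a file browser. The sketch below is a rough Python stand-in for `du`; the function names `tree_size` and `largest_entries` are hypothetical, and it only counts regular files and does not follow symlinks.

```python
import os
import stat
from typing import List, Tuple


def tree_size(path: str) -> int:
    """Total bytes of regular files under `path`; symlinks are not followed."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):  # followlinks=False by default
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable: skip it
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def largest_entries(root: str, top: int = 10) -> List[Tuple[int, str]]:
    """Return the `top` largest direct children of `root` as (bytes, path), biggest first.

    Hidden entries such as `.cache` are included because os.scandir lists them.
    """
    sizes = []
    with os.scandir(root) as it:
        for entry in it:
            try:
                if entry.is_dir(follow_symlinks=False):
                    size = tree_size(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    size = entry.stat(follow_symlinks=False).st_size
                else:
                    continue  # skip symlinks, sockets, etc.
            except OSError:
                continue
            sizes.append((size, entry.path))
    sizes.sort(reverse=True)
    return sizes[:top]
```

Usage: `for size, path in largest_entries(os.path.expanduser("~")): print(f"{size / 2**30:.2f} GiB  {path}")`. Walking a large home folder over a network filesystem can take a while.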

tengwang0318 commented 6 months ago

StarCoder is so large that an ordinary machine can't handle it. Try a lighter-weight model.

lunachern commented 6 months ago

First, thanks a lot! I used it because it is the model recommended in the provided code.

However, I'm now in much bigger trouble. While trying to delete things to free up space, I saw some unknown folders in FileZilla. At the time I thought I had downloaded something by mistake, but now I realize they are other students' folders on the cs2 GPU server. I tried to delete everything I didn't recognize, but it turned out that this deleted some of my own settings and my anaconda3 installation.

Now I cannot use conda or even reinstall it.

I don't know what to do. I'm in despair; I wish I could start everything over!

Here is the error message:

concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[547266] Failed to execute script 'entry_point' due to unhandled exception!

tengwang0318 commented 6 months ago

Trust me. I remember that StarCoder requires almost 30 GB of GPU memory (HBM) when you use the float16 version. Even if you download and configure it successfully, you won't be able to run it on the HKU GPU Farm.

What's more, if you try to delete someone else's files, you will get a "permission denied" error, so don't worry about that. Try logging out and configuring the environment again. Maybe it will work.
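The ~30 GB figure can be sanity-checked with simple arithmetic: memory for the weights alone is roughly (number of parameters) × (bytes per parameter). The sketch below assumes StarCoder has about 15.5 billion parameters, as reported on its model card; the dtype table and the helper name `weight_memory_gib` are illustrative. This is a lower bound only, since activations, the KV cache, and framework overhead come on top.

```python
# Rough lower bound on the memory needed just to hold a model's weights.
# Activations, the KV cache, and CUDA/framework overhead are NOT included.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
    "int8": 1,
}


def weight_memory_gib(num_params: float, dtype: str = "float16") -> float:
    """Memory in GiB (2**30 bytes) needed to store `num_params` weights of `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: StarCoder has roughly 15.5e9 parameters.
for dtype in ("float32", "float16", "int8"):
    print(f"{dtype:>8}: {weight_memory_gib(15.5e9, dtype):6.1f} GiB")
# float16 comes out at about 28.9 GiB, i.e. "almost 30 GB" before any overhead.
```

The same calculation applied to a smaller model shows quickly whether it is likely to fit on the GPU you are assigned.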

lunachern commented 6 months ago

Thanks. However, I failed to delete other people's files, while I seem to have successfully deleted my own! T^T

So I cannot use conda now, and I cannot reinstall it because of the "unhandled exception" error.

tengwang0318 commented 6 months ago

I've gotten this error before. It's caused by running out of space. Maybe you didn't delete your files successfully. Source: link

lunachern commented 6 months ago

So, can I fix it by deleting something? Can I delete the previous Anaconda folder, which seems very big?

Actually, this is already my third time installing it, so there are already two Anaconda folders. They are so big that deletion has been running for a long time and they are still there. And I'm so afraid that I might have destroyed something. T.T

lunachern commented 6 months ago

I tried to use conda but it failed. The message is below. Could you please have a look? Thanks!

Do you accept the license terms? [yes|no]
[no] >>> yes

Anaconda3 will now be installed into this location:
/userhome/cs2/mchenal/anaconda3

[/userhome/cs2/mchenal/anaconda3] >>>
PREFIX=/userhome/cs2/mchenal/anaconda3
mchenal@gpu2-comp-111:~$ conda create -n nlp_env python=3.10.9
conda: command not found

tengwang0318 commented 6 months ago

Try removing your files under the /userhome/cs2/your_name folder using rm -rf. Don't worry, you can't destroy anything else because of the permissions. (And if something does get deleted by mistake, the ops team takes the blame XD)

Search for how to install and configure conda on Linux.
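Before running rm -rf, it can help to list what is actually in the folder and who owns each entry, so you only delete what you intended. Note that owning something does not make it safe to delete: your anaconda3 folder and shell settings such as ~/.bashrc (where `conda init` writes its setup) are yours too, which is how they got removed above. The helper `ownership_report` below is a hypothetical, read-only sketch for POSIX systems; it deletes nothing.

```python
import os
from typing import List, Tuple


def ownership_report(folder: str) -> List[Tuple[str, bool]]:
    """List entries directly inside `folder` and whether the current user owns each one.

    Read-only: this only inspects metadata and never deletes anything.
    """
    uid = os.getuid()  # POSIX only (Linux/macOS)
    report = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        st = os.lstat(path)  # lstat: describe a symlink itself, not its target
        report.append((path, st.st_uid == uid))
    return report
```

Usage: `for path, mine in ownership_report(os.path.expanduser("~")): print("mine " if mine else "OTHER", path)`, then review the list by hand and remove only the specific entries you recognize.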

lunachern commented 6 months ago

mchenal@gpu2-comp-111:~$ conda create -n nlp_env python=3.10.9
conda: command not found
mchenal@gpu2-comp-111:~$ pip install torch==2.0.1
Command 'pip' not found, but can be installed with:
apt install python3-pip
Please ask your administrator.
mchenal@gpu2-comp-111:~$ apt install python3-pip
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?

Not only conda, even pip is missing... (crying face)

tengwang0318 commented 6 months ago

Try: bash Miniconda3-latest-Linux-x86_64.sh

You don't have permission to use apt on the HKU GPU server.
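"conda: command not found" means the shell cannot find a conda executable on its PATH. That happens either because the install never finished (for example, the installer crashed after running out of space) or because it finished but the shell was never initialized. The sketch below tells the two cases apart. The default prefixes ~/miniconda3 and ~/anaconda3 are assumptions based on the installers' usual defaults, and `locate_conda` is a hypothetical helper.

```python
import os
import shutil
from typing import Iterable, Optional


def locate_conda(
    candidate_prefixes: Iterable[str] = ("~/miniconda3", "~/anaconda3"),
    search_path: Optional[str] = None,
) -> Optional[str]:
    """Return the path of a usable `conda` executable, or None if none is found.

    First checks the PATH (what typing `conda` would run; search_path=None uses $PATH),
    then falls back to <prefix>/bin/conda under each candidate install prefix.
    """
    found = shutil.which("conda", path=search_path)
    if found:
        return found
    for prefix in candidate_prefixes:
        exe = os.path.join(os.path.expanduser(prefix), "bin", "conda")
        if os.path.isfile(exe) and os.access(exe, os.X_OK):
            return exe
    return None
```

If it returns a path under a prefix but `conda` still isn't found in the shell, running that executable with `init bash` and then opening a new shell (or `source ~/.bashrc`) usually fixes PATH. If it returns None, the install itself is incomplete and the installer needs to be rerun once there is enough free space.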

lunachern commented 6 months ago

Trying it now. Conda is back! Still installing packages; I hope everything goes well. Millions of thanks!!!!

tengwang0318 commented 6 months ago

Miniconda is a minimal version of conda, whereas Anaconda installs many more packages than Miniconda does. Otherwise, for your purposes, there's no difference.

lunachern commented 6 months ago

Solved. THANK YOU! Huge thanks to the expert!