huggingface / optimum-nvidia

Apache License 2.0

Issue with llama.py Script in Docker - Process Stalling at Iteration 512 #49

Closed H04K closed 8 months ago

H04K commented 8 months ago

Hi, I'm trying to quantize llama2-7b to FP8 using the examples/text-generation/llama.py script inside the Docker container. The process appears to stall at iteration 512. Here are the details of the current environment:

- Hardware: H100 GPU
- Memory usage: approximately 32 GB
- RAM/CPU: stable

It's been stuck on this for about 35 minutes, and I have attempted multiple restarts, but the issue persists.

Any idea why, or does this step usually take this long?

H04K commented 8 months ago

It was my mistake: passing an S3 path directly as the model output directory argument, particularly during calibration, resulted in excessively long access times, causing everything to stall. I've resolved this by writing to a folder in the Docker container's local storage, which I then upload to S3. This adjustment has eliminated the errors. Closing this issue now.
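For anyone hitting the same stall, a minimal sketch of the workaround: write the quantization output to fast local storage inside the container, then upload to S3 in one pass afterwards. The directory layout, script arguments, and bucket name below are assumptions, not the exact commands from this issue.

```shell
#!/bin/sh
set -eu

# Local, fast storage inside the container (path is an assumption).
OUT_DIR="${TMPDIR:-/tmp}/llama2-7b-fp8"
mkdir -p "$OUT_DIR"

# Run calibration/quantization against the LOCAL directory, not an s3:// URI
# (exact llama.py arguments are assumptions; adapt to the script's real flags):
# python examples/text-generation/llama.py ... --output "$OUT_DIR"

# Only once quantization has finished, upload the artifacts in one pass
# (bucket name is a placeholder; requires AWS credentials):
# aws s3 sync "$OUT_DIR" s3://my-bucket/llama2-7b-fp8/

echo "local output dir: $OUT_DIR"
```

The key point is that calibration performs many small reads and writes, which are cheap on local disk but very slow when each one goes over the network to object storage.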

Thanks for the work you're doing; keep it up!