Closed — H04K closed this 8 months ago
It was my mistake: passing an S3 path directly as the argument for the model output directory, particularly during calibration, resulted in excessively long access times and caused everything to stall. I've resolved this by writing to a folder on the Docker container's local storage and then sending that folder to S3 afterwards. This adjustment has eliminated the errors. Closing this issue now.
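For anyone hitting the same stall, a minimal sketch of this workaround looks like the following. The directory layout, bucket name, and prefix here are hypothetical; the helper maps files under a local output directory to S3 keys, and the actual upload (commented out) would use `boto3`'s standard `upload_file` call and requires AWS credentials:

```python
import os

def collect_uploads(local_dir, s3_prefix):
    """Map every file under local_dir to an S3 key beneath s3_prefix,
    preserving the relative directory structure."""
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir)
            pairs.append((path, f"{s3_prefix}/{rel}"))
    return pairs

# Quantize/calibrate into local Docker storage first, then upload once at the end:
# import boto3
# s3 = boto3.client("s3")
# for path, key in collect_uploads("/workspace/quant_out", "llama2-7b-fp8"):
#     s3.upload_file(path, "my-bucket", key)
```

The point is simply that calibration does many small writes, which are cheap on local disk but very slow against S3, so batching the transfer into a single upload pass at the end avoids the stall.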
Thanks for the work you're doing; keep it up!
Hi, I'm trying to quantize llama2-7b to FP8 using the examples/text-generation/llama.py script inside the Docker container. The process appears to stall at iteration 512. Here are the details of the current environment:
- Hardware: H100
- GPU Memory Usage: approximately 32 GB
- RAM/CPU: stable
It's been stuck at this point for about 35 minutes. I've attempted multiple restarts, but the issue persists.
Any idea why, or does this step usually take this long?