nod-ai / shark-ai

SHARK Inference Modeling and Serving
Apache License 2.0

[llama decomposed]: Parameter files missing for different decomposed models #472

Open pdhirajkumarprasad opened 4 days ago

pdhirajkumarprasad commented 4 days ago

In the nightly run we have several decomposed models, e.g.

testBenchmark8B_fp8_Decomposed
testBenchmark70B_fp8_TP8_Decomposed
testBenchmark405B_fp8_TP8_Decomposed

which are failing because their parameter files are missing.

Error:

 File "/home/sai/actions-runner-llama/_work/SHARK-Platform/SHARK-Platform/deps/iree-turbine/iree/turbine/aot/params.py", line 234, in load
E               self._index.load(
E           ValueError: Error opening parameter file: c/runtime/src/iree/base/internal/file_io.c:253: NOT_FOUND; failed to open file '/data/llama-3.1/weights/405b/f8/llama405b_fp8.irpa'
E           Invoked with:
E             cd /home/sai/actions-runner-llama/_work/SHARK-Platform/SHARK-Platform && python3 -m sharktank.examples.export_paged_llm_v1 --irpa-file=/data/llama-3.1/weights/405b/f8/llama405b_fp8.irpa --output-mlir=/home/sai/actions-runner-llama/_work/SHARK-Platform/SHARK-Platform/2024-11-10/llama-405b/fp8_decomposed.mlir --output-config=/home/sai/actions-runner-llama/_work/SHARK-Platform/SHARK-Platform/2024-11-10/llama-405b/fp8_decomposed.json --bs=4 --attention-kernel decomposed
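The failure above surfaces deep inside `iree.turbine.aot.params` as a `ValueError`. A pre-flight existence check before invoking the export would fail fast with a clearer message; a minimal sketch (the wrapper function is hypothetical, and the flags simply mirror the invocation in the log):

```python
import os
import subprocess
import sys

def export_llm(irpa_file: str, output_mlir: str, output_config: str) -> None:
    """Run the export step, failing fast if the weights are absent.

    Hypothetical wrapper around sharktank.examples.export_paged_llm_v1;
    flags copied from the failing invocation above.
    """
    if not os.path.isfile(irpa_file):
        # Exit with a clear, actionable message instead of a deep stack trace.
        sys.exit(
            f"Parameter file not found: {irpa_file}\n"
            "Download or copy the model weights to this path before exporting."
        )
    subprocess.run(
        [
            sys.executable, "-m", "sharktank.examples.export_paged_llm_v1",
            f"--irpa-file={irpa_file}",
            f"--output-mlir={output_mlir}",
            f"--output-config={output_config}",
            "--bs=4",
            "--attention-kernel", "decomposed",
        ],
        check=True,
    )
```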
dan-garvey commented 4 days ago

Can you send me the machine that needs these via Slack?

ScottTodd commented 4 days ago

Those tests should still be changed to not depend on the contents of the runner file system. Users and developers should be able to run these tests on their own systems.

If the downloads are small enough, the test could download and cache automatically. For larger models, I'd probably fail the test (or skip with a reason) and print instructions to run some setup.
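The skip-with-a-reason approach can be sketched as a pytest guard that checks for the weight file and prints setup instructions when it is absent. This is only an illustration, not the repo's actual test code: the `LLAMA_405B_FP8_IRPA` environment variable and the marker/test names are hypothetical, and the default path is taken from the error log above.

```python
import os
import pytest

# Hypothetical: let developers point the test at their own copy of the
# weights; default to the path seen in the failing nightly run.
IRPA_PATH = os.environ.get(
    "LLAMA_405B_FP8_IRPA",
    "/data/llama-3.1/weights/405b/f8/llama405b_fp8.irpa",
)

# Skip (rather than error) when the parameter file is missing, and tell
# the user how to set things up instead of failing with a stack trace.
requires_405b_weights = pytest.mark.skipif(
    not os.path.isfile(IRPA_PATH),
    reason=(
        f"Missing parameter file {IRPA_PATH}; fetch the 405b fp8 weights "
        "and set LLAMA_405B_FP8_IRPA to their location."
    ),
)

@requires_405b_weights
def test_benchmark_405b_fp8_decomposed():
    ...  # export + benchmark steps would go here
```

On a machine without the weights, pytest reports the test as skipped with the reason string, so users immediately see what setup is required rather than a `NOT_FOUND` traceback.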

dan-garvey commented 4 days ago

@ScottTodd I agree. However, as someone who is grateful these tests were written at all, I went ahead and updated the filesystem for now. We can leave this issue open if you'd like to use it to track the desired approach.

ScottTodd commented 4 days ago

SGTM. I'll keep speaking up whenever it breaks and needs fixing, though :P. Eventually we'll need the tests decoupled from the machines our team directly controls if we want users or other developers to be able to run them too. (Though for 405B that's much harder than for smaller models.)