nod-ai / shark-ai

SHARK Inference Modeling and Serving
Apache License 2.0

Shark V1 Nov 2024 Release Testing Bash #512

Open pdhirajkumarprasad opened 3 days ago

pdhirajkumarprasad commented 3 days ago

Please log in to an MI300X machine. For the AMD Shark team, see the internal Slack channel for available machines.

Feel free to test it however you like, but here are some guidelines you could follow.

Testing guidelines

Multiple people may try the same feature, so whoever is trying a particular feature, please put your name under the "Testers" column in the tables below.

`shortfin_apps.sd.server` with different options:

| Flags | Options | Testers | Issues |
| --- | --- | --- | --- |
| `--host HOST` | | | |
| `--port PORT` | | | |
| `--root-path ROOT_PATH` | | | |
| `--timeout-keep-alive` | | | |
| `--device` | local-task, hip, amdgpu | | |
| `--target` | gfx942, gfx1100 | | https://github.com/nod-ai/SHARK-Platform/issues/515 |
| `--device_ids` | | | |
| `--tokenizers` | | | |
| `--model_config` | | | |
| `--workers_per_device` | | | |
| `--fibers_per_device` | | | |
| `--isolation` | per_fiber, per_call, none | | |
| `--show_progress` | | | |
| `--trace_execution` | | | |
| `--amdgpu_async_allocations` | | | |
| `--splat` | | | |
| `--build_preference` | compile, precompiled | | |
| `--compile_flags` | | | |
| `--flagfile FLAGFILE` | | | https://github.com/nod-ai/SHARK-Platform/issues/515 |
| `--artifacts_dir ARTIFACTS_DIR` | | | |
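
For reference, a server launch combining several of these flags might look like the sketch below. The flag names come from the table above; the specific values (host, port, device, target, isolation, build preference) are illustrative assumptions, not a prescribed configuration:

```
python -m shortfin_apps.sd.server \
  --host 0.0.0.0 \
  --port 8000 \
  --device hip \
  --target gfx942 \
  --isolation per_fiber \
  --build_preference precompiled
```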

`shortfin_apps.sd.simple_client` with different options:

| Flags | Testers | Issues |
| --- | --- | --- |
| `--file` | | |
| `--reps` | | |
| `--save` | | |
| `--outputdir` | | |
| `--steps` | | |
| `--interactive` | | |
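
Similarly, a sample client run exercising a few of these flags might look like the following; the values are illustrative assumptions:

```
python -m shortfin_apps.sd.simple_client \
  --reps 3 \
  --steps 20 \
  --save \
  --outputdir ./generated_images
```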

Other issues

| Issue description | Issue no |
| --- | --- |

dan-garvey commented 3 days ago

Not a critique, just something I noticed: server startup takes about 12 minutes on a Cirrascale 8x MI300 machine.

IanNod commented 3 days ago

I had the same server startup time. I attributed it to downloading models/weights during setup.

Minor critique: it does not look like we are changing the random latents that get generated. Not sure where that is controlled, but I was seeing the same image generated for the same prompt.

dan-garvey commented 3 days ago

Yeah, as Ian said, the seed appears to be fixed; I think when reps > 1 it should be changed.

Maybe this works?


```python
async for i in async_range(args.reps):
    data["seed"] = [i]  # vary the seed per repetition
    pending.append(
        asyncio.create_task(send_request(session, i, args, data))
    )
    await asyncio.sleep(1)  # Wait 1 second before sending the next request
```
dan-garvey commented 3 days ago

well at least in the args.reps>1 case
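
If the goal is different images across client runs too (not just across reps within one run), a variant that draws a fresh random seed could also work. This is only a sketch reusing the assumed `async_range`, `send_request`, `session`, `args`, and `data` names from the snippet above, not the actual client code:

```python
import asyncio
import random

# Sketch only: async_range, send_request, session, args, data, and pending
# are the names assumed from the snippet above; they are not defined here.
async for i in async_range(args.reps):
    data["seed"] = [random.randint(0, 2**31 - 1)]  # fresh seed per repetition
    pending.append(asyncio.create_task(send_request(session, i, args, data)))
```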

archana-ramalingam commented 3 days ago

At cold start, an incomplete model download causes the following error. Deleting the cached models and re-downloading them fixed it.

```
INFO:root:Loading parameter fiber 'model' from: /home/aramalin/.cache/shark/genfiles/sdxl/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/server.py", line 388, in <module>
    main(
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/server.py", line 376, in main
    sysman = configure(args)
             ^^^^^^^^^^^^^^^
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/server.py", line 115, in configure
    sm.load_inference_parameters(*datasets, parameter_scope="model", component=key)
  File "/home/aramalin/SHARK-Platform/3.12.venv/lib/python3.12/site-packages/shortfin_apps/sd/components/service.py", line 116, in load_inference_parameters
    p.load(path, format=format)
ValueError: shortfin_iree-src/runtime/src/iree/io/formats/irpa/irpa_parser.c:16: OUT_OF_RANGE; file segment out of range (1766080 to 2614665369 for 2612899290, file_size=726679552); verifying storage segment
```
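
If anyone else hits this, a quick way to force a clean re-download, assuming the cache lives where the log above shows (`~/.cache/shark/genfiles/sdxl`), is to remove that directory and restart the server:

```
rm -rf ~/.cache/shark/genfiles/sdxl
```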

pdhirajkumarprasad commented 3 days ago

I have tried almost all the flags and various client/server combinations, and added my observations here: https://github.com/pdhirajkumarprasad/for_sharing_logs/blob/main/Shark-V1(Nov,%202024)-Bash.md