huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Support Phi-3.5 MoE #2457

Open maziyarpanahi opened 3 weeks ago

maziyarpanahi commented 3 weeks ago

Feature request

Add support for microsoft/Phi-3.5-MoE-instruct, which uses the PhiMoEForCausalLM architecture.

Motivation

It fails with the following error:

2024-08-25 21:25:51.891 | INFO     | text_generation_server.utils.import_utils:<module>:75 - Detected system cuda
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 118, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 1064, in get_model
    raise NotImplementedError("sharded is not supported for AutoModel")
NotImplementedError: sharded is not supported for AutoModel
 rank=3
2024-08-25T21:25:56.550031Z ERROR text_generation_launcher: Shard 3 failed to start
2024-08-25T21:25:56.550058Z  INFO text_generation_launcher: Shutting down shards
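For context, the error comes from TGI's model dispatch: when an architecture has no native implementation, the server falls back to a plain transformers AutoModel, which cannot be sharded across GPUs. A minimal sketch of that dispatch logic (the function and the supported set here are illustrative stand-ins, not TGI's actual code):

```python
# Illustrative sketch only: the supported set and return values are
# hypothetical stand-ins, not TGI's real get_model implementation.
def get_model(architecture: str, sharded: bool) -> str:
    # Architectures with a native (shardable) implementation.
    natively_supported = {"LlamaForCausalLM", "MixtralForCausalLM"}
    if architecture in natively_supported:
        return f"native sharded implementation for {architecture}"
    # Anything else (e.g. PhiMoEForCausalLM) falls back to AutoModel,
    # which has no tensor-parallel sharding support.
    if sharded:
        raise NotImplementedError("sharded is not supported for AutoModel")
    return f"AutoModel fallback for {architecture}"

print(get_model("PhiMoEForCausalLM", sharded=False))
```

Launching with `--num-shard 1` would sidestep the sharded path, though a MoE model of this size likely needs multiple GPUs anyway, so native support is the real fix.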

Your contribution

I can test any PR.

ErikKaum commented 3 weeks ago

Thanks for reporting this @maziyarpanahi 👍

We don't have a lot of extra bandwidth at the moment, but we might prioritize adding this model.

Also, as a note: thumbs-up or similar reactions on an issue indicate demand for a model and are a signal for us to prioritize it. :)