predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0

Using Source = Local for Base Model #347

Open silveranalytics opened 5 months ago

silveranalytics commented 5 months ago

Feature request

I only see source=local available for the adapters. Is that the case, or is there a way to use a local source for the base model as well?

Even with the model cached locally and --model-id pointing at it, there is still a call out to HF because the source cannot be changed from 'hub'.

Motivation

My ultimate goal is to run offline.

docker run -e RUST_BACKTRACE=full --gpus '"device=3"' --network none --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/predibase/lorax:latest --model-id /data/models--mistralai--Mistral-7B-Instruct-v0.1/

Your contribution

I've tried to add the option in get_model but I am a novice.

def get_model(
    model_id: str,
    adapter_id: str,
    revision: Optional[str],
    sharded: bool,
    quantize: Optional[str],
    compile: bool,
    dtype: Optional[str],
    trust_remote_code: bool,
    source: str,
    adapter_source: str,
) -> Model:
    config_dict = None
    if source == "s3":
        # change the model id to be the local path to the folder so
        # we can load the config_dict locally
        logger.info(f"Using the local files since we are coming from s3")
        model_path = get_s3_model_local_dir(model_id)
        logger.info(f"model_path: {model_path}")
        config_dict, _ = PretrainedConfig.get_config_dict(
            model_path, revision=revision, trust_remote_code=trust_remote_code
        )
        logger.info(f"config_dict: {config_dict}")
        model_id = str(model_path)
    elif source == "hub":
        config_dict, _ = PretrainedConfig.get_config_dict(
            model_id, revision=revision, trust_remote_code=trust_remote_code
        )
    elif source == "local":
        model_path = get_model_local_dir(model_id)
        logger.info(f"Using the local files since we are specified as local source")
        logger.info(f"model_path: {model_path}")
        config_dict, _ = PretrainedConfig.get_config_dict(
            model_path, revision=revision, trust_remote_code=trust_remote_code
        )
        logger.info(f"config_dict: {config_dict}")
    else: 
        raise ValueError(f"Unknown source {source}")

Even after adding the "local" branch, I still get an error:

`  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 282, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 125, in get_model
    raise ValueError(f"Unknown source {source}")`
tgaddair commented 5 months ago

Hey @silveranalytics, thanks for raising this issue. We definitely want to make it easier for folks to use LoRAX without needing to go to HF. Happy to take some time to investigate this.

jonseaberg commented 5 months ago

@silveranalytics I ran into a similar error. I added a "local" condition to get_model using the same body as "hub".

  elif source == "hub":
        config_dict, _ = PretrainedConfig.get_config_dict(
            model_id, revision=revision, trust_remote_code=trust_remote_code
        )
  elif source == "local":
        config_dict, _ = PretrainedConfig.get_config_dict(
            model_id, revision=revision, trust_remote_code=trust_remote_code
        )
  else:

After the change I was able to specify --source local and my local model was loaded. @magdyksaleh I am happy to put up a PR if this looks like it's in the right direction.
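For reference, a sketch of what the invocation might look like with an image that includes the patched get_model, reusing the volume mount and paths from the original post (these are assumptions, not an official example):

docker run --gpus '"device=3"' --network none --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/predibase/lorax:latest --source local --model-id /data/models--mistralai--Mistral-7B-Instruct-v0.1/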

amybachir commented 5 months ago

This is a must-have for us. We need the ability to self-host: we cannot go out to the internet to fetch the base model weights, so we need to load them from disk.

ruhomor commented 5 months ago

Having an issue with local models as well :(

fornitroll commented 4 months ago

As I understand it, this fix is still in progress. Could you advise when it will be ready? Also, the Mistral model can now only be fetched from HF with a token (after accepting the mistralai conditions), so it would be reasonable to add a way to pass a token to the Docker container (this will also be needed to run private models).
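If the LoRAX server picks up the standard Hugging Face Hub token environment variable (as the underlying huggingface_hub tooling does), passing a token into the container might look like the sketch below; whether the image reads HUGGING_FACE_HUB_TOKEN is an assumption to verify:

docker run -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/predibase/lorax:latest --model-id mistralai/Mistral-7B-Instruct-v0.1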

jonseaberg commented 4 months ago

We were able to load the base model from local disk with no code changes by keeping hub as the source and passing the local path as --model-id. Using the command line from the original post:

docker run -e RUST_BACKTRACE=full --gpus '"device=3"' --network none --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/predibase/lorax:latest --source hub --model-id /data/models--mistralai--Mistral-7B-Instruct-v0.1/
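As an extra guard for offline runs, huggingface_hub also supports HF_HUB_OFFLINE=1 to prevent any attempt to reach the Hub; a hedged variant of the same command, assuming the server's download path goes through huggingface_hub:

docker run -e HF_HUB_OFFLINE=1 -e RUST_BACKTRACE=full --gpus '"device=3"' --network none --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/predibase/lorax:latest --source hub --model-id /data/models--mistralai--Mistral-7B-Instruct-v0.1/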