huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

[Documentation] Unclear how to use other architectures #481

Closed louis030195 closed 4 months ago

louis030195 commented 1 year ago

In your README you list the optimized architectures and say:

Other architectures are supported on a best effort basis using:

AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")

or

AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")
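For illustration, here is what that fallback looks like spelled out end to end (a sketch, not the exact code TGI runs; `trust_remote_code=True` is an assumption, needed because baichuan-7B ships custom modeling code, and the generation settings are arbitrary):

```python
# Sketch of the best-effort fallback: load the model directly with transformers.
# NOTE: downloading baichuan-inc/baichuan-7B needs roughly 14 GB of disk and a GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/baichuan-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # spread layers over the available devices
    trust_remote_code=True,  # this repo ships its own modeling_baichuan.py
)

inputs = tokenizer("Hello, ", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```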

Can you explain where we have to do this? I'm trying to run baichuan-inc/baichuan-7B

model=baichuan-inc/baichuan-7B
num_shard=1
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 9090:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id $model --num-shard $num_shard
start.sh

2023-06-21T01:41:52.798477Z  INFO text_generation_launcher: Args { model_id: "baichuan-inc/baichuan-7B", revision: None, sharded: None, num_shard: Some(1), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-06-21T01:41:52.798592Z  INFO text_generation_launcher: Starting download process.
2023-06-21T01:41:56.934136Z  WARN download: text_generation_launcher: No safetensors weights found for model baichuan-inc/baichuan-7B at revision None. Converting PyTorch weights to safetensors.
2023-06-21T01:41:56.934202Z  INFO download: text_generation_launcher: Convert /data/models--baichuan-inc--baichuan-7B/snapshots/39916f64eb892ccdc1982b0eef845b3b8fd43f6b/pytorch_model.bin to /data/models--baichuan-inc--baichuan-7B/snapshots/39916f64eb892ccdc1982b0eef845b3b8fd43f6b/model.safetensors.
Error: DownloadError
2023-06-21T01:42:03.924426Z ERROR text_generation_launcher: Download encountered an error: Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 151, in download_weights
    utils.convert_files(local_pt_files, local_st_files)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in convert_files
    convert_file(pt_file, sf_file)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 62, in convert_file
    save_file(pt_state, str(sf_file), metadata={"format": "pt"})
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 394, in _flatten
    raise RuntimeError(

RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.layers.21.mlp.gate_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.11.input_layernorm.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layer

Narsil commented 1 year ago

Try with --auto-convert false.

This error happens when trying to convert the weights to safetensors, but conversion shouldn't be required for non-core models.

Narsil commented 1 year ago

This model seems to be sharing its gate_proj, but the modeling code doesn't reflect that: https://huggingface.co/baichuan-inc/baichuan-7B/blob/main/modeling_baichuan.py Not sure if it's intentional.

louis030195 commented 1 year ago

thanks @Narsil

tried --auto-convert but it's not in the args?

error: unexpected argument '--auto-convert' found

Usage: text-generation-launcher <--model-id <MODEL_ID>|--revision <REVISION>|--sharded <SHARDED>|--num-shard <NUM_SHARD>|--quantize <QUANTIZE>|--trust-remote-code|--max-concurrent-requests <MAX_CONCURRENT_REQUESTS>|--max-best-of <MAX_BEST_OF>|--max-stop-sequences <MAX_STOP_SEQUENCES>|--max-input-length <MAX_INPUT_LENGTH>|--max-total-tokens <MAX_TOTAL_TOKENS>|--max-batch-size <MAX_BATCH_SIZE>|--waiting-served-ratio <WAITING_SERVED_RATIO>|--max-batch-total-tokens <MAX_BATCH_TOTAL_TOKENS>|--max-waiting-tokens <MAX_WAITING_TOKENS>|--port <PORT>|--shard-uds-path <SHARD_UDS_PATH>|--master-addr <MASTER_ADDR>|--master-port <MASTER_PORT>|--huggingface-hub-cache <HUGGINGFACE_HUB_CACHE>|--weights-cache-override <WEIGHTS_CACHE_OVERRIDE>|--disable-custom-kernels|--json-output|--otlp-endpoint <OTLP_ENDPOINT>|--cors-allow-origin <CORS_ALLOW_ORIGIN>|--watermark-gamma <WATERMARK_GAMMA>|--watermark-delta <WATERMARK_DELTA>|--env>

mantrakp04 commented 1 year ago

Yeah, facing the same issue: the model gets converted to safetensors and then it messes up. Not really sure how to figure that out.

ByteEvangelist commented 1 year ago

Same. When I add --auto-convert false it says the argument isn't found, and when I run without it, it tries to convert the model to safetensors and returns this error:

2023-07-03T19:27:56.259279Z  WARN download: text_generation_launcher: No safetensors weights found for model /data/falcon-7b-instruct at revision None. Converting PyTorch weights to safetensors.

Error: DownloadError
2023-07-03T19:29:55.441248Z ERROR text_generation_launcher: Download process was signaled to shutdown with signal 9:
TalhaUusuf commented 1 year ago

Same here with the Falcon model.

bealbrown commented 1 year ago

I also got the error that --auto-convert is not an argument. Would love to be able to use text-generation-inference on models that can't be converted to safetensors.

shannonphu commented 1 year ago

Did anyone figure out how to use other architectures?

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.