pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Missing mandatory parameter --model-store #2016

Open · david-waterworth opened this issue 1 year ago

david-waterworth commented 1 year ago

šŸ“š The doc issue

I created a config.properties file

model_store="model_store"
load_models=all
models = {\
    "tc": {\
      "1.0.0": {\
        "defaultVersion": true,\
        "marName": "text_classifier.mar",\
        "minWorkers": 1,\
        "maxWorkers": 4,\
        "batchSize": 1,\
        "maxBatchDelay": 100,\
        "responseTimeout": 120\
      }\
    }\
  }

The documentation for torchserve states:

Customize TorchServe behaviour by using the following command line arguments when you call torchserve:

--model-store Overrides the model_store property in config.properties file
--models Overrides the load_models property in config.properties

This wording implies to me that --model-store is optional, but running torchserve --start (from a folder containing config.properties) results in the error: Missing mandatory parameter --model-store.

It seems to me there should only be an error if the model-store location cannot be inferred at all, i.e. it's neither passed via --model-store nor defined in config.properties. (It's also not clear how --model-store can 'override' the value in config.properties if it's mandatory.)
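
For concreteness, a minimal sketch of the two invocation styles being compared (run from the folder containing config.properties; nothing here beyond the commands already discussed):

# Passing the store explicitly always works:
torchserve --start --model-store model_store

# Relying on model_store= in ./config.properties, which per the docs should
# also work, instead fails with: Missing mandatory parameter --model-store
torchserve --start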

Suggest a potential alternative/fix

No response

david-waterworth commented 1 year ago

Also, I don't think it's actually loading config.properties unless I explicitly specify it via --ts-config config.properties.

https://github.com/pytorch/serve/blob/master/docs/configuration.md#configproperties-file

This implies it should: printenv TS_CONFIG_FILE returns nothing and --ts-config is not passed, so it should fall back to option 3 (a config.properties in the current directory). When I explicitly pass --ts-config config.properties I can see that it is being used, the file is in the same directory, and I think it's spelt correctly.
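
For reference, these are the checks behind each documented lookup step (commands illustrative):

printenv TS_CONFIG_FILE                            # option 1: env var (returns nothing here)
torchserve --start --ts-config config.properties   # option 2: explicit --ts-config flag
ls ./config.properties                             # option 3: config.properties in the current directory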

lxning commented 1 year ago

@david-waterworth Here are the responses to your questions.

david-waterworth commented 1 year ago

Thanks, I think I've figured out my confusion on the first bullet point. It looks like there's an additional step: if you don't pass --ts-config, it seems to load the config from the last shutdown:

$ torchserve --start --model-store model_store
Config file: logs/config/20221202075258030-shutdown.cfg

$ torchserve --start --model-store model_store --ts-config config.properties
Config file: config.properties

So there seems to be another step, i.e. if there's a .cfg from a previous shutdown, use that in preference to all the steps from the doc?
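
A quick way to check whether that's happening (sketch; the filename is the one from the run above):

ls logs/config/
# 20221202075258030-shutdown.cfg   <- leftover snapshot that gets picked up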

On the second point, you say model_store is mandatory and must be defined in config.properties or overridden on the torchserve command line. The 'or' implies to me that it's not mandatory to pass it on the command line if it's in config.properties, but that's clearly not the case: it is mandatory on the command line (it's explicit in the help).

Even when I pass it on the command line and put a different path in config.properties, it always seems to use the command-line version, so I don't see how the override works?

lxning commented 1 year ago

@david-waterworth here is the response.

  1. Q: "torchserve --start --model-store model_store --ts-config config.properties". A: See snapshot-related-issues; with this command, the snapshot takes effect.

  2. The command-line option --model-store overrides model_store defined in config.properties. TorchServe requires a model store path: it must be provided either as "model_store" in config.properties or as "--model-store" on the TorchServe command line, and the config value can be overridden by the command-line flag. The following test shows model_store defined in config.properties (i.e. /Users/XXX/workplace/python_env/serve/model_store) being overridden by --model-store on the command line (resulting in Model Store: /Volumes/workplace/python_env/serve/model_store1).

    
cat config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081

number_of_netty_threads=32
job_queue_size=1000

vmargs=-Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
prefer_direct_buffer=True

default_response_timeout=300
unregister_model_timeout=300
install_py_dep_per_model=true
default_workers_per_model=2
model_store=/Users/XXX/workplace/python_env/serve/model_store

load_models=all

torchserve --ncs --start --ts-config config.properties --model-store model_store1 --models all
Warning: TorchServe is using non-default JVM parameters: -Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-12-01T15:05:53,481 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-12-01T15:05:53,910 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.5.3
TS Home: /Users/XXX/opt/anaconda3/lib/python3.8/site-packages
Current directory: /Volumes/workplace/python_env/serve
Temp directory: /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T/
Number of GPUs: 0
Number of CPUs: 12
Max heap size: 4096 M
Python executable: /Users/XXX/opt/anaconda3/bin/python
Config file: /Users/XXX/workplace/python_env/serve/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://127.0.0.1:8082
Model Store: /Volumes/workplace/python_env/serve/model_store1
Initial Models: all
Log dir: /Volumes/workplace/python_env/serve/logs
Metrics dir: /Volumes/workplace/python_env/serve/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 2
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: True
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /Volumes/workplace/python_env/serve/model_store1
Model config: N/A
2022-12-01T15:05:53,923 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2022-12-01T15:05:53,947 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: mnist_scripted_v2.mar
2022-12-01T15:05:54,126 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 2.0 for model mnist_scripted_v2
2022-12-01T15:05:54,126 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 2.0 for model mnist_scripted_v2
2022-12-01T15:05:54,127 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model mnist_scripted_v2 loaded.
2022-12-01T15:05:54,127 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: mnist_scripted_v2, count: 2

david-waterworth commented 1 year ago

Thanks, I did see an example containing --ncs but didn't understand its purpose.
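
(For context: --ncs appears to be the short form of torchserve's --no-config-snapshots flag, i.e. it skips the shutdown-snapshot restore observed earlier; a sketch:)

# Disable snapshots so the config file passed explicitly is the one used,
# rather than a restored logs/config/*-shutdown.cfg
torchserve --start --ncs --ts-config config.properties --model-store model_store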

I'm still confused by the use of 'overrides': as far as I can see, the value of model_store= in config.properties can never be used? In your example it's using model_store1, which is what you specified via the command-line option --model-store.

Since you must specify that command-line option, can you provide an example where it actually uses the value from config.properties?

In particular, you say "It must be provided either in model_store in config.properties or --model-store in the TorchServe command line option", but the --model-store argument is not optional.

lxning commented 1 year ago

@david-waterworth Here is an example of using model_store defined in config.properties. "--model-store" is optional if "model_store" is defined in a config.properties that torchserve can locate.


cat config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081

number_of_netty_threads=32
job_queue_size=1000

vmargs=-Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
prefer_direct_buffer=True

default_response_timeout=300
unregister_model_timeout=300
install_py_dep_per_model=true
default_workers_per_model=2
model_store=/Users/XXX/workplace/python_env/serve/model_store
#load_models=all

torchserve --ncs --start --ts-config config.properties --models all
Warning: TorchServe is using non-default JVM parameters: -Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-12-01T17:13:48,309 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-12-01T17:13:48,482 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.5.3
TS Home: /Users/XXX/opt/anaconda3/lib/python3.8/site-packages
Current directory: /Volumes/workplace/python_env/serve
Temp directory: /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T/
Number of GPUs: 0
Number of CPUs: 12
Max heap size: 4096 M
Python executable: /Users/XXX/opt/anaconda3/bin/python
Config file: /Users/XXX/workplace/python_env/serve/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://127.0.0.1:8082
Model Store: /Volumes/workplace/python_env/serve/model_store
Initial Models: all
Log dir: /Volumes/workplace/python_env/serve/logs
Metrics dir: /Volumes/workplace/python_env/serve/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 2
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: True
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /Volumes/workplace/python_env/serve/model_store
Model config: N/A
2022-12-01T17:13:48,495 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2022-12-01T17:13:48,516 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: mnist_scripted_v2.mar
2022-12-01T17:13:48,682 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 2.0 for model mnist_scripted_v2
2022-12-01T17:13:48,682 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 2.0 for model mnist_scripted_v2
2022-12-01T17:13:48,682 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model mnist_scripted_v2 loaded.
2022-12-01T17:13:48,683 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: mnist_scripted_v2, count: 2
2022-12-01T17:13:48,693 [DEBUG] W-9000-mnist_scripted_v2_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/Users/XXX/opt/anaconda3/bin/python, /Users/XXX/opt/anaconda3/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T//.ts.sock.9000]
2022-12-01T17:13:48,693 [DEBUG] W-9001-mnist_scripted_v2_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/Users/XXX/opt/anaconda3/bin/python, /Users/XXX/opt/anaconda3/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T//.ts.sock.9001]
2022-12-01T17:13:48,695 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: KQueueServerSocketChannel.
2022-12-01T17:13:48,768 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2022-12-01T17:13:48,769 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: KQueueServerSocketChannel.
2022-12-01T17:13:48,770 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081

david-waterworth commented 1 year ago

Ahh I see, it looks like for that to work you also need to pass --ts-config config.properties.

If you use torchserve --ncs --start on its own, it complains: Missing mandatory parameter --model-store

lxning commented 1 year ago

@david-waterworth "torchserve --ncs --start" can also work if env TS_CONFIG_FILE is defined.

echo $TS_CONFIG_FILE
/Users/XXX/workplace/python_env/serve/config.properties

torchserve --ncs --start
Warning: TorchServe is using non-default JVM parameters: -Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-12-01T17:39:06,229 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-12-01T17:39:06,682 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.5.3
TS Home: /Users/XXX/opt/anaconda3/lib/python3.8/site-packages
Current directory: /Volumes/workplace/python_env/serve
Temp directory: /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T/
Number of GPUs: 0
Number of CPUs: 12
Max heap size: 4096 M
Python executable: /Users/XXX/opt/anaconda3/bin/python
Config file: /Users/XXX/workplace/python_env/serve/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://127.0.0.1:8082
Model Store: /Volumes/workplace/python_env/serve/model_store
Initial Models: N/A
Log dir: /Volumes/workplace/python_env/serve/logs
Metrics dir: /Volumes/workplace/python_env/serve/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 2
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: True
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /Volumes/workplace/python_env/serve/model_store
Model config: N/A
2022-12-01T17:39:06,694 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2022-12-01T17:39:06,721 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: KQueueServerSocketChannel.
2022-12-01T17:39:06,795 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2022-12-01T17:39:06,795 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: KQueueServerSocketChannel.
2022-12-01T17:39:06,796 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2022-12-01T17:39:06,796 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: KQueueServerSocketChannel.
2022-12-01T17:39:06,797 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
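
Summarising the config-file resolution order as established in this thread (paths illustrative; observed behaviour, not a guarantee from the docs):

# 1. Environment variable: picked up with no extra flags
export TS_CONFIG_FILE=/path/to/config.properties
torchserve --start --ncs

# 2. Explicit flag
unset TS_CONFIG_FILE
torchserve --start --ncs --ts-config config.properties

# 3. With neither, a leftover logs/config/<timestamp>-shutdown.cfg snapshot is
#    restored unless --ncs is passed; the documented fallback to ./config.properties
#    in the current directory did not work in the original report, hence the
#    "Missing mandatory parameter --model-store" error.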