How to use model archiver utility for complex projects?

sagjounkani commented 4 years ago

I am trying to serve the model from the NCRFpp project using torchserve. The files that are required by my custom handler.py file are in multiple folders and have import statements which refer to this folder hierarchy. The model archiver zips all these files and extracts into a temporary folder at the same level without this folder hierarchy, leading to runtime errors because the import statements fail. How do I ensure that the import statements work, with same folder hierarchy as used during development, while using torchserve?

harshbafna commented 4 years ago

@sagjounkani: You could create a zip file of the dependency Python files in the required folder hierarchy and supply it via the --extra-files parameter when creating the .mar. Then, while initializing the handler, you can extract this zip file into the model's temporary directory, which is already on the PYTHONPATH.

You can refer to the waveglow text-to-speech-synthesizer example
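
As a rough illustration of the first half of this suggestion, the dependency files could be zipped with their relative paths intact. This is only a sketch: the function name `zip_with_hierarchy` and the file name `deps.zip` are made up for this example and are not part of TorchServe.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_with_hierarchy(source_root: str, zip_path: str) -> List[str]:
    """Zip every .py file under source_root, keeping paths relative to it.

    Hypothetical helper for illustration; not a TorchServe API.
    """
    root = Path(source_root)
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(root.rglob("*.py")):
            # Store "utils/data.py" rather than the absolute path, so the
            # folder hierarchy reappears when the archive is extracted.
            arcname = file.relative_to(root).as_posix()
            archive.write(file, arcname)
            archived.append(arcname)
    return archived
```

The resulting archive (for example `deps.zip`) would then be passed to torch-model-archiver through --extra-files.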

misrasaurabh1 commented 4 years ago

Yep, I encountered this problem as well. Even though the files specified by --extra-files may belong to a directory structure, torchserve copies them all to a single top-level directory while serving. This complicates things for applications, and there is a real possibility of filename collisions. I would like the directory structure of the files within --extra-files to be preserved. Also, if the current behavior is what is expected it should be made clear in the documentation.

sagjounkani commented 4 years ago

@harshbafna Thank you for the resolution, I was able to deploy the model. I found it easier to append the path for dependencies in my custom handler file compared to the process followed in the waveglow text-to-speech-synthesizer example. Not sure of a standard way to approach this. Agree with @misrasaurabh1 that if somehow the directory structure is preserved within the --extra-files it will make things easier.

harshbafna commented 4 years ago

@sagjounkani: Your solution may work when TorchServe is deployed on your localhost; however, it will fail if you need to register the model on a remote host, where the extra files will not be available on the server's file system.

There are multiple ways to add your dependency Python files to the model archive:

- Zip your directory structure as required and unzip it in the handler (as explained earlier).
- Create an egg file for all the dependency Python packages and add it to the model archive (.mar).
- If your model depends on a third-party Python package, supply a requirements.txt file listing all the Python modules required in the .mar file. Refer to the documentation for more details. Note that this feature is not available in the current GA release but is available in the latest master. It will be part of the upcoming release.
- If the project is not available on PyPI, you can manually build it, supply the generated .tar.gz or .zip file in the model archive, and include it in the requirements.txt.

@misrasaurabh1:

Also, if the current behavior is what is expected it should be made clear in the documentation.

We will take this up as part of #561. Also, if you think the options above are too complicated, please feel free to raise a feature request.

hatzel commented 3 years ago

While I think this is certainly an area where torchserve should support a simpler approach, let me share my workaround with you.

The implementation here just recursively copies any subdirectories (of the included directories) over. You can actually trick this into doing what you want by creating a temporary directory into which you symlink the directories you originally wanted to include.

TEMP_DIR=$(mktemp -d)
# Quote "$TEMP_DIR" so paths containing spaces are handled correctly
ln -s "$(pwd)/dir_a" "$TEMP_DIR"
ln -s "$(pwd)/dir_b" "$TEMP_DIR"

# call torch-model-archiver with `--extra-files "$TEMP_DIR"` here

rm -rf "$TEMP_DIR"

The resulting archive will include the top level directories named dir_a and dir_b.
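
For build scripts written in Python, the same staging step might look like the sketch below. The function name `stage_with_symlinks` is hypothetical, and the claim that the archiver follows the symlinks rests on the report above; it has not been verified here.

```python
import os
import tempfile
from typing import Iterable


def stage_with_symlinks(directories: Iterable[str]) -> str:
    """Create a temporary directory holding one symlink per input directory.

    Hypothetical helper mirroring the shell workaround; not a TorchServe API.
    """
    staging_dir = tempfile.mkdtemp()
    for directory in directories:
        source = os.path.abspath(directory)
        # The link keeps the original directory name (dir_a, dir_b, ...).
        link = os.path.join(staging_dir, os.path.basename(source))
        os.symlink(source, link, target_is_directory=True)
    return staging_dir
```

The returned path would be passed to --extra-files and removed afterwards, for instance with `shutil.rmtree`, which deletes the links without touching the directories they point to.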

mhashas commented 1 year ago

@harshbafna

How does it work exactly with the egg file / wheel file? You still need to sys.path.append("egg_file_location") right? And you can only get the location from the context model_dir + the name you gave it, correct?

My use case is as follows. Let's assume I implement my own basehandler in the shared module. Now I work on my_project, and create a my_project_handler. However, I want my baseclass to be BaseHandler from shared. But shared does not exist in the torchserve/mar environment until my handler either

- unzips the code, or
- adds the egg file location

Is that correct? Is there any solution for this?
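
On the `sys.path` part of the question: as a general Python fact (independent of TorchServe), a zip archive placed on `sys.path` is importable through the built-in zipimport mechanism, and egg files are zip archives. The sketch below assumes a pure-Python package, since compiled extension modules cannot be imported from a zip. The helper name `add_archive_to_path` is made up for this example; how TorchServe itself treats egg files is not confirmed in this thread.

```python
import importlib
import os
import sys


def add_archive_to_path(model_dir: str, archive_name: str) -> str:
    """Put a zip/egg archive from model_dir on sys.path so it can be imported.

    Hypothetical helper for illustration; not a TorchServe API.
    """
    archive_path = os.path.join(model_dir, archive_name)
    if not os.path.isfile(archive_path):
        raise FileNotFoundError(archive_path)
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    # Make sure the import system notices the newly added entry.
    importlib.invalidate_caches()
    return archive_path
```

Any import of the packaged module still has to happen after this call, so it does not by itself allow a top-level import in the handler file.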

harshbafna commented 1 year ago

@mhashas :

The model's temporary directory, where the model archive (.mar) is extracted, is already added in the PYTHONPATH.

In your case, you can add the zip file to your model archive using the --extra-files flag and then add a step in your custom handler to extract the zip file into the model's temporary directory.

You can refer to the waveglow text-to-speech-synthesizer example

waveglow mar creation script
- We are adding tacotron.zip to the model archive in the above step
waveglow handler
- The tacotron.zip file is extracted in the model's temp directory.
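
A minimal sketch of that extraction step follows. It assumes the handler receives the model directory through `context.system_properties.get("model_dir")`; the names `deps.zip`, `sketch_deps.helper`, `extract_dependencies`, and `SketchHandler` are placeholders invented for this example, and the waveglow handler itself may differ in detail.

```python
import importlib
import sys
import zipfile
from pathlib import Path


def extract_dependencies(model_dir: str, zip_name: str = "deps.zip") -> None:
    """Unpack the dependency zip into the model's temporary directory."""
    with zipfile.ZipFile(Path(model_dir) / zip_name) as archive:
        archive.extractall(model_dir)
    # The thread states this directory is already on the PYTHONPATH;
    # adding it here is only a defensive step for this sketch.
    if model_dir not in sys.path:
        sys.path.insert(0, model_dir)
    importlib.invalidate_caches()


class SketchHandler:
    """Placeholder handler; a real one would typically extend BaseHandler."""

    DEPENDENCY_MODULE = "sketch_deps.helper"  # hypothetical module name

    def initialize(self, context):
        model_dir = context.system_properties.get("model_dir")
        extract_dependencies(model_dir)
        # Deferred import: the package only exists after extraction,
        # so it cannot be imported at the top of this file.
        self.helper = importlib.import_module(self.DEPENDENCY_MODULE)
        self.initialized = True
```

The import sits inside `initialize` because the extracted files do not exist when the handler module is first loaded.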

mhashas commented 1 year ago

@harshbafna Yes, and a drawback of your approach is that all imports need to be done locally in functions, after the unzipping was done: https://github.com/pytorch/serve/blob/master/examples/text_to_speech_synthesizer/waveglow_handler.py#L40. If you move this import at the top-level, it would fail, because the source files don't exist yet. For my use-case, I cannot add the shared handler as a base class to the handler, because it only exists after the handler was initialized and the zip unzipped.

I think in the end @hatzel's response is the easiest one to implement and does solve my problem.

harshbafna commented 1 year ago

@mhashas Yes, that is one way to work around the problem.

Another way is to package the wheel file and a custom requirements.txt file in your model archive. TorchServe will then install the package automatically.

pytorch / serve

How to use model archiver utility for complex projects? #566