Open choshiho opened 2 years ago
I see two different issues.
1. Unsuccessful NVIDIA driver installation
AttributeError: /lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
So I would suggest running a simpler example, with just some PyTorch model on GPU, and seeing if it works.
2. Importing a module that does not exist
I also see
2022-06-19T20:37:26,893 [INFO ] W-9001-my_tc_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'ts.torch_handler.Transformer_handler_generalized'
That module is expected at examples/Huggingface_Transformers/Transformer_handler_generalized.py rather than inside the ts package. This is not ideal, and I agree that we should make our handlers easier to import.
TODO for us: make all handlers in examples possible to import like so: import ts.torch_handler.Transformer
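To see which handler modules an installed `ts` package actually exposes, you can ask the import system directly. The following is a minimal sketch using only the standard library. The helper name `is_importable` is hypothetical, and the two module names passed to it are the ones that appear in this thread.

```python
import importlib.util


def is_importable(dotted_name: str) -> bool:
    """Return True if the import system can locate `dotted_name`.

    find_spec imports parent packages (e.g. `ts`) but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or is not a package.
        return False


if __name__ == "__main__":
    for name in (
        "ts.torch_handler.base_handler",
        "ts.torch_handler.Transformer_handler_generalized",
    ):
        print(name, "->", is_importable(name))
```

If the second name is reported as not importable, the handler file has to be passed by path to `--handler` instead of being imported from `ts`.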
Does this section of the docs need to be updated to reflect that you can't import from ts in the pytorch docker images?
https://pytorch.org/serve/custom_service.html
I'm trying to use a custom handler like so and running into this error:
torch-model-archiver --model-name mdv5 --version 1.0.0 --serialized-file ../models/megadetectorv5/md_v5a.0.0.torchscript --extra-files index_to_name.json --handler ../api/megadetectorv5/mdv5_handler.py
@msaroufim Thank you for your reply!
BaseHandler
The line
from ts.torch_handler.base_handler import BaseHandler
is in the official Transformer_handler_generalized.py: https://github.com/pytorch/serve/blob/master/examples/Huggingface_Transformers/Transformer_handler_generalized.py
ImportError
I have now made the official demo Sequence Classification run successfully. The cause may have been ImportError: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found. I solved this ImportError with
conda install libgcc
Try running your code again. If it still does not work, find the location of libstdc++.so.6 (it is usually in /home/user/anaconda3/lib/) and add that directory to the LD_LIBRARY_PATH environment variable, e.g.:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/user/miniconda3/lib/
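Before changing LD_LIBRARY_PATH, it can help to check which CXXABI versions a given libstdc++.so.6 actually provides. The sketch below scans the library file for version strings, roughly what `strings libstdc++.so.6 | grep CXXABI` would show. The helper name `cxxabi_versions` is hypothetical, and the library path is supplied on the command line rather than assumed.

```python
import re
import sys
from pathlib import Path
from typing import List


def cxxabi_versions(lib_path: str) -> List[str]:
    """Return the numeric CXXABI_x.y.z version strings embedded in a library file."""
    data = Path(lib_path).read_bytes()
    found = {m.decode("ascii") for m in re.findall(rb"CXXABI_\d+(?:\.\d+)*", data)}

    def numeric_key(version: str):
        # "CXXABI_1.3.9" -> (1, 3, 9), so 1.3.10 sorts after 1.3.9.
        return tuple(int(part) for part in version.split("_", 1)[1].split("."))

    return sorted(found, key=numeric_key)


if __name__ == "__main__" and len(sys.argv) > 1:
    versions = cxxabi_versions(sys.argv[1])
    print("\n".join(versions))
    print("CXXABI_1.3.9 present:", "CXXABI_1.3.9" in versions)
```

Running it once against /lib64/libstdc++.so.6 and once against the copy in the conda lib directory should show whether the conda copy is the one that carries CXXABI_1.3.9.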
@msaroufim Hi, one more question. How can I import third-party files from a directory when I write a custom handler? For example, the pytorch_pretrained directory contains several Python files that need to be imported in my custom_handler.py and model.py.
In my custom_handler.py and model.py, the import code is as follows:
from pytorch_pretrained import BertModel, BertTokenizer
My personal directory contains:
custom_handler.py
model.pt
model.py
pytorch_pretrained (there are several Python files in this directory)
TorchServe command lines:
torch-model-archiver --model-name my_text_classifier --version 1.0 --model-file ./zzf_model/model.py --serialized-file ./zzf_model/model.pt --handler "./zzf_model/custom_handler.py" --extra-files "index_to_name_cm.json,source_vocab.pt"
mkdir model_store && mv my_text_classifier.mar model_store/
torchserve --start --model-store model_store --models my_tc=my_text_classifier.mar --ts-config config.properties
No module named 'pytorch_pretrained'
Finally, I got the error ModuleNotFoundError: No module named 'pytorch_pretrained', even though I import pytorch_pretrained in custom_handler.py and model.py.
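Note that the torch-model-archiver command above does not list the pytorch_pretrained directory in --extra-files, so it may simply be absent from the .mar file. One workaround that is sometimes used, sketched here under assumptions rather than as the documented TorchServe method: zip the directory, add the zip to --extra-files, and unpack it inside the handler before importing. The helper name `add_bundled_package` and the archive name `pytorch_pretrained.zip` are hypothetical. `model_dir` is assumed to be the directory the archive is extracted into, which handlers usually read from `context.system_properties.get("model_dir")`.

```python
import sys
import zipfile
from pathlib import Path


def add_bundled_package(model_dir: str, archive_name: str = "pytorch_pretrained.zip") -> None:
    """Unpack a zipped package shipped with the model and make it importable.

    Assumption: `archive_name` was passed to the archiver via --extra-files,
    so it sits directly inside `model_dir` at load time.
    """
    model_path = Path(model_dir)
    archive = model_path / archive_name
    if archive.is_file():
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(model_path)
    # Put model_dir first on sys.path so `import pytorch_pretrained` resolves there.
    if str(model_path) not in sys.path:
        sys.path.insert(0, str(model_path))
```

In this sketch the call would go at the start of the handler's initialize(), and the `from pytorch_pretrained import ...` lines would have to run after it rather than at the top of the file.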
Hi @rbavery I created a separate issue to track your question
@choshiho regarding your error on no module named pytorch_pretrained:
My suspicion is that this has something to do with how you're importing the library. Is this a local directory or a full package? If it is a local directory, maybe you need to do something like from .pytorch_pretrained, or link to the right directory.
On the CUDA front, it's tricky for me to debug what's going on. To be honest, I mostly start from a fresh cloud machine on AWS or use Docker containers; otherwise I spend half a day each time on CUDA driver problems.
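For the NVML error specifically, a quick check is whether the libnvidia-ml.so.1 that gets loaded exports the symbol named in the AttributeError. This is a minimal sketch with `ctypes`. The helper name `has_symbol` is hypothetical, and the library path and symbol name are copied from the log in this thread.

```python
import ctypes
from typing import Optional


def has_symbol(lib_path: str, symbol: str) -> Optional[bool]:
    """Return True/False if the library loads and does/does not export `symbol`.

    Returns None when the library itself cannot be loaded.
    """
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        return None
    # ctypes raises AttributeError for unknown symbols, which hasattr turns into False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    print(has_symbol("/lib64/libnvidia-ml.so.1", "nvmlDeviceGetComputeRunningProcesses_v2"))
```

A result of False would suggest that the installed driver library does not match what the Python NVML bindings expect, which fits the "unsuccessful driver installation" reading.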
@msaroufim
My project is on a cloud machine and my package is a directory. I followed your proposed method:
from .pytorch_pretrained import BertModel, BertTokenizer
I also tried
from .pytorch_pretrained.modeling import BertModel
from .pytorch_pretrained.tokenization import BertTokenizer
Both led to ImportError: attempted relative import with no known parent package.
My personal directory contains: custom_handler.py, model.pt, model.py, and pytorch_pretrained (there are several Python files in this directory).
Could you please give me some other advice? Thank you in advance.
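The relative-import error can be reproduced outside TorchServe. The sketch below assumes the handler file is imported as a top-level module from a directory on sys.path, which is why `from .pkg import ...` has no parent package to resolve against. All names here (`try_top_level_import`, `demo_pkg`, `demo_handler`) are hypothetical stand-ins for pytorch_pretrained and custom_handler.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_top_level_import(handler_source: str) -> str:
    """Import a throwaway handler as a top-level module; return "ok" or the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        # A sibling package directory next to the handler file.
        pkg = Path(tmp, "demo_pkg")
        pkg.mkdir()
        (pkg / "__init__.py").write_text("NAME = 'demo'\n")
        Path(tmp, "demo_handler.py").write_text(handler_source)

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_handler")
            return "ok"
        except ImportError as exc:
            return str(exc)
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            for name in ("demo_handler", "demo_pkg"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(try_top_level_import("from .demo_pkg import NAME\n"))  # relative import
    print(try_top_level_import("from demo_pkg import NAME\n"))   # absolute import
```

Under that assumption, the absolute form `from pytorch_pretrained import ...` is the one that can work, provided the directory actually ends up next to the handler on sys.path.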
Can you share a zip file with all your files so I can reproduce?
@msaroufim
model.pt
I got the model.pt file from this GitHub repo: https://github.com/649453932/Bert-Chinese-Text-Classification-Pytorch
step 1. Download the pre-trained model bert_Chinese from https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz, extract bert_config.json and pytorch_model.bin, and put them in the directory Bert-Chinese-Text-Classification-Pytorch-master/bert_pretrain/
step 2. Download bert-base-chinese-vocab.txt from https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt and put vocab.txt in the directory Bert-Chinese-Text-Classification-Pytorch-master/bert_pretrain/
step 3. Train and test: python run.py --model bert
step 4. This produces bert.ckpt at 'THUCNews/saved_dict/bert.ckpt'. I renamed the fine-tuned model bert.ckpt to model.pt.
model.py, custom_handler.py
These two files are on Baidu Net Disk: https://pan.baidu.com/s/1aqDBNdZmhKNai2lDy34V1w (extraction code: ly8l)
pytorch_pretrained
This directory is in the Bert-Chinese-Text-Classification-Pytorch repo.
🐛 Describe the bug
When I ran the official demo Serving Huggingface Transformers using TorchServe - Sequence Classification, I got the error logs as follows.
Error logs
Installation instructions
JDK11
pip install transformers==4.6.0
TorchServe
Install dependencies
cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu111
Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver
Model Packaging
Downloaded the pre-trained models:
python Download_Transformer_models.py
config.properties
inference_address=http://127.0.0.1:8083
Versions
Environment headers
Torchserve branch:
torchserve==0.6.0 torch-model-archiver==0.6.0
Python version: 3.9 (64-bit runtime)
Python executable: /root/anaconda3/bin/python
Versions of relevant python libraries:
captum==0.5.0
future==0.18.2
numpy==1.22.4
numpydoc==1.1.0
nvgpu==0.9.0
psutil==5.9.1
pylint==2.9.6
pytest==6.2.4
pytorch-crf==0.7.2
requests==2.28.0
requests-oauthlib==1.3.0
torch==1.9.0+cu111
torch-model-archiver==0.6.0
torch-workflow-archiver==0.2.4
torchaudio==0.9.0
torchserve==0.6.0
torchtext==0.10.0
torchvision==0.10.0+cu111
transformers==4.20.0.dev0
wheel==0.37.1
Java Version:
OS: N/A
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Clang version: N/A
CMake version: version 3.14.2
Repro instructions
torch-model-archiver --model-name BERTSeqClassification --version 1.0 --serialized-file Transformer_model/pytorch_model.bin --handler ./Transformer_handler_generalized.py --extra-files "Transformer_model/config.json,./setup_config.json,./Seq_classification_artifacts/index_to_name.json"
mkdir model_store
mv BERTSeqClassification.mar model_store/
torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ncs --ts-config config.properties
Possible Solution
No response