mgrankin / ru_transformers

Apache License 2.0
776 stars 108 forks

Unable to use web app #22

Closed nikhilno1 closed 4 years ago

nikhilno1 commented 4 years ago

I am deploying the model as per the instructions, but I am getting either '404 Not Found' or '405 Method Not Allowed'. What am I doing wrong?

(gpt) nikhil_subscribed@fastai-1:~/ru_transformers$ uvicorn rest:app --reload --host 0.0.0.0
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [6759]
INFO:     TensorFlow version 2.1.0 available.
INFO:     PyTorch version 1.4.0 available.
INFO:     Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
2020-03-08 14:21:07.099055: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-03-08 14:21:07.105627: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-03-08 14:21:07.106342: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5623f942d910 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-08 14:21:07.106381: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-03-08 14:21:07.106523: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
INFO:     127.0.0.1:60018 - "GET /gpt2_poetry/ HTTP/1.1" 405 Method Not Allowed
INFO:     127.0.0.1:60020 - "GET / HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:60030 - "GET /gpt2_poetry HTTP/1.1" 307 Temporary Redirect
INFO:     127.0.0.1:60030 - "GET /gpt2_poetry/ HTTP/1.1" 405 Method Not Allowed
INFO:     127.0.0.1:60040 - "GET /gpt/medium HTTP/1.1" 404 Not Found
nikhilno1 commented 4 years ago

Hi, I am a little confused. I used the YTTM tokenizer to create my vocab, which produced a yt.model output file. But when I invoke the 'run_generation.py' script, it throws an error because it cannot find 'vocab.json' and 'merges.txt'. Shouldn't the tokenizer step be generating these files?

(gpt) ubuntu@train-instance:~/ru_transformers$ python run_generation.py     --model_type=gpt2     --model_name_or_path=./gpt2/medium/
2020-03-08 12:59:39.290516: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-03-08 12:59:39.298424: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-03-08 12:59:39.299943: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564fe6bc43b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-08 12:59:39.299983: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-03-08 12:59:39.300140: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
03/08/2020 12:59:39 - INFO - transformers.tokenization_utils -   Model name './gpt2/medium/' not found in model shortcut name list (gpt2, gpt2-medium, gpt2-large, distilgpt2). Assuming './gpt2/medium/' is a path or url to a directory containing tokenizer files.
03/08/2020 12:59:39 - INFO - transformers.tokenization_utils -   Didn't find file ./gpt2/medium/vocab.json. We won't load it.
03/08/2020 12:59:39 - INFO - transformers.tokenization_utils -   Didn't find file ./gpt2/medium/merges.txt. We won't load it.
03/08/2020 12:59:39 - INFO - transformers.tokenization_utils -   Didn't find file ./gpt2/medium/added_tokens.json. We won't load it.
03/08/2020 12:59:39 - INFO - transformers.tokenization_utils -   Didn't find file ./gpt2/medium/special_tokens_map.json. We won't load it.
03/08/2020 12:59:39 - INFO - transformers.tokenization_utils -   Didn't find file ./gpt2/medium/tokenizer_config.json. We won't load it.
Traceback (most recent call last):
  File "run_generation.py", line 204, in <module>
    main()
  File "run_generation.py", line 166, in main
    tokenizer = tokenizer_class.from_pretrained(args.model_name_or_path)
  File "/home/ubuntu/.anaconda/envs/gpt/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 282, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/ubuntu/.anaconda/envs/gpt/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 346, in _from_pretrained
    list(cls.vocab_files_names.values())))
OSError: Model name './gpt2/medium/' was not found in tokenizers model name list (gpt2, gpt2-medium, gpt2-large, distilgpt2). We assumed './gpt2/medium/' was a path or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.
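A quick diagnostic sketch of what the traceback is complaining about (note: `check_tokenizer_dir` is a hypothetical helper, not part of the repo): the stock `GPT2Tokenizer.from_pretrained` looks for `vocab.json` and `merges.txt` in the model directory, while the YTTM step emits only a single model file, so the lookup fails with the `OSError` above. If I recall the repo's README correctly, its fine-tuning invocations pass a YTTM-aware tokenizer class and point the tokenizer name at the yt.model file, so the generation script presumably needs the same flags rather than the default GPT-2 tokenizer.

```python
import os
import tempfile

# Files the stock GPT2Tokenizer requires; a YTTM tokenizer step instead
# emits a single model file (named yt.model in this setup).
GPT2_FILES = ("vocab.json", "merges.txt")

def check_tokenizer_dir(path):
    """Return which tokenizer artifacts are present in a model directory."""
    present = set(os.listdir(path)) if os.path.isdir(path) else set()
    return {name: name in present for name in GPT2_FILES + ("yt.model",)}

# Simulate the situation from the traceback: a directory holding only
# the YTTM output, so GPT2Tokenizer.from_pretrained raises OSError.
model_dir = tempfile.mkdtemp()
open(os.path.join(model_dir, "yt.model"), "w").close()
print(check_tokenizer_dir(model_dir))
# {'vocab.json': False, 'merges.txt': False, 'yt.model': True}
```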
nikhilno1 commented 4 years ago

For the web app part, I figured out that the request needs to be a POST, like this:

curl -d '{"prompt":"This is "}' -H "Content-Type: application/json" -X POST http://0.0.0.0:8000/

It would be good to add this to the README.
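To illustrate why the browser requests in the log earlier returned 405: the endpoint only accepts POST with a JSON body, so a plain GET is rejected. The sketch below uses a minimal stdlib echo server as a stand-in for the actual model endpoint (the handler and its `replies` payload are assumptions for illustration, not the repo's real response schema).

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen
from urllib.error import HTTPError

class PromptHandler(BaseHTTPRequestHandler):
    """Stand-in for the rest:app endpoint: POST-only, hence GET -> 405."""

    def do_GET(self):
        self.send_error(405, "Method Not Allowed")

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"replies": ["echo: " + payload["prompt"]]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), PromptHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/gpt2_poetry/"

# A bare GET fails, just like in the uvicorn log
get_code = None
try:
    urlopen(url)
except HTTPError as e:
    get_code = e.code
print(get_code)  # 405

# A POST with a JSON body succeeds, mirroring the curl command
req = Request(url, data=json.dumps({"prompt": "This is "}).encode(),
              headers={"Content-Type": "application/json"})
with urlopen(req) as resp:
    reply = json.loads(resp.read())["replies"][0]
print(reply)  # echo: This is

server.shutdown()
```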

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.