Open Benjmeadows opened 4 years ago
Hi @Benjmeadows. Thanks for providing all the details. "voice-cloning-recall" is the name of the SageMaker model, so it won't appear in your S3 bucket. But there should be a file named "pretrained.tar.gz" in the bucket.
Try the following:
If you don't have a model present, you can try the "Create Model" button. For "Location of model artifacts", enter the S3 URL of "pretrained.tar.gz". For "Location of inference code image", pick the "real-time-voice-cloning" image that was created by the install script (assuming those steps of the script ran successfully). Here's what mine looks like:
If this doesn't work, it would help to see any error messages you got from running the install script.
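For anyone who prefers to script the console steps above, here is a minimal sketch of the same model registration with boto3. It is not taken from the repository's install script. The model name and image name come from this thread; the account ID, region, bucket, `:latest` tag, and role ARN are placeholders you would have to fill in, and the helper names `build_create_model_request` and `create_model` are hypothetical.

```python
def build_create_model_request(model_name, image_uri, model_data_url, role_arn):
    """Build keyword arguments for the SageMaker CreateModel API call."""
    if not model_data_url.startswith("s3://"):
        raise ValueError("model_data_url must be an s3:// URL")
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,               # "Location of inference code image"
            "ModelDataUrl": model_data_url,   # "Location of model artifacts"
        },
        "ExecutionRoleArn": role_arn,
    }


def create_model(request):
    """Send the request with boto3 (needs boto3 installed and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3
    return boto3.client("sagemaker").create_model(**request)


# Placeholder values: everything inside <...> is an assumption to replace.
request = build_create_model_request(
    model_name="voice-cloning-recall",
    image_uri="<account-id>.dkr.ecr.<region>.amazonaws.com/real-time-voice-cloning:latest",
    model_data_url="s3://<your-bucket>/pretrained.tar.gz",
    role_arn="arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",
)
# create_model(request)  # uncomment once the placeholders are filled in
```

The request is built separately from the call so it can be inspected before anything is sent to AWS.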
Thanks a lot for your help! I haven't been able to get back to this until now.
On the positive side: I followed your instructions and I am not getting that error anymore. However, I get a new error when I try to run the transform block of the notebook:
# Start the job. It should take several minutes, although most of that is spent starting the container.
trans.transform(f's3://{bucket_name}/sample_job.json', content_type='application/json')
trans.wait()
The error I am getting is:
..................................2020/08/29 22:47:41 [notice] 8#8: using the "epoll" event method
2020/08/29 22:47:41 [notice] 8#8: nginx/1.14.0 (Ubuntu)
2020/08/29 22:47:41 [notice] 8#8: OS: Linux 4.14.186-110.268.amzn1.x86_64
2020/08/29 22:47:41 [notice] 8#8: getrlimit(RLIMIT_NOFILE): 65536:99999
2020/08/29 22:47:41 [notice] 8#8: start worker processes
2020/08/29 22:47:41 [notice] 8#8: start worker process 10
2020/08/29 22:47:41 [crit] 10#10: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
169.254.255.130 - - [29/Aug/2020:22:47:41 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
2020/08/29 22:47:41 [crit] 10#10: *3 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
169.254.255.130 - - [29/Aug/2020:22:47:41 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
[2020-08-29 22:47:41 +0000] [9] [INFO] Starting gunicorn 20.0.4
[2020-08-29 22:47:41 +0000] [9] [INFO] Listening at: unix:/tmp/gunicorn.sock (9)
[2020-08-29 22:47:41 +0000] [9] [INFO] Using worker: gevent
[2020-08-29 22:47:41 +0000] [13] [INFO] Booting worker with pid: 13
[2020-08-29 22:47:41 +0000] [14] [INFO] Booting worker with pid: 14
169.254.255.130 - - [29/Aug/2020:22:47:47 +0000] "GET /ping HTTP/1.1" 200 1 "-" "Go-http-client/1.1"
169.254.255.130 - - [29/Aug/2020:22:47:47 +0000] "GET /execution-parameters HTTP/1.1" 404 2 "-" "Go-http-client/1.1"
2020/08/29 22:47:47 [info] 10#10: *7 client 169.254.255.130 closed keepalive connection
[2020-08-29 22:47:47,188] WARNING in process_voice: !!!!! No GPU found !!!!!
[2020-08-29 22:47:47,188] WARNING in process_voice: Starting request: test...
[2020-08-29 22:47:50,218] ERROR in app: Exception on /invocations [POST]
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/opt/conda/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/opt/conda/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
File "/opt/conda/lib/python3.7/site-packages/gevent/_socket3.py", line 428, in connect
raise error(result, strerror(result))
OSError: [Errno 101] Network is unreachable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/botocore/httpsession.py", line 263, in send
chunked=self._chunked(request.headers),
File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/opt/conda/lib/python3.7/site-packages/urllib3/util/retry.py", line 344, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/opt/conda/lib/python3.7/http/client.py", line 1244, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/opt/conda/lib/python3.7/site-packages/botocore/awsrequest.py", line 92, in _send_request
method, url, body, headers, *args, **kwargs)
File "/opt/conda/lib/python3.7/http/client.py", line 1290, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/opt/conda/lib/python3.7/http/client.py", line 1239, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/opt/conda/lib/python3.7/site-packages/botocore/awsrequest.py", line 119, in _send_output
self.send(msg)
File "/opt/conda/lib/python3.7/site-packages/botocore/awsrequest.py", line 203, in send
return super(AWSConnection, self).send(str)
File "/opt/conda/lib/python3.7/http/client.py", line 966, in send
self.connect()
File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 181, in connect
conn = self._new_conn()
File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPConnection object at 0x7f78592eecd0>: Failed to establish a new connection: [Errno 101] Network is unreachable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/botocore/utils.py", line 1669, in _get_response
response = self._session.send(request.prepare())
File "/opt/conda/lib/python3.7/site-packages/botocore/httpsession.py", line 283, in send
raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://169.254.170.2/v2/credentials/LwIpUB5S3wNg4ofv6o2lqWo5FTfAbqrTFBH_U4HMcjE"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1856, in fetch_creds
full_uri, headers=headers)
File "/opt/conda/lib/python3.7/site-packages/botocore/utils.py", line 1619, in retrieve_full_uri
return self._retrieve_credentials(full_url, headers)
File "/opt/conda/lib/python3.7/site-packages/botocore/utils.py", line 1656, in _retrieve_credentials
full_url, headers, self.TIMEOUT_SECONDS)
File "/opt/conda/lib/python3.7/site-packages/botocore/utils.py", line 1687, in _get_response
raise MetadataRetrievalError(error_msg=error_msg)
botocore.exceptions.MetadataRetrievalError: Error retrieving metadata: Received error when attempting to retrieve ECS metadata: Could not connect to the endpoint URL: "http://169.254.170.2/v2/credentials/LwIpUB5S3wNg4ofv6o2lqWo5FTfAbqrTFBH_U4HMcjE"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/conda/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/opt/program/process_voice.py", line 76, in transformation
s3 = boto3.client('s3')
File "/opt/conda/lib/python3.7/site-packages/boto3/__init__.py", line 91, in client
return _get_default_session().client(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/boto3/session.py", line 263, in client
aws_session_token=aws_session_token, config=config)
File "/opt/conda/lib/python3.7/site-packages/botocore/session.py", line 826, in create_client
credentials = self.get_credentials()
File "/opt/conda/lib/python3.7/site-packages/botocore/session.py", line 431, in get_credentials
'credential_provider').load_credentials()
File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1962, in load_credentials
creds = provider.load()
File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1825, in load
return self._retrieve_or_fail()
File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1834, in _retrieve_or_fail
creds = fetcher()
File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1861, in fetch_creds
error_msg=str(e))
botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received error when attempting to retrieve ECS metadata: Could not connect to the endpoint URL: "http://169.254.170.2/v2/credentials/LwIpUB5S3wNg4ofv6o2lqWo5FTfAbqrTFBH_U4HMcjE"
169.254.255.130 - - [29/Aug/2020:22:47:50 +0000] "POST /invocations HTTP/1.1" 500 290 "-" "Go-http-client/1.1"
2020-08-29T22:47:47.133:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: Bad HTTP status received from algorithm: 500
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json:
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: Message:
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: <title>500 Internal Server Error</title>
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: <h1>Internal Server Error</h1>
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: <p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
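The log above is hard to read because each exception wraps the previous one. As a reading aid, here is a small sketch that pulls the exception lines out of such a log and lists them in order, dropping repeated entries. The helper name `exception_chain` and the regular expression are my own assumptions, and the sample text is a shortened excerpt with abbreviated messages.

```python
import re

# Matches lines such as "OSError: [Errno 101] Network is unreachable" or
# "botocore.exceptions.CredentialRetrievalError: Error when ...".
EXCEPTION_LINE = re.compile(r"^([A-Za-z_][\w.]*(?:Error|Exception)): (.*)$")


def exception_chain(log_text):
    """Return (exception_name, message) pairs in order of first appearance."""
    chain = []
    for line in log_text.splitlines():
        match = EXCEPTION_LINE.match(line.strip())
        if match is None:
            continue
        entry = (match.group(1), match.group(2))
        if entry not in chain:  # skip entries already seen
            chain.append(entry)
    return chain


# Shortened excerpt of the log above (messages abbreviated for the example).
sample_log = """\
OSError: [Errno 101] Network is unreachable
During handling of the above exception, another exception occurred:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL
During handling of the above exception, another exception occurred:
botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role
OSError: [Errno 101] Network is unreachable
"""

for name, message in exception_chain(sample_log):
    print(f"{name}: {message}")
```

Read in order, the chain says the container could not reach the credentials endpoint at 169.254.170.2, so `boto3.client('s3')` in process_voice.py failed before any S3 call was made. One setting known to produce this symptom is network isolation on the SageMaker model (`EnableNetworkIsolation` in the CreateModel API), which blocks outbound calls and withholds credentials from the container. Nothing in this thread confirms that this is the cause here, so treat it as something to check rather than a diagnosis.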
I am using the latest model:
Perhaps it is a permissions issue with AWS, or perhaps I am missing something, but I believe create_model.py is supposed to deploy an initial model to S3 so that I can train it. When I run the install script, which calls create_model.py, and then run the example Jupyter notebook, I get the following errors in the transform block that starts the job:
ClientError Traceback (most recent call last)
~/anaconda3/envs/amazonei_tensorflow2_p36/lib/python3.6/site-packages/sagemaker/transformer.py in _retrieve_image_name(self)
    238 model_desc = self.sagemaker_session.sagemaker_client.describe_model(
--> 239 ModelName=self.model_name
    240 )
~/anaconda3/envs/amazonei_tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    315 # The "self" in this scope is referring to the BaseClient.
--> 316 return self._make_api_call(operation_name, kwargs)
    317
~/anaconda3/envs/amazonei_tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    634 error_class = self.exceptions.from_code(error_code)
--> 635 raise error_class(parsed_response, operation_name)
    636 else:
ClientError: An error occurred (ValidationException) when calling the DescribeModel operation: Could not find model "arn:aws:sagemaker:us-east-1:525578525493:model/voice-cloning-recall".
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)