rustygentile / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time
Other

ISSUE: create_model.py issue #2

Open Benjmeadows opened 4 years ago

Benjmeadows commented 4 years ago

Perhaps it is a permissions issue with AWS, or perhaps I am missing something, but I believe create_model.py is supposed to deploy an initial model to S3 so that I can train it. When I run the install script (which calls create_model.py) and then run the example Jupyter notebook, I get the following errors on the transform block that starts the job:

```
ClientError                               Traceback (most recent call last)
~/anaconda3/envs/amazonei_tensorflow2_p36/lib/python3.6/site-packages/sagemaker/transformer.py in _retrieve_image_name(self)
    238             model_desc = self.sagemaker_session.sagemaker_client.describe_model(
--> 239                 ModelName=self.model_name
    240             )

~/anaconda3/envs/amazonei_tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    315         # The "self" in this scope is referring to the BaseClient.
--> 316         return self._make_api_call(operation_name, kwargs)
    317

~/anaconda3/envs/amazonei_tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    634             error_class = self.exceptions.from_code(error_code)
--> 635             raise error_class(parsed_response, operation_name)
    636         else:

ClientError: An error occurred (ValidationException) when calling the DescribeModel operation: Could not find model "arn:aws:sagemaker:us-east-1:525578525493:model/voice-cloning-recall".

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
      1 # Start the job. It should take several minutes. Although most of that is from starting the container.
----> 2 trans.transform(f's3://{bucket_name}/sample_job.json', content_type='application/json')
      3 trans.wait()

~/anaconda3/envs/amazonei_tensorflow2_p36/lib/python3.6/site-packages/sagemaker/transformer.py in transform(self, data, data_type, content_type, compression_type, split_type, job_name, input_filter, output_filter, join_source, experiment_config, model_client_config, wait, logs)
    193
    194         if base_name is None:
--> 195             base_name = self._retrieve_base_name()
    196
    197         self._current_job_name = name_from_base(base_name)

~/anaconda3/envs/amazonei_tensorflow2_p36/lib/python3.6/site-packages/sagemaker/transformer.py in _retrieve_base_name(self)
    226     def _retrieve_base_name(self):
    227         """Placeholder docstring"""
--> 228         image_name = self._retrieve_image_name()
    229
    230         if image_name:

~/anaconda3/envs/amazonei_tensorflow2_p36/lib/python3.6/site-packages/sagemaker/transformer.py in _retrieve_image_name(self)
    254                 "Failed to fetch model information for %s. "
    255                 "Please ensure that the model exists. "
--> 256                 "Local instance types require locally created models." % self.model_name
    257             )
    258

ValueError: Failed to fetch model information for voice-cloning-recall. Please ensure that the model exists. Local instance types require locally created models.
```

Looking at the S3 bucket, I don't see a model named "voice-cloning-recall", so I am fairly sure this is the problem. Could the issue be in create_model.py?
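The traceback shows that the transformer looks the model up by name through SageMaker's DescribeModel API, so the same lookup can be run on its own before starting a job. A minimal sketch, assuming a client created with `boto3.client("sagemaker")`; `model_exists` is a hypothetical helper, not part of this repository:

```python
def model_exists(sagemaker_client, model_name):
    """Return True if a SageMaker model with this name exists.

    Assumes `sagemaker_client` is a boto3 SageMaker client. A missing model
    is reported as a ClientError whose error code is "ValidationException",
    matching the DescribeModel failure in the traceback.
    """
    try:
        sagemaker_client.describe_model(ModelName=model_name)
        return True
    except sagemaker_client.exceptions.ClientError as err:
        if err.response.get("Error", {}).get("Code") == "ValidationException":
            return False
        # Permission or throttling errors are re-raised rather than hidden.
        raise
```

For example, `model_exists(boto3.client("sagemaker"), "voice-cloning-recall")` would be expected to return `False` in the situation described above. The client is passed in rather than created inside the function so the check can be reused with any region or credentials.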
rustygentile commented 4 years ago

Hi @Benjmeadows. Thanks for providing all the details. "voice-cloning-recall" won't appear in your S3 bucket, but there should be a file named "pretrained.tar.gz".
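Whether that archive is actually in the bucket can be checked with S3's HeadObject call. A minimal sketch, assuming a client from `boto3.client("s3")` and assuming the archive sits at the top level of the bucket under the key `pretrained.tar.gz` (the key or prefix the install script really uses may differ); `artifact_exists` is a hypothetical helper:

```python
def artifact_exists(s3_client, bucket, key="pretrained.tar.gz"):
    """Return True if s3://<bucket>/<key> exists.

    The default key is an assumption; pass the real key if the install
    script uploaded the archive under a prefix.
    """
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except s3_client.exceptions.ClientError as err:
        # HeadObject returns no body, so a missing key is reported as "404".
        if err.response.get("Error", {}).get("Code") in ("404", "NoSuchKey"):
            return False
        # A "403" usually means missing permissions, which is worth surfacing.
        raise
```

A `False` result points at the upload step of the install script; a permissions error instead points at the IAM setup the original post suspected.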

Try the following:

  1. Sign into the AWS console: aws.amazon.com
  2. Search for SageMaker
  3. Under Inference -> Models you should find "voice-cloning-recall"

[Screenshot]
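The console steps above correspond to SageMaker's ListModels API, so the same lookup can be scripted. A minimal sketch, assuming a client from `boto3.client("sagemaker")`; `find_models` is a hypothetical helper and the `"voice-cloning"` filter is only an example:

```python
def find_models(sagemaker_client, name_fragment="voice-cloning"):
    """Return the names of SageMaker models whose name contains name_fragment.

    Follows NextToken so every page of results is collected.
    """
    names = []
    kwargs = {"NameContains": name_fragment}
    while True:
        page = sagemaker_client.list_models(**kwargs)
        names.extend(model["ModelName"] for model in page.get("Models", []))
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token
```

An empty list here corresponds to not finding "voice-cloning-recall" under Inference -> Models in the console.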

If the model isn't there, you can create it with the "Create Model" button. For "Location of model artifacts", enter the S3 URL of "pretrained.tar.gz". For "Location of inference code image", pick the "real-time-voice-cloning" image that the install script created (assuming those steps of the script ran successfully). Here's what mine looks like:

[Screenshot]
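The same model can also be created through the CreateModel API instead of the console form. A sketch under assumptions: the image URI, S3 URL, and IAM role ARN are placeholders for values from your own account, and `build_create_model_request` and `create_sagemaker_model` are hypothetical helpers, not functions taken from create_model.py:

```python
def build_create_model_request(model_name, image_uri, model_data_url, role_arn):
    """Assemble CreateModel arguments that mirror the console form.

    image_uri      -> "Location of inference code image" (an ECR image URI)
    model_data_url -> "Location of model artifacts" (s3://.../pretrained.tar.gz)
    role_arn       -> IAM role that SageMaker assumes for the model
    """
    if not model_data_url.startswith("s3://"):
        raise ValueError("model_data_url must be an s3:// URL")
    return {
        "ModelName": model_name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }


def create_sagemaker_model(sagemaker_client, **kwargs):
    """Create the model with a boto3 SageMaker client and return its ARN."""
    request = build_create_model_request(**kwargs)
    return sagemaker_client.create_model(**request)["ModelArn"]
```

Usage would look like `create_sagemaker_model(boto3.client("sagemaker"), model_name="voice-cloning-recall", image_uri="<ecr-image-uri>", model_data_url="s3://<bucket>/pretrained.tar.gz", role_arn="<sagemaker-execution-role-arn>")`, with the angle-bracket values filled in. Keeping the request construction separate makes it easy to print and compare against what the console shows.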

If this doesn't work, it would help to see any error messages you got from running the install script.

Benjmeadows commented 4 years ago

Thanks a lot for your help! I haven't been able to get back to this until now.

On the positive side, I followed your instructions and no longer get that error. However, I now get a new error when I run the transform block of the notebook:

```python
# Start the job. It should take several minutes, most of which is spent starting the container.
trans.transform(f's3://{bucket_name}/sample_job.json', content_type='application/json')
trans.wait()
```

The error I am getting is:

..................................
2020/08/29 22:47:41 [notice] 8#8: using the "epoll" event method
2020/08/29 22:47:41 [notice] 8#8: nginx/1.14.0 (Ubuntu)
2020/08/29 22:47:41 [notice] 8#8: OS: Linux 4.14.186-110.268.amzn1.x86_64
2020/08/29 22:47:41 [notice] 8#8: getrlimit(RLIMIT_NOFILE): 65536:99999
2020/08/29 22:47:41 [notice] 8#8: start worker processes
2020/08/29 22:47:41 [notice] 8#8: start worker process 10
2020/08/29 22:47:41 [crit] 10#10: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
169.254.255.130 - - [29/Aug/2020:22:47:41 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
2020/08/29 22:47:41 [crit] 10#10: *3 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
169.254.255.130 - - [29/Aug/2020:22:47:41 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
[2020-08-29 22:47:41 +0000] [9] [INFO] Starting gunicorn 20.0.4
[2020-08-29 22:47:41 +0000] [9] [INFO] Listening at: unix:/tmp/gunicorn.sock (9)
[2020-08-29 22:47:41 +0000] [9] [INFO] Using worker: gevent
[2020-08-29 22:47:41 +0000] [13] [INFO] Booting worker with pid: 13
[2020-08-29 22:47:41 +0000] [14] [INFO] Booting worker with pid: 14
169.254.255.130 - - [29/Aug/2020:22:47:47 +0000] "GET /ping HTTP/1.1" 200 1 "-" "Go-http-client/1.1"
169.254.255.130 - - [29/Aug/2020:22:47:47 +0000] "GET /execution-parameters HTTP/1.1" 404 2 "-" "Go-http-client/1.1"
2020/08/29 22:47:47 [info] 10#10: *7 client 169.254.255.130 closed keepalive connection
[2020-08-29 22:47:47,188] WARNING in process_voice: !!!!! No GPU found !!!!!
[2020-08-29 22:47:47,188] WARNING in process_voice: Starting request: test...
[2020-08-29 22:47:50,218] ERROR in app: Exception on /invocations [POST]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
    raise err
  File "/opt/conda/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
    sock.connect(sa)
  File "/opt/conda/lib/python3.7/site-packages/gevent/_socket3.py", line 428, in connect
    raise error(result, strerror(result))
OSError: [Errno 101] Network is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/botocore/httpsession.py", line 263, in send
    chunked=self._chunked(request.headers),
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/opt/conda/lib/python3.7/site-packages/urllib3/util/retry.py", line 344, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/conda/lib/python3.7/http/client.py", line 1244, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/conda/lib/python3.7/site-packages/botocore/awsrequest.py", line 92, in _send_request
    method, url, body, headers, *args, **kwargs)
  File "/opt/conda/lib/python3.7/http/client.py", line 1290, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1239, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.7/site-packages/botocore/awsrequest.py", line 119, in _send_output
    self.send(msg)
  File "/opt/conda/lib/python3.7/site-packages/botocore/awsrequest.py", line 203, in send
    return super(AWSConnection, self).send(str)
  File "/opt/conda/lib/python3.7/http/client.py", line 966, in send
    self.connect()
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 181, in connect
    conn = self._new_conn()
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPConnection object at 0x7f78592eecd0>: Failed to establish a new connection: [Errno 101] Network is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/botocore/utils.py", line 1669, in _get_response
    response = self._session.send(request.prepare())
  File "/opt/conda/lib/python3.7/site-packages/botocore/httpsession.py", line 283, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://169.254.170.2/v2/credentials/LwIpUB5S3wNg4ofv6o2lqWo5FTfAbqrTFBH_U4HMcjE"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1856, in fetch_creds
    full_uri, headers=headers)
  File "/opt/conda/lib/python3.7/site-packages/botocore/utils.py", line 1619, in retrieve_full_uri
    return self._retrieve_credentials(full_url, headers)
  File "/opt/conda/lib/python3.7/site-packages/botocore/utils.py", line 1656, in _retrieve_credentials
    full_url, headers, self.TIMEOUT_SECONDS)
  File "/opt/conda/lib/python3.7/site-packages/botocore/utils.py", line 1687, in _get_response
    raise MetadataRetrievalError(error_msg=error_msg)
botocore.exceptions.MetadataRetrievalError: Error retrieving metadata: Received error when attempting to retrieve ECS metadata: Could not connect to the endpoint URL: "http://169.254.170.2/v2/credentials/LwIpUB5S3wNg4ofv6o2lqWo5FTfAbqrTFBH_U4HMcjE"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/conda/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/program/process_voice.py", line 76, in transformation
    s3 = boto3.client('s3')
  File "/opt/conda/lib/python3.7/site-packages/boto3/__init__.py", line 91, in client
    return _get_default_session().client(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/boto3/session.py", line 263, in client
    aws_session_token=aws_session_token, config=config)
  File "/opt/conda/lib/python3.7/site-packages/botocore/session.py", line 826, in create_client
    credentials = self.get_credentials()
  File "/opt/conda/lib/python3.7/site-packages/botocore/session.py", line 431, in get_credentials
    'credential_provider').load_credentials()
  File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1962, in load_credentials
    creds = provider.load()
  File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1825, in load
    return self._retrieve_or_fail()
  File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1834, in _retrieve_or_fail
    creds = fetcher()
  File "/opt/conda/lib/python3.7/site-packages/botocore/credentials.py", line 1861, in fetch_creds
    error_msg=str(e))
botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received error when attempting to retrieve ECS metadata: Could not connect to the endpoint URL: "http://169.254.170.2/v2/credentials/LwIpUB5S3wNg4ofv6o2lqWo5FTfAbqrTFBH_U4HMcjE"
169.254.255.130 - - [29/Aug/2020:22:47:50 +0000] "POST /invocations HTTP/1.1" 500 290 "-" "Go-http-client/1.1"
2020-08-29T22:47:47.133:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: Bad HTTP status received from algorithm: 500
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: 
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: Message:
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: <title>500 Internal Server Error</title>
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: <h1>Internal Server Error</h1>
2020-08-29T22:47:50.228:[sagemaker logs]: voicefilesfortraining/sample_job.json: <p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>

I am using the latest model: [screenshot]
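The log shows the container starting and answering `/ping`, then failing at `process_voice.py` line 76 (`boto3.client('s3')`) because it cannot reach the credential endpoint at `169.254.170.2` ("Network is unreachable"). One possible cause, offered as an assumption rather than a confirmed diagnosis: if the model was created with network isolation enabled (an option on the Create Model form), the container is not allowed to make outbound calls, including credential and S3 requests. A minimal sketch that reads the relevant settings back from DescribeModel; `network_settings` is a hypothetical helper:

```python
def network_settings(sagemaker_client, model_name):
    """Summarize settings that affect whether the container can call AWS.

    Assumes `sagemaker_client` is a boto3 SageMaker client.
    """
    desc = sagemaker_client.describe_model(ModelName=model_name)
    return {
        # When True, SageMaker blocks outbound network calls from the container.
        "isolated": bool(desc.get("EnableNetworkIsolation", False)),
        # The IAM role SageMaker associates with the model.
        "role": desc.get("ExecutionRoleArn"),
    }
```

If `isolated` comes back `True` for "voice-cloning-recall", recreating the model with network isolation turned off would be one thing to try; if it is `False`, the cause lies elsewhere.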