when using cloud workstations or cloud shell, the (bash) terminal doesn't get the contents of the $ACCESS_TOKEN before sending the cURL request. The result is always Your client does not have permission to the requested URL <code>/v1/chat/completions</code>. which might be difficult to debug since the Access Token is valid. I believe using double-quotes here works fine for other shells.
The minimum requirements for GPUs is 4 CPU and 16 memory. I've updated the gcloud command and clarified the description to call out how this is the minimum requirement.
A couple of additional suggestions:
Your client does not have permission to the requested URL <code>/v1/chat/completions</code>.
which might be difficult to debug since the Access Token is valid. I believe using double-quotes here works fine for other shells.