kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0

Fine-tuning #188

Closed preste-naava closed 2 years ago

preste-naava commented 2 years ago

Hi! While fine-tuning GPT-J, I always get the same google.api_core exception when the first checkpoint is saved. I have uploaded the pretrained weights and the dataset to a bucket, created a Cloud TPU service account, and granted it read and write permissions on the bucket, but that didn't help. Any help would be appreciated.

saving a checkpoint for step 1
Traceback (most recent call last):
  File "device_train.py", line 58, in save
    with open(f"gs://{bucket}/{path}/meta.json", "r") as f:
  File "/home/preste-naava/.local/lib/python3.8/site-packages/smart_open/smart_open_lib.py", line 235, in open
    binary = _open_binary_stream(uri, binary_mode, transport_params)
  File "/home/preste-naava/.local/lib/python3.8/site-packages/smart_open/smart_open_lib.py", line 398, in _open_binary_stream
    fobj = submodule.open_uri(uri, mode, transport_params)
  File "/home/preste-naava/.local/lib/python3.8/site-packages/smart_open/gcs.py", line 105, in open_uri
    return open(parsed_uri['bucket_id'], parsed_uri['blob_id'], mode, **kwargs)
  File "/home/preste-naava/.local/lib/python3.8/site-packages/smart_open/gcs.py", line 138, in open
    fileobj = Reader(
  File "/home/preste-naava/.local/lib/python3.8/site-packages/smart_open/gcs.py", line 224, in __init__
    raise google.cloud.exceptions.NotFound('blob %s not found in %s' % (key, bucket))
google.api_core.exceptions.NotFound: 404 blob mesh_jax_gpt_6B_eliza_rotary/meta.json not found in gpt-j_bucket

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/preste-naava/.local/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 2713, in create_resumable_upload_session
    upload, _ = self._initiate_resumable_upload(
  File "/home/preste-naava/.local/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1916, in _initiate_resumable_upload
    upload.initiate(
  File "/home/preste-naava/.local/lib/python3.8/site-packages/google/resumable_media/requests/upload.py", line 413, in initiate
    self._process_initiate_response(response)
  File "/home/preste-naava/.local/lib/python3.8/site-packages/google/resumable_media/_upload.py", line 502, in _process_initiate_response
    _helpers.require_status_code(
  File "/home/preste-naava/.local/lib/python3.8/site-packages/google/resumable_media/_helpers.py", line 99, in require_status_code
    raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.CREATED: 201>)
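
Reading the two tracebacks together: the 404 only says that meta.json does not exist yet, while the 403 is raised when the upload is started, which suggests (though the log alone does not prove it) that the credentials actually in use on the TPU VM are not allowed to create objects in the bucket. One way to narrow this down is a small write/read/delete round trip against the bucket. The sketch below is not part of mesh-transformer-jax; `check_bucket_access` and the probe object name are hypothetical, and it assumes a bucket object that behaves like `google.cloud.storage.Bucket`.

```python
def check_bucket_access(bucket, probe_name="access_probe.txt"):
    """Attempt a write, read, and delete on a probe object.

    `bucket` is assumed to behave like google.cloud.storage.Bucket,
    i.e. bucket.blob(name) returns an object with upload_from_string,
    download_as_bytes, and delete. Returns a dict mapping each step
    to None on success or to an error description on failure.
    """
    blob = bucket.blob(probe_name)  # hypothetical probe object name
    steps = [
        ("write", lambda: blob.upload_from_string(b"probe")),
        ("read", lambda: blob.download_as_bytes()),
        ("delete", lambda: blob.delete()),
    ]
    results = {}
    for name, action in steps:
        try:
            action()
            results[name] = None
        except Exception as exc:  # report every failure, do not stop early
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

On the TPU VM this could be called with `google.cloud.storage.Client().bucket("gpt-j_bucket")`. If the write step fails with a 403 while running as the same user that launches device_train.py, the bucket permissions (or the account the VM is really using) are the likely place to look.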

ljj430 commented 2 years ago

Hi, have you solved this error? I encountered the same one.