jondurbin / bagel

A bagel, with everything.

Error following readme to prepare datasets #3

Closed Creative-Emporium closed 5 months ago

Creative-Emporium commented 6 months ago

I am attempting to replicate your bagel-7b-v0.1 model by running python -m bagel.data. As a test, I also pointed the natural instructions script at pegah-a/small-natural-instructions to check whether I could reproduce the same error as with the original Muennighoff/natural-instructions. The outcome was the same for both datasets. I have updated the datasets and transformers packages.

2024-01-15 10:55:32.135 | INFO | bagel.data_sources.natural_instructions:load_data:12 - Loading Natural Instructions train split...
Resolving data files: 0%| | 1/1514 [00:32<13:50:27, 32.93s/it]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/CreativeEmporium/BAGEL/lib/python3.10/site-packages/requests/adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/datasets/Muennighoff/natural-instructions/revision/a29a9757125f4bb1c26445ad0d2ef7d9b2cc9c4c (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')))"), '(Request ID: 68b4311c-3d4b-49ec-ac66-da567b64129d)')

Any advice or pointing me in the right direction would be highly appreciated.

Below is the error from the redirection test:

2024-01-15 11:38:40.751 | INFO | bagel.data_sources.natural_instructions:load_data:12 - Loading Natural Instructions train split...
Resolving data files: 0%| | 1/1514 [00:33<14:07:25, 33.61s/it]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/datasets/pegah-a/small-natural-instructions/revision/2d4942a5caf0d751554f815059a0919077c78b4e (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')))"), '(Request ID: a8461337-a641-418a-b351-62c353d27ad5)')

jondurbin commented 6 months ago


Unfortunately, this appears to be a network connectivity issue between your machine and huggingface.co. The natural instructions dataset is quite large, so any interruption during the download can cause problems.
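If the interruptions are brief, retrying the load with a growing delay may be enough to get past them. The following is only a minimal sketch under that assumption, not something taken from the bagel code: retry_with_backoff and its parameters are hypothetical names, and the commented usage assumes the datasets package is installed and that the train split is the one wanted. It catches OSError because the SSLError raised by requests derives from it.

```python
import time


def retry_with_backoff(fn, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fn() until it succeeds or `attempts` calls have failed.

    Hypothetical helper: waits base_delay * 2**i seconds after the i-th
    failure (i starting at 0) and re-raises the last error when out of tries.
    """
    for i in range(attempts):
        try:
            return fn()
        except OSError as exc:  # requests' SSLError is a subclass of OSError
            if i == attempts - 1:
                raise  # no attempts left, surface the original error
            delay = base_delay * 2 ** i
            print(f"attempt {i + 1} failed ({exc!r}); retrying in {delay:.0f}s")
            sleep(delay)


# Assumed usage (needs the `datasets` package and network access):
# from datasets import load_dataset
# ds = retry_with_backoff(
#     lambda: load_dataset("Muennighoff/natural-instructions", split="train")
# )
```

A persistent failure such as the one in the traceback above would still surface after the last attempt, so this only helps with transient drops; switching networks remains the more direct check.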

Creative-Emporium commented 5 months ago

@jondurbin Thank you. I tried downloading again on a completely different network and it worked. Thanks!