tysam-code / hlb-gpt

Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).
Apache License 2.0

S3 with wikitext dataset is dead #10

snimu opened this issue 6 months ago (status: Open)

snimu commented 6 months ago

When I try to run main.py, I get the following output:

~/hlb-gpt$ python main.py 
downloading data and tokenizing (1-2 min)
Traceback (most recent call last):
  File "/home/ubuntu/hlb-gpt/main.py", line 102, in <module>
    urllib.request.urlretrieve(raw_data_source, raw_data_cache+'data.zip')
  File "/usr/lib/python3.10/urllib/request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

This has been failing for at least several days now, so I assume the S3 bucket hosting the dataset is dead.
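
A quick way to confirm that the bucket itself is rejecting requests (rather than just one file having moved) is a HEAD request against the URL main.py uses as its raw data source. The URL below is the well-known Salesforce S3 host for wikitext-103 and is an assumption; substitute whatever main.py actually points at:

import urllib.error
import urllib.request

# Assumed URL: the Salesforce-hosted wikitext-103 archive; replace with the
# actual download URL from main.py if it differs.
url = 'https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip'

req = urllib.request.Request(url, method='HEAD')
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        print('reachable, HTTP', resp.status)
except urllib.error.HTTPError as e:
    # A 403 here (rather than a 404) suggests the whole bucket is locked down,
    # not just that the archive was renamed or moved.
    print('HTTP error:', e.code, e.reason)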

tysam-code commented 6 months ago

Yes, it looks like Salesforce took down the hosted WikiText files as well. Let me look into this.
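
Until the download source is fixed upstream, one possible stopgap (a sketch, not the repo's official fix) is to pull wikitext-103 from the Hugging Face hub and write out plain-text files. It requires pip install datasets, and the output filenames below are an assumption that mirrors the contents of the original wikitext-103-raw-v1.zip; adjust them to whatever main.py expects in its data cache directory:

from datasets import load_dataset

# Pull the raw wikitext-103 splits from the Hugging Face hub instead of the
# dead S3 bucket.
ds = load_dataset('wikitext', 'wikitext-103-raw-v1')

# Assumed output filenames: these mirror the files inside the original
# wikitext-103-raw-v1.zip; adjust paths/names to match what main.py reads.
splits = [('train', 'wiki.train.raw'),
          ('validation', 'wiki.valid.raw'),
          ('test', 'wiki.test.raw')]

for split, filename in splits:
    with open(filename, 'w', encoding='utf-8') as f:
        for line in ds[split]['text']:
            # Rows in the hub version typically keep their trailing newline;
            # add one defensively if it is missing.
            f.write(line if line.endswith('\n') else line + '\n')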