johnchoiniere / pfx_parser

A new PitchFX parser, written in python
GNU General Public License v3.0

404 Error in pfx_parser_csv.py #3

Open ByronBecker opened 6 years ago

ByronBecker commented 6 years ago

I'm running your script with one small change: I deleted the trailing "/" from the URL on lines 80 and 81, since the URL format has changed. I'm now getting this error at line 98:

Traceback (most recent call last):
  File "pfx_parser_csv.py", line 98, in <module>
    if BeautifulSoup(urlopen(g_url),"lxml").find("a", href="game.xml"):
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Any idea why this might be happening?

johnchoiniere commented 6 years ago

Unfortunately, yes -- AFAIK, MLBAM has, as of this year, taken down the old gameday XML, for both new and past games. Don't think this will ever work again, and I'm not currently in a position to try to search out where, if anywhere, the data are now available to update it.
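Whether or not the data are still hosted somewhere, a 404 on a single URL does not have to abort the whole run. A minimal sketch of one way to guard the fetch, assuming only the standard library: fetch_or_none and its opener parameter are hypothetical names for illustration and are not part of pfx_parser_csv.py.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_or_none(url, opener=urlopen):
    """Return the response body as bytes, or None if the server answers 404.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access (an assumption of
    this sketch, not something the original script does).
    """
    try:
        with opener(url) as response:
            return response.read()
    except HTTPError as err:
        if err.code == 404:
            # Missing page: let the caller decide to skip it.
            return None
        # Any other HTTP error is still unexpected, so re-raise it.
        raise
```

A caller could then check the result for None before handing it to BeautifulSoup and skip that game directory instead of crashing at line 98.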

nleut commented 6 years ago

In addition to removing the trailing "/" on lines 80 and 81, add it back on line 93 by changing that line to g_url = d_url+"/"+g. This has it working for me, at least for 2018.
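Since the fix comes down to where the "/" ends up, here is a small sketch of a helper that joins the two pieces with exactly one slash regardless of how each side is written. join_url is a hypothetical name and the URL values are placeholders, not the real gameday paths; only d_url, g, and g_url come from the script.

```python
def join_url(base, name):
    """Join two URL pieces with exactly one "/" between them."""
    return base.rstrip("/") + "/" + name.lstrip("/")


# Placeholder values standing in for the script's d_url and g.
d_url = "http://example.invalid/year_2018/month_04/day_01"
g = "gid_placeholder/"

# Equivalent to d_url+"/"+g when d_url has no trailing slash,
# but also safe if a trailing slash is ever added back to d_url.
g_url = join_url(d_url, g)
```

With this, line 93 would not need to change again if the trailing slash on lines 80 and 81 comes back.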

ByronBecker commented 6 years ago

@nleut Tested, and it works.