joenano / rpscrape

Scrape horse racing results data and racecards.

orjson.JSONDecodeError: trailing characters #116

Closed patem2 closed 2 years ago

patem2 commented 2 years ago

Hi,

When I try to run the racecards.py utility, the following error occurs:

Traceback (most recent call last):
  File "racecards.py", line 443, in <module>
    main()
  File "racecards.py", line 433, in main
    races = parse_races(session, race_urls, date)
  File "racecards.py", line 330, in parse_races
    runners = get_runners(session, profile_urls)
  File "racecards.py", line 127, in get_runners
    js = loads(json_str)
orjson.JSONDecodeError: trailing characters at line 1 column 5218: line 1 column 5219 (char 5218)

Best wishes Mark

rmwesley99 commented 2 years ago

Same issue as above, but unfortunately well beyond my capabilities to figure out what is going on.

patem2 commented 2 years ago

It looks similar to the last code amendment, which fixed the JSON string. It's the same area of code that's generating the error:

https://github.com/joenano/rpscrape/commit/4ff98034952a0892dcf358be15c5a74dd40c7f3d

Like you though, well beyond my capabilities :-)

joenano commented 2 years ago

Should be fixed.

rmwesley99 commented 2 years ago

All working here.

You are a scholar and a gent (/ lady).

I'll do a diff on the files to try and learn some of the magic.

Many thanks again for all you do here.

joenano commented 2 years ago

You're welcome. The fix was just changing which character to split on.

There is a JSON string inside the HTML that can be parsed out. This is not a good design by the website, but it's quite common and it suits us for scraping.
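To illustrate the general idea, here is a minimal sketch. It is not the actual rpscrape code. The HTML snippet, the `window.PAGE_DATA` marker and both function names are made up for the example, and it uses the standard library `json` module rather than orjson. If the split leaves anything after the closing brace, the parser rejects the string. orjson reports this as "trailing characters", and the standard library reports it as "Extra data". One way to avoid depending on the split character is `JSONDecoder.raw_decode`, which parses a single JSON value and ignores whatever follows it.

```python
import json

# Hypothetical page content; the variable name and layout are assumptions.
HTML = """<html><body>
<script>
window.PAGE_DATA = {"runners": [{"name": "Example Horse", "note": "a;b"}]};
window.OTHER = {"x": 1};
</script>
</body></html>"""

# Hypothetical marker that precedes the embedded JSON.
MARKER = "window.PAGE_DATA = "


def extract_by_split(html: str) -> dict:
    """Fragile approach: cut the text after the marker at the next newline."""
    tail = html.split(MARKER, 1)[1]
    # The remaining string still ends with ';', so the parser sees extra text.
    json_str = tail.split("\n", 1)[0]
    return json.loads(json_str)


def extract_by_raw_decode(html: str) -> dict:
    """Parse one JSON value starting right after the marker, ignoring the rest."""
    start = html.index(MARKER) + len(MARKER)
    obj, _end = json.JSONDecoder().raw_decode(html, start)
    return obj


if __name__ == "__main__":
    try:
        extract_by_split(HTML)
    except json.JSONDecodeError as err:
        print("split approach failed:", err)

    data = extract_by_raw_decode(HTML)
    print(data["runners"][0]["name"])
```

Splitting on a character is simpler and works as long as the page layout stays the same. The `raw_decode` variant only needs the start of the JSON to be located correctly, so a change in what follows the JSON is less likely to break it.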