mcekovic / tennis-crystal-ball

Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Apache License 2.0

data-loader issue #324

Closed PaulSteffen-betclic closed 5 years ago

PaulSteffen-betclic commented 5 years ago

Hi,

I followed the installation instructions in issue #232 but ran into a few issues. The first 9 steps described by @mcekovic work fine, but the load at the 10th step fails:

Loading Tennis Data
Allocating DB connections...............
Loading players
Loading file 'C:\tennis_atp\atp_players.csv'
Exception in thread "main" java.lang.NumberFormatException: For input string: "player_id"

Loading file 'C:\tennis_atp\atp_matches_1969.csv'
Exception in thread "main" groovy.lang.MissingMethodException: No signature of method: static org.strangeforest.tcb.dataload.MatchLoader.mapLevel() is applicable for argument types: (String, null, String, Short, String) values: [A, null, Hobart, 1968, 713]
Possible solutions: mapLevel(java.lang.String, short, java.lang.String, int, java.lang.String)

Loading file 'C:\tennis_atp\atp_matches_1971.csv'
Invalid set: 6-Feb
java.lang.NumberFormatException: For input string: "Feb"

The error above did not stop the loading.

Fetching tournament URL 'http://www.atptour.com/en/scores/archive/birmingham/350/1970//results'
Unknown tournament level: null
Exception in thread "main" java.text.ParseException: Unparseable date: "-"

mcekovic commented 5 years ago

It seems there was recently a massive update to Jeff's repository, fixing bugs and updating the data. However, it seems another layer of issues has been introduced :(. I will only be able to check the data and eventually adapt the UTS loaders to the new changes in August.

mcekovic commented 5 years ago

In the meantime, you could pull Jeff's repository as of Sep 26, 2018 (last commit b1a180f124c65708a65e09ba593b86ba8f13bf79), i.e. revert the last 4 commits from May 2019, and then start the data load.
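As a rough illustration of that suggestion, here is a minimal Python sketch that checks out the commit named above in a local clone. The helper names `checkout_command` and `pin_repository` are hypothetical, and the clone path `C:/tennis_atp` is only assumed from the paths in the log output earlier in this thread; adjust it to wherever your clone lives.

```python
import subprocess

# Commit named in the comment above (state of the data repository on Sep 26, 2018).
PINNED_COMMIT = "b1a180f124c65708a65e09ba593b86ba8f13bf79"


def checkout_command(repo_dir, commit=PINNED_COMMIT):
    """Build the git command that checks out `commit` inside `repo_dir`."""
    # `git -C <dir>` runs git as if it had been started in <dir>.
    return ["git", "-C", repo_dir, "checkout", commit]


def pin_repository(repo_dir, commit=PINNED_COMMIT):
    """Run the checkout; raises CalledProcessError if git reports a failure."""
    subprocess.run(checkout_command(repo_dir, commit), check=True)


# Hypothetical usage (requires git on PATH and an existing clone):
# pin_repository("C:/tennis_atp")
```

Checking out a bare commit hash leaves the clone in a detached HEAD state, which is fine for a one-off data load. Any uncommitted local edits to the CSV files may block the checkout, so it is probably simplest to do this before making manual corrections.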

PaulSteffen-betclic commented 5 years ago

Thx. I'm looking forward to your fixes :) I tried using the b1a180f124c65708a65e09ba593b86ba8f13bf79 version of Jeff's repository, but the load fails on the atp_rankings_00s.csv file with the following exception:

Exception in thread "main" java.lang.NumberFormatException: For input string: "bioTableWrap bioTableWrapAlt">"

mcekovic commented 5 years ago

Unfortunately, you need to edit the local files and correct the errors:

atp_rankings_00s.csv:
-20070212,1535,104756,"bioTableWrap bioTableWrapAlt"">"
+20070212,1535,104756,1

atp_rankings_10s.csv:
-20160613,1709,,>
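To find rows like these before running the loader, a small check along the following lines might help. It is only a sketch, not part of UTS: the function name `find_bad_ranking_rows` is hypothetical, and it assumes each ranking row has exactly four integer fields and that the file has no header line, which is inferred from the rows shown above rather than confirmed.

```python
import csv
import io


def _is_int(field):
    """True if the field parses as an integer (empty strings do not)."""
    try:
        int(field)
        return True
    except ValueError:
        return False


def find_bad_ranking_rows(text, expected_fields=4):
    """Return (line_number, fields) for rows that are not all-integer.

    Assumption: every row has `expected_fields` integer columns and there
    is no header line; both are guesses based on the rows in this thread.
    """
    bad = []
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != expected_fields or not all(_is_int(f) for f in row):
            bad.append((reader.line_num, row))
    return bad


# The first row is a made-up well-formed example; the other two are the
# broken rows quoted in the comment above.
SAMPLE = (
    "20070205,1,103819,7290\n"
    '20070212,1535,104756,"bioTableWrap bioTableWrapAlt"">"\n'
    "20160613,1709,,>\n"
)

for line_number, fields in find_bad_ranking_rows(SAMPLE):
    print(line_number, fields)
```

For a real file, the text could be read with something like `open(path, newline="").read()`. The check only reports suspicious rows; deciding whether to correct a value or delete the row is still a manual step, as in the diff above.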

PaulSteffen-betclic commented 5 years ago

Thx! But I still get the same final exception:

Fetching tournament URL 'http://www.atptour.com/en/scores/archive/birmingham/350/1970//results'
Unknown tournament level: null
Exception in thread "main" java.text.ParseException: Unparseable date: "-"

mcekovic commented 5 years ago

Now the ATP website has issues in its generated HTML. Unfortunately, I will only be able to adapt UTS for it in August.

mcekovic commented 5 years ago

The data load should be fixed now, meaning it should complete, but the data quality is not guaranteed (see https://github.com/JeffSackmann/tennis_atp/issues/108).