serve-and-volley / atp-world-tour-tennis-data

Using Python to scrape ATP World Tour tennis data

Way to directly use their API #35

Open gabjauf opened 1 year ago

gabjauf commented 1 year ago

Hello @serve-and-volley!

Just wanted to tell you that I really like what you did scraping this humongous website. That's a lot of work, and I appreciate it.

I recently found a way to use their API directly by decrypting the encrypted data. I thought you might be interested, as it could make scraping a bit easier and less of a burden to maintain, as long as they do not change the cipher method (it no longer relies on the web interface): https://stackoverflow.com/questions/73735401/scraping-an-atptour-com-api-returns-what-looks-like-encrypted-data/75086660#75086660

If I can be of any help, please reach out 😉

Cheers

BG2011 commented 1 year ago

I would love to see more insight on this, to be honest. I also want to create an app that scrapes the info, but I only need the player stats.

I also looked through the network tab but didn't see any reference to the API you are referring to.

I would love to build a project on this (for me it will be in JavaScript, then).

gabjauf commented 1 year ago

For example: The match https://www.atptour.com/en/scores/stats-centre/archive/2023/404/ms002

You can access the data here (found in the network tab): https://itp-atp-sls.infosys-platforms.com/prod/api/match-beats/status/year/2023/eventId/404/matchId/MS002

Content is encrypted but can be decrypted thanks to the method I outlined in my previous comment.
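
As a rough illustration of how the two URLs above relate, here is a minimal sketch that derives the API endpoint from a stats-centre page URL. The pattern is inferred only from the single pair of URLs quoted in this comment, so treat it as an assumption; `statsCentreToApiUrl` is a hypothetical helper name, not something the ATP site provides.

```javascript
// Sketch: map a stats-centre page URL to the match-beats API URL.
// Assumption: other matches follow the same layout as the one example above.
const API_BASE =
  "https://itp-atp-sls.infosys-platforms.com/prod/api/match-beats/status";

// Hypothetical helper, named here for illustration only.
function statsCentreToApiUrl(pageUrl) {
  const { pathname } = new URL(pageUrl);
  // Expected tail: .../stats-centre/archive/<year>/<eventId>/<matchId>
  const match = pathname.match(
    /\/stats-centre\/archive\/(\d{4})\/([^/]+)\/([^/]+)\/?$/
  );
  if (!match) {
    throw new Error(`Unrecognised stats-centre URL: ${pageUrl}`);
  }
  const [, year, eventId, matchId] = match;
  // The API URL quoted above uses an upper-case match ID (MS002).
  return `${API_BASE}/year/${year}/eventId/${eventId}/matchId/${matchId.toUpperCase()}`;
}

console.log(
  statsCentreToApiUrl(
    "https://www.atptour.com/en/scores/stats-centre/archive/2023/404/ms002"
  )
);
```

The response fetched from that URL is still the encrypted `{lastModified, response}` object, so it has to go through the decoding step before it is usable.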

The only issue I see is that this kind of data is not available for older matches.

BG2011 commented 1 year ago

Hi,

Thank you for the answers. The problem is with pages such as this one: https://www.atptour.com/en/players/carlos-alcaraz/a0e2/player-stats

I cannot seem to find the API call in the network tab. If you have some spare time, could you help me find the reference to the API?

gabjauf commented 1 year ago

Hey,

I could not find it either. I'm not sure it exists, though...

serve-and-volley commented 1 year ago

Hi @gabjauf, many thanks for the info! Could you provide a Python version of the code from your Stack Overflow answer?

// const CryptoJS = require("crypto-js"); // If Nodejs

const data = {"lastModified":1663265556422,"response":"hlXzkPyyhwUYql2Nwl/3AAcRSsZHKf5LyqsAHqSWjP+ZHzfdmQ7bG2cOrf3YxwcZFIlsJNLJOSL/dSj/fFtjWHkeQd21inSUPOkbu2hSD2xMxEkyss8rOIVJAx6NmY9sap852VtmTc2CT4TdXXRduEK4fXASReIX3Eb9V+TMs24t5ow6w8aau+GWZLP9b32ALs4IZeea+dE3YcKtYrZOu/bV7ZLSawlontkgGN9s4QSjUhv43ifxkS6oDHGFkh+4pjjqfLDa2c0fA28otRZUF4uz+UvYAW2b9hZxBVJQU0E45Bf/myuQjZ14KtQr0NdxAMq53PZlki2hRVtnCDErA2e26cK9/bkC6Pz/J0N7rosTYw6TtDRGPYeqM3z645Uew3f3vEcSQLkWWxi1txQPxTbn1MT4HzRtnAbGJOF+GeaAKbwtSt2B86iHjkyEJ+ssmIMsARRjUmhdFmsMF6vuqA5pSgxvYTacg/yzZvy6HVhZBqTpPcaRJGt41efib3zQg8u++yKXdz8MnHicuz32w/osWzcMsC3Cwm5/a1tJZ48xFJdu8YgUsFS6ioNaO9V6vWz8imQZiPEZxd1FLfRynjS8LpvY3+83M2h+A0oExmcd4UaEMCqkklM1A7ssOXeDTqKS8UiZVM3zH6lzNI42QOZE+WYcPvwNzVLanJpZcKqlLupGfOiHuUclEwKrBL8h3wHtU6UmU+VoPJQM82b4pv5vJY/qlUgjLnaWk18A5UV9MF2b81iI3T8i4U8KGeovMhVLdq7YRZFdBG9djQgPRzwfofB/LRz5+aTwKwiTTsmvy4DMP/2iCB7Eiqr7OaKtuaj1n6vt2MdIstqTz/nDEkjLcdrspajdqHnTfUYLEVJvns6KPIKQaQ61I71G7vkEG4MtZ3PRgGy7/zR/B2qAzhaJmHYMZtOfE2OPcPXi3wi9tTYObYaGzpQIqkFGUtpa862bq8qMSXVUpfb8dvDTOyuvURD9FmSHeDHiO6DYhqxqQrfw1aRHK0vu6QcSsGF31vYnrRGR48nZgouqyzUv90Nc9hvyXBcEaYZpCG2qbAArBseD+RRtXeWV1yvV+C7oy68JOxgLJaL1AsLPX81WV9maPy2Ns3IJ64iNvKMebWFtETNtDPIs5amm+wFjERiQ85DK70wucEd3lWWQr7UddSO8U72whJXGbtsC2onskI75uLF3n7XX4goaHrj0IVB3kVqc4O1zMXWvCzype2EerR2E9K/qoBWh5PQRc4bPhrNdoYGSAh18AKtzVOqPgNgzXnW591r4pWMrWW8Tww89sayPZUnxOwDIaf6kFP74+34K+ZWKGVJA9YBPpKfGAfMgOYalnB7YMA4Tn4Hmt4OQtPeArwgR4DBW+HiQ+aFNK04="};

function decode(data) {
  var e = formatDate(new Date(data.lastModified))
    , n = CryptoJS.enc.Utf8.parse(e)
    , r = CryptoJS.enc.Utf8.parse(e.toUpperCase())
    , i = CryptoJS.AES.decrypt(data.response, n, {
      iv: r,
      mode: CryptoJS.mode.CBC,
      padding: CryptoJS.pad.Pkcs7
    });
  return JSON.parse(i.toString(CryptoJS.enc.Utf8))
};

function formatDate(t) {
  var e = (new Date).getTimezoneOffset(),                  // local timezone offset in minutes
      n = new Date(t.getTime() + 60 * e * 1e3).getDate(),  // day of month after shifting by that offset
      r = parseInt((n < 10 ? "0" + n : n).toString().split("").reverse().join("")), // day digits reversed
      i = t.getFullYear(),
      a = parseInt(i.toString().split("").reverse().join("")), // year digits reversed
      o = parseInt(t.getTime().toString(), 16).toString(36) + ((i + a) * (n + r)).toString(24),
      s = o.length;
  // Pad with "0" or truncate so the body is exactly 14 characters
  if (s < 14)
    for (var c = 0; c < 14 - s; c++)
      o += "0";
  else
    s > 14 && (o = o.substr(0, 14));
  return "#" + o + "$";
}

console.log(decode(data));

Any help would be greatly appreciated!
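
For anyone porting the snippet above to another runtime, it may help to see the same steps without the crypto-js dependency. The following is a sketch using Node's built-in `crypto` module, not the original author's code: it assumes the key string is always 16 ASCII characters (`#` + 14 characters + `$`), which corresponds to AES-128 in CBC mode with PKCS#7 padding, with the upper-cased key used as the IV. The names `deriveKey` and `decodeWithNodeCrypto` are made up for this example.

```javascript
const crypto = require("node:crypto");

// Same key derivation as formatDate above, spelled out step by step.
function deriveKey(lastModified) {
  const t = new Date(lastModified);
  const offsetMinutes = new Date().getTimezoneOffset();
  // Day of month after shifting the timestamp by the local offset
  const day = new Date(t.getTime() + offsetMinutes * 60 * 1000).getDate();
  const reverse = (s) => s.split("").reverse().join("");
  const dayReversed = parseInt(reverse(String(day).padStart(2, "0")), 10);
  const year = t.getFullYear();
  const yearReversed = parseInt(reverse(String(year)), 10);
  // The timestamp digits are read as hex, then written in base 36;
  // a base-24 suffix built from the date parts is appended.
  const body =
    parseInt(String(t.getTime()), 16).toString(36) +
    ((year + yearReversed) * (day + dayReversed)).toString(24);
  // Pad or truncate to 14 characters, then wrap in "#" and "$".
  return "#" + body.padEnd(14, "0").slice(0, 14) + "$";
}

function decodeWithNodeCrypto(data) {
  const key = deriveKey(data.lastModified);
  const decipher = crypto.createDecipheriv(
    "aes-128-cbc",
    Buffer.from(key, "utf8"),               // 16-byte key
    Buffer.from(key.toUpperCase(), "utf8")  // IV is the upper-cased key
  );
  const plaintext = Buffer.concat([
    decipher.update(data.response, "base64"),
    decipher.final(),
  ]);
  return JSON.parse(plaintext.toString("utf8"));
}

// Usage, with `data` being the object shown above:
// console.log(decodeWithNodeCrypto(data));
```

Written this way, each step maps onto a standard primitive (base conversion, string padding, AES-128-CBC), which should make a port to Python or another language more mechanical.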

serve-and-volley commented 1 year ago

@gabjauf Using NodeJS and your JS code, for the following link:

I get the following result:

{
  courtId: 1,
  courtName: null,
  allStats: true,
  isDoubles: false,
  eventId: 404,
  eventType: "Men's Singles",
  isMatchComplete: true,
  matchId: 'MS002',
  maxRally: 23,
  matchStatus: 'C',
  matchWinner: 1,
  playerData: {
    tm1Ply1CountryName: 'spain',
    tm1Ply1Country: 'ESP',
    tm1Ply1Id: 'A0E2',
    tm1Ply1Name: 'C. ALCARAZ',
    tm1Ply1FirstName: 'Carlos',
    tm1Ply1LastName: 'ALCARAZ',
    tm1Ply2CountryName: null,
    tm1Ply2Country: null,
    tm1Ply2Id: null,
    tm1Ply2Name: null,
    tm1Ply2FirstName: null,
    tm1Ply2LastName: null,
    tm1Seed: '1',
    tm1WinProb: 0,
    tm2Ply1CountryName: 'italy',
    tm2Ply1Country: 'ITA',
    tm2Ply1Id: 'S0AG',
    tm2Ply1Name: 'J. SINNER',
    tm2Ply1FirstName: 'Jannik',
    tm2Ply1LastName: 'SINNER',
    tm2Ply2CountryName: null,
    tm2Ply2Country: null,
    tm2Ply2Id: null,
    tm2Ply2Name: null,
    tm2Ply2FirstName: null,
    tm2Ply2LastName: null,
    tm2Seed: '11',
    tm2WinProb: 0
  },
  rallyStats: true,
  setsComplete: 2,
  setData: [
    {
      gamesComplete: 13,
      gameData: [Array],
      set: 1,
      isSetComplete: true,
      setDuration: 61,
      setMaxRally: 15,
      setName: 'SET 1',
      setWinner: 1
    },
    {
      gamesComplete: 9,
      gameData: [Array],
      set: 2,
      isSetComplete: true,
      setDuration: 50,
      setMaxRally: 23,
      setName: 'SET 2',
      setWinner: 1
    }
  ],
  year: 2023
}

However, the match stats I'm interested in seem to be inside this array:

gameData: [Array]

Wondering if you would know how to access it?

CarterT27 commented 1 year ago

Wondering if you would know how to access it?

The data is all there, try printing the full object using JSON.stringify, or if you just want to access the gameData, you can try accessing the gameData directly from the data object.

console.log(JSON.stringify(decode(data), null, 4));

OR

console.log(decode(data).setData[0].gameData)
console.log(decode(data).setData[1].gameData)

https://stackoverflow.com/questions/10729276/how-can-i-get-the-full-object-in-node-jss-console-log-rather-than-object
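
Another option, along the lines of the linked Stack Overflow question, is to raise the depth limit Node uses when printing objects. Here is a small sketch with a made-up object that only mirrors the nesting of the output above; the fields inside `gameData` are hypothetical placeholders, not the real API fields.

```javascript
const util = require("node:util");

// Made-up structure with the same nesting as the decoded match object.
const sample = {
  setData: [{ set: 1, gameData: [{ game: 1, winner: 1 }] }],
};

// The default depth is 2, so the nested gameData array is abbreviated.
console.log(util.inspect(sample));

// depth: null removes the limit and prints every level.
console.log(util.inspect(sample, { depth: null }));

// console.dir accepts the same option.
console.dir(sample, { depth: null });
```

With the real data, `console.dir(decode(data), { depth: null })` should print the full object, including every `gameData` entry.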

glad94 commented 1 year ago

Hi @serve-and-volley , I managed to "pythonise" @gabjauf's solution a while back and have been using it to collect my own data. Just got around to organising the code into a repository here. Feel free to try it out or incorporate it. :smile: