A set of Node scripts that scrape non-NHL hockey leagues for player stats and currently played games.
When adding a scraper for a new league there are a few steps to follow:
1. Create the scraper file in the matching leagues folder (if it's for a season scrape it goes in season-scraper/leagues and if it's for a game scraper it goes into game-scraper/leagues)
2. Add the scraper to the index.js inside the same /leagues folder so the main scraper file can find it
3. Add the league to the LEAGUE_CODES constant in the index.js file of the scraper (if it's for a season scrape it goes in season-scraper/index.js and if it's for a game scraper it goes into game-scraper/index.js). Use the name of the league that shows up in prospect_info.js under a prospect's league key (e.g. KHL, AHL, SHL) as the key, and the value should be what you named the scraper in step 2 (e.g. khlScraper, ahlScraper, shlScraper)
4. Add tests for the scraper, mocking with jest.spyOn where needed (see other tests on how to accomplish this)
For a prospect to have their data scraped it must exist inside prospect_info.js. This file holds an array of prospect objects that contain all the information necessary to find their stats and upload them to the database in a format that can be shown to the end user. To add a prospect, add a new object for the desired prospect and make sure it has all these fields:
{
first_name: 'First name of the player [string]',
last_name: 'Last name of the player [string]',
position: 'Position of the player (LW, RW, C, D) [string]',
shoots: 'The handedness of the player (L, R) [string]',
dob: 'Date of birth of the player (YYYY-MM-DD format) [string]',
draft_round: 'The round that the player was picked in (set as null if undrafted) [integer]',
draft_pick: 'The exact pick number the player was taken at in their draft (set as null if undrafted) [integer]',
draft_year: 'Year the player was drafted (set as null if undrafted) [integer]',
ep_url: 'The link to the Eliteprospects page of the prospect, which is shown to the user to give them easy access to other seasons beyond the current one being scraped [string]',
}
An example of a filled out prospect looks like this:
{
first_name: 'Timothy',
last_name: 'Liljegren',
position: 'D',
shoots: 'R',
dob: '1999-04-20',
draft_round: 1,
draft_pick: 17,
draft_year: 2017,
ep_url: 'http://www.eliteprospects.com/player.php?player=224910'
}
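A quick way to catch a forgotten field before running a scrape is a small check like the one below. This is only a sketch: findMissingProspectFields and REQUIRED_PROSPECT_FIELDS are hypothetical names that do not exist in the repository, and the field list is simply copied from the template above.

```javascript
// Hypothetical helper (not part of the repository): lists the base fields
// from the prospect template that are absent from a prospect object.
const REQUIRED_PROSPECT_FIELDS = [
  'first_name',
  'last_name',
  'position',
  'shoots',
  'dob',
  'draft_round',
  'draft_pick',
  'draft_year',
  'ep_url',
];

function findMissingProspectFields(prospect) {
  // Only an absent key counts as missing: null is a valid value for the
  // draft_* fields of an undrafted player.
  return REQUIRED_PROSPECT_FIELDS.filter((field) => !(field in prospect));
}

const liljegren = {
  first_name: 'Timothy',
  last_name: 'Liljegren',
  position: 'D',
  shoots: 'R',
  dob: '1999-04-20',
  draft_round: 1,
  draft_pick: 17,
  draft_year: 2017,
  ep_url: 'http://www.eliteprospects.com/player.php?player=224910',
};

console.log(findMissingProspectFields(liljegren)); // []
```

An empty array means every base field is present; any names in the array still need to be added to the object.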
After adding a prospect object to the prospect_info.js file (see Adding a Prospect section) you'll need to get the necessary information in order for the scraper to gather the player's season/game data. Depending on the league you'll need to fill out specific fields. Potential fields include:
{
league_id: 'Most leagues require the unique league_id of a player in order to scrape the prospect [string]',
team_id: 'Some leagues also need the unique id of the team the player plays for in order to find their stats [string]',
season_id: 'Some leagues also need the unique id of the season the prospect is playing in [string]',
league: 'The shorthand version of the current league that the player is playing in so that the scraper knows where to find their stats [string]'
}
Every league is a bit different in how to obtain URLs so there will be a section on how to find the appropriate URL for each league. The needed fields at the beginning of each section will tell you which fields need to be filled out for each league in order for its scraper to function.
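Collected in one place, the needed fields from the league sections that follow could be expressed as a lookup table. The sketch below is an illustration only: NEEDED_FIELDS_BY_LEAGUE and findMissingLeagueFields are hypothetical names, not something the scrapers are known to use, and the table simply mirrors the "Needed fields" lists in this document.

```javascript
// Hypothetical lookup table (not part of the repository), mirroring the
// "Needed fields" list of each league section in this document.
const NEEDED_FIELDS_BY_LEAGUE = {
  AHL: ['league_id'],
  Allsv: ['first_name', 'last_name', 'team_id'],
  BCHL: ['league_id'],
  CZE: ['league_id'],
  CZE2: ['league_id'],
  ECHL: ['season_id', 'team_id', 'league_id'],
  KHL: ['league_id'],
  Liiga: ['league_id'],
  Mestis: ['league_id'],
  MHL: ['league_id'],
  NCAA: ['league_id'],
  NL: ['first_name', 'last_name', 'season_id', 'team_id'],
  OHL: ['league_id'],
  QMJHL: ['league_id'],
  Sarja20: ['league_id'],
  SHL: ['team_id', 'league_id'],
  USHL: ['league_id'],
  VHL: ['league_id'],
  WHL: ['league_id'],
  WJC: ['league_id', 'team_id'],
};

function findMissingLeagueFields(prospect) {
  const needed = NEEDED_FIELDS_BY_LEAGUE[prospect.league];
  if (!needed) {
    throw new Error(`Unknown league: ${prospect.league}`);
  }
  // Treat undefined, null and empty strings as "not filled out".
  return needed.filter((field) => prospect[field] == null || prospect[field] === '');
}

console.log(findMissingLeagueFields({ league: 'SHL', team_id: '1a71-1a71gTHKh__lulea-hockey' }));
// [ 'league_id' ]
```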
Needed fields:
league_id: '7314',
league: 'AHL',
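To get the league_id for the prospect:
1. Search for the prospect by name on the league's site
2. Use the Players button to show the players with the name you just searched
3. Open the prospect's profile and copy the id from the URL into the league_id field for the prospect. For example, for the following URL https://theahl.com/stats/player/7314/rasmus-sandin the id is 7314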
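If you would rather not pick the id out by eye, it can be pulled from the profile URL with a few lines of code. This is a sketch under the assumption that the profile path always looks like /stats/player/<id>/<name>, as in the example above; getStatsPlayerId is a hypothetical name.

```javascript
// Hypothetical helper (not part of the repository): extracts the numeric id
// from a profile URL shaped like https://<site>/stats/player/<id>/<name>.
function getStatsPlayerId(profileUrl) {
  const { pathname } = new URL(profileUrl);
  const match = pathname.match(/\/stats\/player\/(\d+)(?:\/|$)/);
  // Return the id as a string, since league_id is stored as a string.
  return match ? match[1] : null;
}

console.log(getStatsPlayerId('https://theahl.com/stats/player/7314/rasmus-sandin')); // 7314
```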
There is no game scraper for this league
Needed fields:
first_name: 'Jesper',
last_name: 'Lindgren',
team_id: '110b-110bJcIAI',
league: 'Allsv',
For the Allsvenskan you'll need the prospect's first_name, last_name, and team_id fields filled out:
1. Get the team_id by taking the characters after the &team= portion of the URL. For example, for the URL https://www.hockeyallsvenskan.se/statistik/spelare?season=2020&gameType=regular&position=All&team=110b-110bJcIAI the team_id would be 110b-110bJcIAI.
2. Make sure the first_name and last_name fields have the same spelling and characters as the page indicates or else the scraper won't find the player.
Needed fields:
league_id: '6748',
league: 'BCHL',
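To get the league_id for the prospect:
1. Search for the prospect by name on the league's site
2. Use the Players button to show the players with the name you just searched
3. Open the prospect's profile and copy the id from the URL into the league_id field for the prospect. For example, for the following URL https://bchl.ca/stats/player/6748/ryan-tverberg the id is 6748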
Needed fields:
league_id: '23461',
league: 'CZE',
To get the league_id for the prospect, open their profile page and copy the id from the URL into the league_id field for the prospect. For example, for the following URL https://www.hokej.cz/hrac/23461 the id is 23461
Needed fields:
league_id: '23461',
league: 'CZE2',
To get the league_id for the prospect, open their profile page and copy the id from the URL into the league_id field for the prospect. For example, for the following URL https://www.hokej.cz/hrac/23461 the id is 23461
The games scraper for this league can only do day of games
Needed fields:
season_id: '5f4e319b38c0fcf74b12136f',
team_id: '5c5c2fc55ce4ceb584def768',
league_id: '11939bb3d311e552551149a7',
league: 'ECHL',
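To get the league_id for the prospect:
1. Open the Roster link for the prospect's team
2. Find the request made for the prospect (it has the form s3?q=player-<id>)
3. Copy the id from that request into the league_id field. For example the request https://www.echl.com/api/s3?q=player-69906a5633645b14f186782b.json would yield a league_id of 69906a5633645b14f186782b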
To get the season_id and team_id:
1. Find the request that starts with s3?q=schedule.
2. That request contains two ids: the first is the season_id and the second is the team_id. For example, using the URL https://www.echl.com/api/s3?q=schedule-5f4e319b38c0fcf74b12136f-31ffb756ae0a30e567dcf226.json the season_id is 5f4e319b38c0fcf74b12136f and the team_id is 31ffb756ae0a30e567dcf226
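The two ECHL request shapes described above can also be split apart programmatically. The following is a minimal sketch, assuming the ids are always hexadecimal strings and that the q parameter always ends in .json as in the examples; both function names are hypothetical.

```javascript
// Hypothetical helpers (not part of the repository) for the two request
// shapes shown above: s3?q=player-<id>.json and s3?q=schedule-<season>-<team>.json

function parseEchlPlayerRequest(requestUrl) {
  const query = new URL(requestUrl).searchParams.get('q') || '';
  const match = /^player-([0-9a-f]+)\.json$/.exec(query);
  return match ? { league_id: match[1] } : null;
}

function parseEchlScheduleRequest(requestUrl) {
  const query = new URL(requestUrl).searchParams.get('q') || '';
  // The first id is the season, the second is the team.
  const match = /^schedule-([0-9a-f]+)-([0-9a-f]+)\.json$/.exec(query);
  return match ? { season_id: match[1], team_id: match[2] } : null;
}

console.log(
  parseEchlScheduleRequest(
    'https://www.echl.com/api/s3?q=schedule-5f4e319b38c0fcf74b12136f-31ffb756ae0a30e567dcf226.json'
  )
);
// { season_id: '5f4e319b38c0fcf74b12136f', team_id: '31ffb756ae0a30e567dcf226' }
```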
Needed fields:
league_id: '30159',
league: 'KHL',
Open the prospect's profile page; the number in its URL is the league_id. For example, with this URL https://en.khl.ru/players/30159/ the prospect id is 30159
Needed fields:
league_id: '31555838',
league: 'Liiga',
Open the prospect's profile page; the number in its URL is the league_id. For example, in the URL https://liiga.fi/fi/pelaajat/31555838/niemela-topi/ the prospect's id would be 31555838
Needed fields:
league_id: '29969148',
league: 'Mestis',
Open the prospect's profile page; the number in its URL is the league_id. For example, in the URL https://mestis.fi/en/pelaajat/29969148/aalto-santeri the prospect's id would be 29969148
Needed fields:
league_id: '31214',
league: 'MHL',
This league only needs the prospect's league_id field to function. To get the id, open the prospect's profile page and copy the number from the URL into the league_id field. For example, Dmitry Ovchinnikov's profile URL is https://engmhl.khl.ru/players/31214/, therefore his league_id is 31214
Needed fields:
league_id: '57164',
league: 'NCAA',
This league only needs the prospect's league_id field to function. To get the id:
1. Open the page of the prospect's team and use the Roster button to load the team's roster
2. Open the prospect's profile, copy the number after /career/ in the URL and paste it into the player's league_id field. For example, Ryan Tverberg's profile URL is http://collegehockeyinc.com/players/career/57164/, therefore his league_id is 57164
There is no games scraper for this league
Needed fields:
first_name: 'Denis',
last_name: 'Malgin',
season_id: '3092',
team_id: '101151',
league: 'NL',
Besides the prospect's name, this league needs the season_id and team_id, which you can get from the second and third numbers after the &filterQuery= portion of the URL. For example: https://www.sihf.ch/de/game-center/national-league#/players/points/desc/page/0/2021/3478/101151 would have a season_id of 3478 and a team_id of 101151.
Needed fields:
league_id: '7662',
league: 'OHL',
Open the prospect's profile page and copy the number at the end of the URL into the league_id field. For example, with the URL https://ontariohockeyleague.com/players/7662 the player's id would be 7662
Needed fields:
league_id: '17871',
league: 'QMJHL',
Open the prospect's profile page and copy the number at the end of the URL into the league_id field. For example, with the URL https://theqmjhl.ca/players/17871 the player's id would be 17871
Needed fields:
league_id: '255011063073080359893401',
league: 'Sarja20',
This scraper uses the prospect's league_id field to determine the URL so all you need is to find the player's id and it will work. To do that, open the prospect's profile page and take the value after lkq= in the URL. For example, a prospect's profile URL would be http://www.leijonat.fi/index.php/pelaajat?lkq=255011063073080359893401 and therefore his league_id would be: 255011063073080359893401
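Because this id has 24 digits, it is worth keeping it as a string: a JavaScript number cannot represent it exactly. The sketch below reads the lkq query parameter from the profile URL and shows why converting it to a number is unsafe; getSarja20LeagueId is a hypothetical name, not a function from the repository.

```javascript
// Hypothetical helper (not part of the repository): reads the lkq query
// parameter, which the example above uses as the league_id.
function getSarja20LeagueId(profileUrl) {
  return new URL(profileUrl).searchParams.get('lkq');
}

const sarjaId = getSarja20LeagueId(
  'http://www.leijonat.fi/index.php/pelaajat?lkq=255011063073080359893401'
);

console.log(sarjaId); // 255011063073080359893401
// The value is far above Number.MAX_SAFE_INTEGER, so it must stay a string.
console.log(Number.isSafeInteger(Number(sarjaId))); // false
```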
Needed fields:
team_id: '1a71-1a71gTHKh__lulea-hockey',
league_id: 'qRm-1ykhbTRK4__filip-hallander',
league: 'SHL',
In order to get the team_id and league_id you'll need to:
1. Set the Lag (team) filter to the team that the prospect plays for
2. Open the prospect's profile and use the Statistik heading beside the profile picture to go to the statistics table
3. Take the team_id from the text between the first / characters after lag (has the prospect's team name in it) as well as the league_id from the text between the next set of / characters (has the prospect's name in it). For example, the URL https://www.shl.se/lag/1a71-1a71gTHKh__lulea-hockey/qRm-1ykhbTRK4__filip-hallander/statistics has a team_id of 1a71-1a71gTHKh__lulea-hockey and a league_id of qRm-1ykhbTRK4__filip-hallander
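The same split can be done in code by walking the path segments of the statistics URL. This is a sketch that assumes the path always has the shape /lag/<team>/<player>/statistics shown in the example; parseShlStatisticsUrl is a hypothetical name.

```javascript
// Hypothetical helper (not part of the repository): splits an SHL statistics
// URL into the team_id and league_id segments that follow "lag".
function parseShlStatisticsUrl(statisticsUrl) {
  // e.g. ['lag', '<team>', '<player>', 'statistics']
  const segments = new URL(statisticsUrl).pathname.split('/').filter(Boolean);
  const lagIndex = segments.indexOf('lag');
  if (lagIndex === -1 || segments.length < lagIndex + 3) {
    return null;
  }
  return {
    team_id: segments[lagIndex + 1],
    league_id: segments[lagIndex + 2],
  };
}

console.log(
  parseShlStatisticsUrl(
    'https://www.shl.se/lag/1a71-1a71gTHKh__lulea-hockey/qRm-1ykhbTRK4__filip-hallander/statistics'
  )
);
// { team_id: '1a71-1a71gTHKh__lulea-hockey', league_id: 'qRm-1ykhbTRK4__filip-hallander' }
```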
Needed fields:
league_id: '7842',
league: 'USHL',
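In order to get the league_id for the prospect you'll need to:
1. Find the prospect on the league's site (you can narrow the list down with the All Teams dropdown then hit submit)
2. Open the prospect's profile and take the player id from the URL. For example, https://www.ushl.com/view#/player/8956/73/cole-burtch has a player id of 8956. The 73 is the season id.
Needed fields: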
league_id: '25697',
league: 'VHL',
In order to get a VHL prospect to scrape properly you'll need the league_id field filled out. Open the prospect's profile page and take the league_id from the URL. The league_id is the number between the / characters after the players portion. For example, the URL http://www.vhlru.ru/en/players/25697/ has a league_id of 25697
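Several leagues in this document (for example KHL, OHL and VHL) put a numeric id directly after a players path segment, so one small function can cover them. This is a sketch, assuming that URL shape; getIdAfterPlayers is a hypothetical name, and it intentionally returns null for URLs with a different layout.

```javascript
// Hypothetical helper (not part of the repository): returns the digits that
// directly follow "/players/" in the path of a profile URL.
function getIdAfterPlayers(profileUrl) {
  const { pathname } = new URL(profileUrl);
  const match = pathname.match(/\/players\/(\d+)(?:\/|$)/);
  return match ? match[1] : null;
}

console.log(getIdAfterPlayers('http://www.vhlru.ru/en/players/25697/')); // 25697
console.log(getIdAfterPlayers('https://en.khl.ru/players/30159/')); // 30159
console.log(getIdAfterPlayers('https://ontariohockeyleague.com/players/7662')); // 7662
```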
Needed fields:
league_id: '27355',
league: 'WHL',
Open the prospect's profile page and copy the number at the end of the URL into the league_id field. For example, with the URL https://whl.ca/players/27355 the player's id would be 27355
Needed fields:
league_id: '3126259',
team_id: 'RUS', // Options: RUS, SVK, SUI, GER, FIN, USA, SWE, CZE, AUT, CAN (if team name is not listed here check IIHF Games page for which abbreviation is used)
league: 'WJC',
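To get the league_id for the prospect:
1. Go to the Teams page using the menu and open the prospect's team
2. Find the prospect in the roster table and inspect the <tr> for their row in the table. The id is listed under the data-fwk-id attribute. For instance, Mikhail Abramov's id is 3126259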
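If you copy the row's HTML out of the browser inspector, the id can be read from it with a regular expression. This is only a sketch: the row markup in the example is made up for illustration (the real IIHF markup will have more attributes and cells), and getFwkId is a hypothetical name.

```javascript
// Hypothetical helper (not part of the repository): reads the data-fwk-id
// attribute from the opening <tr> tag of a copied table row.
function getFwkId(rowHtml) {
  const match = /<tr\b[^>]*\bdata-fwk-id="([^"]+)"/.exec(rowHtml);
  return match ? match[1] : null;
}

// Made-up row markup, only the data-fwk-id value comes from this document.
const exampleRow = '<tr class="player" data-fwk-id="3126259"><td>Mikhail Abramov</td></tr>';

console.log(getFwkId(exampleRow)); // 3126259
```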
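To remove a prospect from the system you'll need to:
1. Remove the prospect's object from prospect_info.js so it is not scraped again.
2. Run npm run seasons:reset to wipe the database and re-scrape the prospect's season statlines.
To create a fixture for a scraper test, for the most part you can just copy a browser console query of the body object:
1. Run the following in the browser console on the page being scraped:
document.querySelector('body')
2. Use Copy Object on the result (in Firefox, Chrome may be different)
3. Create a file for it in the __fixtures__ folder (within the __tests__ folder you created for the module tests)
4. Paste the copied markup into the file as an exported string:
module.exports = `BODY_CODE_GOES_INSIDE_STRING`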
To run the Twitter bot you need to:
1. Make sure your .env file has all of these filled out:
TWITTER_CONSUMER_KEY='<CONSUMER_KEY_HERE>'
TWITTER_CONSUMER_SECRET='<CONSUMER_SECRET_HERE>'
TWITTER_ACCESS_TOKEN_KEY='<ACCESS_TOKEN_HERE>'
TWITTER_ACCESS_TOKEN_SECRET='<ACCESS_TOKEN_SECRET_HERE>'
GAMES_FE_URL='localhost:3000/games' // Page for the puppeteer browser to go to and get the content
2. Run npm run twitter:games-recap
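Before running the bot it can help to confirm that none of the variables were left empty. The check below is a sketch and not something the repository is known to include: findMissingEnvVars is a hypothetical name, and it only looks at the five variable names listed above.

```javascript
// Hypothetical helper (not part of the repository): reports which of the
// variables listed above are unset or empty in the given environment object.
const REQUIRED_ENV_VARS = [
  'TWITTER_CONSUMER_KEY',
  'TWITTER_CONSUMER_SECRET',
  'TWITTER_ACCESS_TOKEN_KEY',
  'TWITTER_ACCESS_TOKEN_SECRET',
  'GAMES_FE_URL',
];

function findMissingEnvVars(env) {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}

// In practice you would pass process.env after the .env file has been loaded.
const missingVars = findMissingEnvVars({
  TWITTER_CONSUMER_KEY: 'key',
  TWITTER_CONSUMER_SECRET: 'secret',
  GAMES_FE_URL: 'localhost:3000/games',
});

console.log(missingVars); // [ 'TWITTER_ACCESS_TOKEN_KEY', 'TWITTER_ACCESS_TOKEN_SECRET' ]
```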
By default you can find the Leafs development Twitter bot at @leafsprospects2. Anytime you use the run command in the development environment it will post