tlienart closed this issue 4 years ago
Hey! Yes! Please add as you see fit.
Also happy to add commentary and description as needed.
Hello Stefan,
It took me a while to finally start looking into this. It's great, though I think it may be better to just link to your repo, as there's a fair bit of "prep code", more than you'd want to put in a single tutorial (we could do a multi-round tutorial, but that would be more work and I'd need your help 😋).
One thing though: I downloaded the Kaggle data but couldn't find the file referenced in the initial line, "data/MSampleSubmissionStage1_2020.csv". Where is it?
Here's the tree of the folder, and I can't see that file:
.
├── 2020DataFiles
│  └── 2020DataFiles
│     ├── 2020-Mens-Data
│     │  ├── MDataFiles_Stage1
│     │  │  ├── Cities.csv
│     │  │  ├── Conferences.csv
│     │  │  ├── MConferenceTourneyGames.csv
│     │  │  ├── MGameCities.csv
│     │  │  ├── MMasseyOrdinals.csv
│     │  │  ├── MNCAATourneyCompactResults.csv
│     │  │  ├── MNCAATourneyDetailedResults.csv
│     │  │  ├── MNCAATourneySeedRoundSlots.csv
│     │  │  ├── MNCAATourneySeeds.csv
│     │  │  ├── MNCAATourneySlots.csv
│     │  │  ├── MRegularSeasonCompactResults.csv
│     │  │  ├── MRegularSeasonDetailedResults.csv
│     │  │  ├── MSeasons.csv
│     │  │  ├── MSecondaryTourneyCompactResults.csv
│     │  │  ├── MSecondaryTourneyTeams.csv
│     │  │  ├── MTeamCoaches.csv
│     │  │  ├── MTeamConferences.csv
│     │  │  ├── MTeamSpellings.csv
│     │  │  └── MTeams.csv
│     │  ├── MEvents2015.csv
│     │  ├── MEvents2016.csv
│     │  ├── MEvents2017.csv
│     │  ├── MEvents2018.csv
│     │  ├── MEvents2019.csv
│     │  └── MPlayers.csv
│     └── 2020-Womens-Data
│        ├── WDataFiles_Stage1
│        │  ├── Cities.csv
│        │  ├── Conferences.csv
│        │  ├── WGameCities.csv
│        │  ├── WNCAATourneyCompactResults.csv
│        │  ├── WNCAATourneyDetailedResults.csv
│        │  ├── WNCAATourneySeeds.csv
│        │  ├── WNCAATourneySlots.csv
│        │  ├── WRegularSeasonCompactResults.csv
│        │  ├── WRegularSeasonDetailedResults.csv
│        │  ├── WSeasons.csv
│        │  ├── WTeamConferences.csv
│        │  ├── WTeamSpellings.csv
│        │  └── WTeams.csv
│        ├── WEvents2015.csv
│        ├── WEvents2016.csv
│        ├── WEvents2017.csv
│        ├── WEvents2018.csv
│        ├── WEvents2019.csv
│        └── WPlayers.csv
├── MDataFiles_Stage2
│  ├── Cities.csv
│  ├── Conferences.csv
│  ├── MConferenceTourneyGames.csv
│  ├── MGameCities.csv
│  ├── MMasseyOrdinals.csv
│  ├── MNCAATourneyCompactResults.csv
│  ├── MNCAATourneyDetailedResults.csv
│  ├── MNCAATourneySeedRoundSlots.csv
│  ├── MNCAATourneySeeds.csv
│  ├── MNCAATourneySlots.csv
│  ├── MRegularSeasonCompactResults.csv
│  ├── MRegularSeasonDetailedResults.csv
│  ├── MSeasons.csv
│  ├── MSecondaryTourneyCompactResults.csv
│  ├── MSecondaryTourneyTeams.csv
│  ├── MTeamCoaches.csv
│  ├── MTeamConferences.csv
│  ├── MTeamSpellings.csv
│  └── MTeams.csv
├── MPlayByPlay_Stage2
│  ├── MEvents2015.csv
│  ├── MEvents2016.csv
│  ├── MEvents2017.csv
│  ├── MEvents2018.csv
│  ├── MEvents2019.csv
│  ├── MEvents2020.csv
│  └── MPlayers.csv
├── WDataFiles_Stage2
│  ├── Cities.csv
│  ├── Conferences.csv
│  ├── WGameCities.csv
│  ├── WNCAATourneyCompactResults.csv
│  ├── WNCAATourneyDetailedResults.csv
│  ├── WNCAATourneySeeds.csv
│  ├── WNCAATourneySlots.csv
│  ├── WRegularSeasonCompactResults.csv
│  ├── WRegularSeasonDetailedResults.csv
│  ├── WSeasons.csv
│  ├── WTeamConferences.csv
│  ├── WTeamSpellings.csv
│  └── WTeams.csv
└── WPlayByPlay_Stage2
   ├── WEvents2015.csv
   ├── WEvents2016.csv
   ├── WEvents2017.csv
   ├── WEvents2018.csv
   ├── WEvents2019.csv
   ├── WEvents2020.csv
   └── WPlayers.csv
Thibaut,
Yes, there is a lot of preprocessing for the NCAA competitions, especially if one wants to be competitive. Perhaps I could reduce the preprocessing into one fast function call? That would mean cutting out some features, but the goal would be to get straight to the MLJ stuff. I am happy to revise and make a more tutorial-friendly version that you can either link to or post directly.
Regarding the missing submission file - that's weird, I can still see it on the competition page. Either way, you can download it here.
I think it makes more sense to just link from the tutorials site to your repo. Users will find the whole code there, and I think it's great if you leave all the prep code as it is; that's what 'real' data science looks like :). It also means that credit goes directly to you (GitHub stars etc.), which is fairer.
In short, feel free to close this issue. And never mind the missing file, it's fine.
So if you're happy with us linking to your repo, I think we're good :) Thanks again!
Sounds good! Yes, totally fine with you all linking to it.
Hello!
This is pretty cool :) Would you be OK with it becoming part of MLJTutorials?