stefanjwojcik / mm2020

Make this into a tutorial #1

Closed tlienart closed 4 years ago

tlienart commented 4 years ago

Hello!

This is pretty cool :) Would you be OK with it becoming part of MLJTutorials?

stefanjwojcik commented 4 years ago

Hey! Yes! Please add as you see fit.

stefanjwojcik commented 4 years ago

Also happy to add commentary and description as needed.

tlienart commented 4 years ago

Hello Stefan,

It took me a while to finally start looking into this. It's great, though I think it may be better to just link to your repo, as there's a fair bit of "prep code", more than you'd want to put in a single tutorial (we could do a multi-round tutorial, but that would be more work and I'd need your help 😋).

One thing though: I downloaded the Kaggle data but couldn't find the file from the initial line, "data/MSampleSubmissionStage1_2020.csv". Where is this?

Here's the tree of the folder, and I can't see that file:

.
├── 2020DataFiles
│   └── 2020DataFiles
│       ├── 2020-Mens-Data
│       │   ├── MDataFiles_Stage1
│       │   │   ├── Cities.csv
│       │   │   ├── Conferences.csv
│       │   │   ├── MConferenceTourneyGames.csv
│       │   │   ├── MGameCities.csv
│       │   │   ├── MMasseyOrdinals.csv
│       │   │   ├── MNCAATourneyCompactResults.csv
│       │   │   ├── MNCAATourneyDetailedResults.csv
│       │   │   ├── MNCAATourneySeedRoundSlots.csv
│       │   │   ├── MNCAATourneySeeds.csv
│       │   │   ├── MNCAATourneySlots.csv
│       │   │   ├── MRegularSeasonCompactResults.csv
│       │   │   ├── MRegularSeasonDetailedResults.csv
│       │   │   ├── MSeasons.csv
│       │   │   ├── MSecondaryTourneyCompactResults.csv
│       │   │   ├── MSecondaryTourneyTeams.csv
│       │   │   ├── MTeamCoaches.csv
│       │   │   ├── MTeamConferences.csv
│       │   │   ├── MTeamSpellings.csv
│       │   │   └── MTeams.csv
│       │   ├── MEvents2015.csv
│       │   ├── MEvents2016.csv
│       │   ├── MEvents2017.csv
│       │   ├── MEvents2018.csv
│       │   ├── MEvents2019.csv
│       │   └── MPlayers.csv
│       └── 2020-Womens-Data
│           ├── WDataFiles_Stage1
│           │   ├── Cities.csv
│           │   ├── Conferences.csv
│           │   ├── WGameCities.csv
│           │   ├── WNCAATourneyCompactResults.csv
│           │   ├── WNCAATourneyDetailedResults.csv
│           │   ├── WNCAATourneySeeds.csv
│           │   ├── WNCAATourneySlots.csv
│           │   ├── WRegularSeasonCompactResults.csv
│           │   ├── WRegularSeasonDetailedResults.csv
│           │   ├── WSeasons.csv
│           │   ├── WTeamConferences.csv
│           │   ├── WTeamSpellings.csv
│           │   └── WTeams.csv
│           ├── WEvents2015.csv
│           ├── WEvents2016.csv
│           ├── WEvents2017.csv
│           ├── WEvents2018.csv
│           ├── WEvents2019.csv
│           └── WPlayers.csv
├── MDataFiles_Stage2
│   ├── Cities.csv
│   ├── Conferences.csv
│   ├── MConferenceTourneyGames.csv
│   ├── MGameCities.csv
│   ├── MMasseyOrdinals.csv
│   ├── MNCAATourneyCompactResults.csv
│   ├── MNCAATourneyDetailedResults.csv
│   ├── MNCAATourneySeedRoundSlots.csv
│   ├── MNCAATourneySeeds.csv
│   ├── MNCAATourneySlots.csv
│   ├── MRegularSeasonCompactResults.csv
│   ├── MRegularSeasonDetailedResults.csv
│   ├── MSeasons.csv
│   ├── MSecondaryTourneyCompactResults.csv
│   ├── MSecondaryTourneyTeams.csv
│   ├── MTeamCoaches.csv
│   ├── MTeamConferences.csv
│   ├── MTeamSpellings.csv
│   └── MTeams.csv
├── MPlayByPlay_Stage2
│   ├── MEvents2015.csv
│   ├── MEvents2016.csv
│   ├── MEvents2017.csv
│   ├── MEvents2018.csv
│   ├── MEvents2019.csv
│   ├── MEvents2020.csv
│   └── MPlayers.csv
├── WDataFiles_Stage2
│   ├── Cities.csv
│   ├── Conferences.csv
│   ├── WGameCities.csv
│   ├── WNCAATourneyCompactResults.csv
│   ├── WNCAATourneyDetailedResults.csv
│   ├── WNCAATourneySeeds.csv
│   ├── WNCAATourneySlots.csv
│   ├── WRegularSeasonCompactResults.csv
│   ├── WRegularSeasonDetailedResults.csv
│   ├── WSeasons.csv
│   ├── WTeamConferences.csv
│   ├── WTeamSpellings.csv
│   └── WTeams.csv
└── WPlayByPlay_Stage2
    ├── WEvents2015.csv
    ├── WEvents2016.csv
    ├── WEvents2017.csv
    ├── WEvents2018.csv
    ├── WEvents2019.csv
    ├── WEvents2020.csv
    └── WPlayers.csv
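A quick way to double-check a local download for the submission file is a recursive search by name. This is only a sketch: the helper name `find_files` and the glob pattern are illustrative choices, and it assumes the search is run from the root of the downloaded folder.

```python
from pathlib import Path


def find_files(root, pattern):
    """Return sorted paths (relative to root) of files whose name matches a glob pattern."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob(pattern)
        if p.is_file()
    )


if __name__ == "__main__":
    # Look for any sample submission file anywhere below the current directory.
    matches = find_files(".", "*SampleSubmission*.csv")
    if matches:
        for m in matches:
            print(m)
    else:
        print("no sample submission file found")
```

An empty result for a tree like the one above would confirm that the file is not part of the download, rather than just sitting in an unexpected subfolder.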

stefanjwojcik commented 4 years ago

Thibaut,

Yes, there is a lot of preprocessing for the NCAA competitions, especially if one wants to be competitive. Perhaps I could reduce the preprocessing into one fast function call? That would mean cutting out some features, but the goal would be to get straight to the MLJ stuff. I am happy to revise and make a more tutorial-friendly version that you can either link to or post directly.
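As a rough illustration of what "one function call" could look like, here is a minimal sketch that collapses game results into per-team, per-season features. It is written in Python purely for illustration (MLJ itself is a Julia framework) and is not the repo's actual preprocessing. The function name `team_season_features` and the two features are hypothetical, the column names `Season`, `WTeamID`, `LTeamID`, `WScore`, `LScore` are assumed to match the compact results file, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def team_season_features(rows):
    """Aggregate game rows into {(season, team_id): {"win_pct", "avg_margin"}}.

    Each row is a mapping with the assumed keys
    Season, WTeamID, LTeamID, WScore, LScore.
    """
    totals = defaultdict(lambda: {"games": 0, "wins": 0, "margin": 0})
    for r in rows:
        season = int(r["Season"])
        diff = int(r["WScore"]) - int(r["LScore"])
        # Every game contributes once to the winner and once to the loser.
        for team, won, margin in (
            (int(r["WTeamID"]), 1, diff),
            (int(r["LTeamID"]), 0, -diff),
        ):
            t = totals[(season, team)]
            t["games"] += 1
            t["wins"] += won
            t["margin"] += margin
    return {
        key: {
            "win_pct": t["wins"] / t["games"],
            "avg_margin": t["margin"] / t["games"],
        }
        for key, t in totals.items()
    }


if __name__ == "__main__":
    # Made-up sample rows standing in for a results CSV.
    sample = io.StringIO(
        "Season,WTeamID,WScore,LTeamID,LScore\n"
        "2019,1101,70,1102,60\n"
        "2019,1102,80,1101,75\n"
    )
    for key, feats in sorted(team_season_features(csv.DictReader(sample)).items()):
        print(key, feats)
```

A single entry point like this trades richer features for a shorter path to the modeling step, which is the trade-off described above.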

Regarding the missing submission file - that's weird, I can still see it on the competition page. Either way, you can download it here.

tlienart commented 4 years ago

I think it makes more sense to just link from the tutorials site to your repo. Users will find the whole code, and I think it's great if you leave all the prep code as it is; that's what 'real' data science looks like :) Also, it means that credit will go directly to you (GitHub stars, etc.), which is fairer.

In short, feel free to close this issue. And never mind the file I couldn't find, it's fine.

So if you're happy with us linking to your repo, I think we're good :) Thanks again!

stefanjwojcik commented 4 years ago

Sounds good! Yes, totally fine with you all linking to it.