pointOfive / STA130_F23

Python/JupyterHub implementation of this UofT classic

Created HW9 #12

Closed · Bortoise closed this issue 12 months ago

Bortoise commented 1 year ago

I created the homework for week 11. I also updated the README with a link to the new HW.

Bortoise commented 1 year ago

There should be an image rendered with the last question; it is in the folder, so it should work when the notebook is downloaded locally. Alternatively, I could host it somewhere and just link to the URL, but I prefer this way.
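For reference, a local image can be rendered from a notebook code cell along these lines (a minimal sketch; the filename here is hypothetical, not the actual file in the folder):

```python
from IPython.display import Image

# renders an image file that ships in the HW folder next to the notebook
Image(filename="hw_last_question.png")
```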

lucieyang1 commented 1 year ago

Hi @Bortoise, can you upload/send the tester file for the homework? Thanks!!

Bortoise commented 1 year ago

> Hi @Bortoise, can you upload/send the tester file for the homework? Thanks!!

I'll send it in Slack DMs. I'm also planning on setting up the MarkUs autotester later today if you want to test that out.

lucieyang1 commented 1 year ago

Ok, I just finished the homework. It took me slightly longer than 2 hours, but I was also trying to learn graphviz and fixing a bunch of issues I ran into. It'd probably be a good idea to add some hints in the questions (like putting the clf.predict() hint in the questions instead of in the tests).
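For context, the kind of hint being discussed might look something like this (a sketch only; the classifier setup is assumed from the HW's decision-tree topic, and X_train/y_train/X_test are the split variables discussed below):

```python
from sklearn.tree import DecisionTreeClassifier

# fit a decision tree on the training data, then predict labels for the test set
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```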

I was a bit confused by the train-test split. Q1 asks us to divide the data into train/test and gives the template train, test = None. Does that actually work? From what I can see, I think X_train, X_test, y_train, y_test should be used instead. Also, we might want to move the train/test split into the second question, because I think you need to know the X and y parameters before dividing?

Q4 asks to use the same train/test split as previously, but I'm not sure that's possible, given that we are using different X parameters and would need to drop different NA values. So for Q5, I got different values for the two models.

Bortoise commented 1 year ago

The problem I ran into when setting it up the X_train, X_test, y_train, y_test way is that I don't know how you would drop the right rows with .dropna() when dealing with several dataframes at once. On the other hand, this is simple to do with test[["life_exp_good", "social_support"]].dropna() in the way I set it up (this is also how Scott does it in the corresponding lecture). Let me know if you know a way around this.

The reason we set up the train-test split before Q2 is that we want to use the same split even when we are making more classifiers with different inputs, so the comparison is apples-to-apples. You are supposed to drop different NA values for Q5, hence the setup. I will try to add additional explanation to make this clearer. A sketch of the setup is below.
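Here is a minimal sketch of that setup (assuming the happiness2017 dataframe from the HW; 1985 is the seed mentioned later in this thread):

```python
from sklearn.model_selection import train_test_split

# split the full dataframe once, so every classifier is evaluated on the same rows
train, test = train_test_split(happiness2017, train_size=0.8, random_state=1985)

# each model then subsets its own columns and drops its own missing rows
tree1_test = test[["life_exp_good", "social_support"]].dropna()
```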

lucieyang1 commented 1 year ago

Here's the code I had:

```python
from sklearn.model_selection import train_test_split

# drop rows missing either column first, then split
selectfeatures = happiness2017[["social_support", "life_exp_good"]].dropna()
X = selectfeatures[["social_support"]]
y = selectfeatures["life_exp_good"]  # single brackets give a 1-D Series, as sklearn expects
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

It seems like you made the split before dropping NA values. I dropped the NA values first, which is what I think was done in the video. I was thinking that if you drop the NA values after making the split, the split might no longer reflect an 80/20 split of the data.

Bortoise commented 1 year ago

I see your point about the issue with the 80/20 split. In this case I don't think there is much of a problem, since the split being exactly 80/20 isn't really important, but it's not best practice in case there were many NA entries. However, I really do want them to use the same train-test split for both trees. To remedy this, I'll have them drop the NA values for all the relevant columns before doing the train-test split. Thanks!
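Something along these lines (a sketch of the remedy; "freedom" is just a placeholder for whatever feature(s) Q4 actually uses):

```python
from sklearn.model_selection import train_test_split

# drop rows missing any column used by either tree, then split once
relevant_cols = ["life_exp_good", "social_support", "freedom"]  # "freedom" is a stand-in for Q4's feature
complete_rows = happiness2017[relevant_cols].dropna()
train, test = train_test_split(complete_rows, train_size=0.8, random_state=1985)
```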

pointOfive commented 1 year ago

Going to go ahead and add my revisions/updates/commits to the HW and tester files now. This will lose some of the updates made while I was revising the files, but I am testing things on MarkUs now and I think I am aware of what changes might be necessary to get things working correctly on MarkUs.

When reviewing the changes you made while I was editing, I did not always switch to the "hidden" style of your MarkUs output (e.g., for some early questions, Q8, etc.): sometimes I decided it was good to expose a little more of the answer. (But I did see those changes you made and generally approved.)

There were concerns about setting the random seed: at the moment I have not changed my files to reflect this, and I will be running things on MarkUs to see if I encounter any issues, so I have not made the changes corresponding to our conversation at the following points:

I did add some additional output to the tester file to help with looking into this, in the form of print("Answer Submitted:", Q13).

pointOfive commented 1 year ago

print("Answer Submitted:", Q13) did not work -- fixing

pointOfive commented 1 year ago

Running into a 5MB MarkUs file size constraint due to the graphviz images -- accommodating

pointOfive commented 1 year ago

Once I make these changes in the HW file(s) commit that will eventually appear below, you can see my initial changes relative to your last version of the file (available on GitHub) by clicking on that specific commit and viewing the changes highlighted in the file. It seems to me this is a fairly good way to get a view of what's being changed and how, even though it's "GitHub texty".

pointOfive commented 1 year ago

Running into this error again: "Cannot display this file. Please download it to view. ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration."

pointOfive commented 1 year ago

The error is caused by the graphviz decision tree figures (which are XML, I think). The nbconvert step that renders notebooks as HTML in MarkUs chokes with the above error. I spent a couple of hours looking for a potential workaround; sadly, the best option at the moment seems to be giving students clear instructions that their final notebooks must not have any figures showing. Very hacky, but it is something I can change right now and will be a sufficient placeholder.
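For reference, outputs could also be stripped programmatically before submission, e.g. with nbformat (a sketch only; the notebook filename is a placeholder, and jupyter nbconvert's --clear-output --inplace flags should do the same from the command line):

```python
import nbformat

# strip every code cell's outputs so no figures remain in the submitted notebook
nb = nbformat.read("STA130_HW9.ipynb", as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, "STA130_HW9.ipynb")
```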

pointOfive commented 1 year ago

Correction/Update: exciting news -- the MarkUs folks got back to me, and nbconvert will not fail in an updated version. Currently it appears that it does not render the figures, but this is not important for the TA workflow. Very pleased to have sorted this out with relatively minimal time loss. So this will fix itself whenever the next update of MarkUs is released (which will be in time for the Fall semester).

pointOfive commented 1 year ago

I didn't have any issues with the concerns about:

pointOfive commented 1 year ago

Looking for one more confirmatory review from both Matthew and Cole.

Bortoise commented 1 year ago

Ok, I did a quick review and only noticed one major issue: in Q1, the setup was a bit confusing, so I moved the explanation about subsetting before dropping NaN values above the line motivating the choice of 1985 as the random seed.

MatthewYu06 commented 1 year ago

Homework looked good and made sense; it took me a little over two hours (but I was also noting edits as I went through).

No major changes

MatthewYu06 commented 1 year ago

Also looked at the tutorial; it looks good, so no edits made.

pointOfive commented 1 year ago

The TUT was very nice. I was pleasantly surprised at how engaging and fun it was. I think the students will find it very compelling due to the high quality of the examples.