memphis-iis / datawhys-content-notebooks-python

Content for DataWhys in the form of JupyterLab notebooks (.ipynb files)
Apache License 2.0

Content scope/sequence #1

Closed: aolney closed this issue 2 years ago

aolney commented 4 years ago

See the OneDrive content-planning folder for resources/documents

Historical note: since we decided just to start authoring notebooks directly, then reverse engineer learning objectives/etc from them, we are using both top-down and bottom-up approaches to this task.

I'm going to keep this as one massive card rather than splitting it up into many little ones. However, I'm going to create cards per notebook authoring task next.

andrewtawfik commented 4 years ago

Had a chance to review the contents of the email sent by @andrewjhampton on 3/26. Overall, I think it looks good. I do have questions about whether the scope is a bit extensive for interns, especially those with little experience; something like 16-20 modules might be too much, depending on how comfortable they feel. However, if the modules are clustered together, we can easily drop one if needed.

My only question is about the notable absence: "Measures of association: correlation, chi square, etc". I wonder if they need correlation as a baseline to understand regression. It may be worthwhile to touch on as an option, even if they aren't required to employ it to solve the problem.

aolney commented 4 years ago

Thanks @andrewtawfik :thumbsup: To clarify, I think you're talking about the contents of the problem-oriented-topics.xlsx file?

I agree we may want to rethink measures of association. Personally I like a regression-first approach to correlation, so you can explain regression and then explain that correlation is equivalent to regression when the variables are normalized, in which case the intercept is zero.

If we do have a topic on measures of association, we can probably take a similar approach as the plots, i.e. there is a "zoo" of different associations, each with its own trade-offs. We can easily bump off some of the hard models to make space.
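To make the regression-first view of correlation concrete, here is a small numerical sketch (NumPy, synthetic data, not from any notebook in the repo) showing that when both variables are standardized, the OLS slope equals Pearson's r and the intercept vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Standardize both variables to zero mean, unit variance
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# OLS fit of zy on zx: np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(zx, zy, 1)
r = np.corrcoef(x, y)[0, 1]

print(abs(slope - r) < 1e-10)   # slope equals Pearson's r
print(abs(intercept) < 1e-10)   # intercept is (numerically) zero
```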

aolney commented 4 years ago

@Everyone: Following up on our meeting discussion Monday, please sign up for the topics you would like to produce a notebook for by putting your name in the highlighted column.

I've included everyone in this task so you are notified, but if you don't have expertise on these topics, don't worry about signing up. I will defer signing up because I'm happy to take whatever topics are left over.

Once people have signed up, I will create new tasks for each topic so we can collaborate on them in a more focused way.

ddbowman commented 4 years ago

@aolney: Are the Theory topics set in stone or can we choose our own?

aolney commented 4 years ago

@ddbowman Please propose anything you think would make sense for the interns :)

aolney commented 4 years ago

I've created specific issues/cards for each of the notebooks we've signed up for. I'm happy to help anyone who is unfamiliar with git :smiley_cat:

aolney commented 4 years ago

Quoting email from @vasilerus about the logistic regression notebook, but which has implications for everyone:

1) It might be good to have some common vocabulary/language among all of us developing these notebooks. For instance, should we all call the test labels just that, test labels, or test gold labels, to indicate that they are generated by experts (i.e., they are ground truth) as opposed to predicted labels? Not sure if Dale uses that terminology or not. test_labels vs. test_gold_labels vs. test_groundtruth_labels vs. something else. My point is that we should be consistent across all notebooks if possible, now or over time.

Great point. I've created a new issue to track just the terminology discussion (https://github.com/memphis-iis/datawhys-content-notebooks/issues/24), with an associated OneDrive sheet for our decisions.

2) The github link to logistic regression is not populated with the blockly illustration.

The blockly part must also be authored. Technically, the code would be generated by blockly. There is not a "decompiler" to transform Python into blocks. I can author the blocks that correspond to your code, using your code as a guide.

3) Not sure where to upload my notebook now.

Please see this video around minute 3: https://www.youtube.com/watch?v=YqJw-9ahMLw. It shows how you can push your current version to GitHub.

4) For the theory, should I develop powerpoint or embed the theory in the notebook? Should we use Dale's theory slides for consistency with some updates from me depending on what I include in the notebook?

Good question. I've been working under the assumption that we'll put all theory in the notebook itself (keeping it self-contained). Happy to discuss any other possible models.

One more comment: the logistic regression blockly file is actually illustrating a Naive Bayes model in the code.

That's the example I had for the Iris dataset; I'll replace it with blocks for your code once you push the changes.


If there's any more discussion specific to this notebook, let's create a new issue for it to minimize the traffic for others.
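Pulling points 1 and 2 together, a minimal sketch of what the logistic regression notebook's code might look like, using scikit-learn on the Iris data mentioned above. The names `test_gold_labels` / `test_predicted_labels` are one candidate convention, not the agreed terminology:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Iris data, split into train and test sets
X, y = load_iris(return_X_y=True)
train_X, test_X, train_gold_labels, test_gold_labels = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Logistic regression (not Naive Bayes), fit on the training split
model = LogisticRegression(max_iter=1000).fit(train_X, train_gold_labels)

# "gold" labels are the ground truth; "predicted" labels are the model output
test_predicted_labels = model.predict(test_X)
accuracy = (test_predicted_labels == test_gold_labels).mean()
```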

vasilerus commented 4 years ago

I still have a question about how to push my notebooks to GitHub. As I noted before, if I open the logistic regression notebook on GitHub, it has a blockly example. Am I supposed to overwrite that one with my notebook for logistic regression, and only then will a blockly version be created?


aolney commented 4 years ago

I think the answer to your question is yes. Here are the steps as I understand them:

  1. You commit your changes using the left git menu in JupyterHub
  2. You push your changes using the up arrow cloud button in that menu
  3. After the push, your logistic regression code will be on GitHub but so will the incorrect blockly code, because they are all together in one notebook
  4. I can pull your logistic regression notebook from GitHub and change the blocks to match your code

Happy to help out on any of these steps :thumbsup:
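For anyone more comfortable in a terminal, the JupyterHub git-menu steps above map onto ordinary command-line git. A sketch, demonstrated in a throwaway local repository since a real push needs credentials and a remote; the paths, file name, and commit message are all illustrative:

```shell
# Steps 1-2 above, done with command-line git in a throwaway repo
mkdir -p /tmp/demo-repo && cd /tmp/demo-repo
git init -q
git config user.email "intern@example.com"
git config user.name "Intern"

touch logistic-regression.ipynb                      # the edited notebook
git add logistic-regression.ipynb                    # stage it (step 1)
git commit -q -m "Add logistic regression notebook"  # commit (step 1)
git log --oneline                                    # confirm the commit landed
# git push origin main                               # push (step 2): needs a real remote
```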

aolney commented 3 years ago

James has done an analysis of how the 1000+ resources in content-resource-list correspond to our notebooks so far.

I think this is useful for three of the bullet points on this task:

aolney commented 3 years ago

Some updates on this task:

aolney commented 3 years ago

Belatedly adding some notes here from meetings; things to follow up on regarding learning objectives and prerequisite knowledge:

Based on the publications I've seen, I think we could contribute something on this topic.

aolney commented 2 years ago

Closing this as effectively done. If we decide on a follow-on task, e.g. creating a listing of learning objects, I suggest we create a new issue for that.