worldbank / dec-python-course

14 stars 5 forks source link

s3 kb review #25

Closed kbjarkefur closed 10 months ago

kbjarkefur commented 2 years ago

Feedback from @kbjarkefur

kbjarkefur commented 2 years ago

a package is a collection of modules I think the word "module" makes as little sense to someone new to programming as the word "package". So wither explain the word module as well, or replace it with, perhaps, "block of code".

kbjarkefur commented 2 years ago

import the library like you would import a built-in python module e.g. import math, works for common libraries on Google Colab because Colab already pre-installed many common libraries.

Maybe talk a bit more about what import does as we have not explained it before. Perhaps compare it to the custom functions we have written in the past and say that import is like loading a function from a different file.

kbjarkefur commented 2 years ago

Suggestion for darling to drop. Drop these sections:

And replace it with a cell where we have prepared code that just installs anything needed to be installed. Once a participants is at the level that they need something that colab does not have installed already, then they will be able to google how to do that.

matplotlib is already installed in Colab. Is matplotlib==3.5.2 needed after the assert in the first task? Otherwise I suggest dropping this as well.

kbjarkefur commented 2 years ago

If we are not going to use from array import array then do not even bring up Python's array. Keep it as simple as possible.

At most, mention that there is something called array in Python, but for data science it is always NumPy's ndarray that they want.

kbjarkefur commented 2 years ago

Either explain how this string is printed:

print(f'''Array dimension: {one_d_array.ndim},
      shape: {one_d_array.shape}, 
      size: {one_d_array.size},
      data type: {one_d_array.dtype}''')

or replace with syntax the participants know:

print('Array dimension:', one_d_array.ndim)
print('      shape:', one_d_array.shape)
print('      size:', one_d_array.size)
print('      data type:', one_d_array.dtype)
kbjarkefur commented 2 years ago

Suggestion to remove transpose. Is it common enough to need to include here?

If we are including it I think we should use the 2D array as an example as .transpose() does not make any difference on a 1D array. See one_d_array == one_d_array.transpose()

kbjarkefur commented 2 years ago

Suggestion to output the arrays you will use in a section before that section. So in the section "How to do maths and (even) linear algebra with ndarrays?" have a cell with just one_d_array right after the section title. two_d_array is used once so maybe output it again only right before you use it again.

kbjarkefur commented 2 years ago

Excercise 2.3 step two. It is unclear what you want me to use as starting point. Am I starting a new ndarray? Can I make it [7,7] and then just sum that? If I am starting from the array a from last cell, then it is not clear what I am expected to do.

EDIT: Ok, I think you need to explain dot products, we cannot assume any existing knowledge in linear algebra for this course. You could explain, but I think you are already really tight on time, and explaining concepts in linear algebra is not spending that time well

kbjarkefur commented 2 years ago

In section "Universal functions": Suggesting dropping all functions but flip and sort. Perhaps keep greater and any and keep the example where you combine them to show how lists can be compared using the numpy library through a two step process where the first process is an element pairwise comparison.

I do not think that the objective of this section should be to try to convince the participants that numpy can do many advanced things. I think they take our word for it. Therefore the objective should be:

kbjarkefur commented 2 years ago

In the section: "Can I create a DataFrame from an ndarray?"

I understand that raw_values = np.arange(1, 7).reshape(2, -1) is a neat shorthand for you, but I think that the participants of this course will just be confused by that even though you might have shown these methods. I suggest keep it as simple as possible and do raw_values = np.array([[1,2,3],[4,5,6]])

weilu commented 2 years ago

Thank you @kbjarkefur for the feedback!

a package is a collection of modules I think the word "module" makes as little sense to someone new to programming as the word "package". So wither explain the word module as well, or replace it with, perhaps, "block of code".

It was the formal definition. I don't think it needs to be dumbed down. The intuitive / practical explanation followed those definitions.

import the library like you would import a built-in python module e.g. import math, works for common libraries on Google Colab because Colab already pre-installed many common libraries.

Maybe talk a bit more about what import does as we have not explained it before. Perhaps compare it to the custom functions we have written in the past and say that import is like loading a function from a different file.

Good point. Added an explainer on import before as.

Suggestion to output the arrays you will use in a section before that section. So in the section "How to do maths and (even) linear algebra with ndarrays?" have a cell with just one_d_array right after the section title. two_d_array is used once so maybe output it again only right before you use it again.

Good suggestion. Done.

Suggestion to remove transpose. Is it common enough to need to include here?

If we are including it I think we should use the 2D array as an example as .transpose() does not make any difference on a 1D array. See one_d_array == one_d_array.transpose()

Good catch. Updated to use two_d_array and moved to bonus material.

weilu commented 2 years ago

Excercise 2.3 step two. It is unclear what you want me to use as starting point. Am I starting a new ndarray? Can I make it [7,7] and then just sum that? If I am starting from the array a from last cell, then it is not clear what I am expected to do.

EDIT: Ok, I think you need to explain dot products, we cannot assume any existing knowledge in linear algebra for this course. You could explain, but I think you are already really tight on time, and explaining concepts in linear algebra is not spending that time well

Added definition of dot product.

In section "Universal functions": Suggesting dropping all functions but flip and sort. Perhaps keep greater and any and keep the example where you combine them to show how lists can be compared using the numpy library through a two step process where the first process is an element pairwise comparison.

I do not think that the objective of this section should be to try to convince the participants that numpy can do many advanced things. I think they take our word for it. Therefore the objective should be:

  • Numpy can be used on string. Use examples flip and sort.
  • Numpy can be used to do element-pairwise comparison when comapring lists. Use all and greater and combine into np.any(np.greater([1, 2, 3], [1, 2, 1]))

Dropped where (Moved into bonus section). argmax was used later in the pandas section so I think it's good to introduce here.

In the section: "Can I create a DataFrame from an ndarray?"

I understand that raw_values = np.arange(1, 7).reshape(2, -1) is a neat shorthand for you, but I think that the participants of this course will just be confused by that even though you might have shown these methods. I suggest keep it as simple as possible and do raw_values = np.array([[1,2,3],[4,5,6]])

Both arange and reshape have been covered in the previous content by this point. I see this as an opportunity to recap and practice.

weilu commented 2 years ago

Either explain how this string is printed:

print(f'''Array dimension: {one_d_array.ndim},
      shape: {one_d_array.shape}, 
      size: {one_d_array.size},
      data type: {one_d_array.dtype}''')

or replace with syntax the participants know:

print('Array dimension:', one_d_array.ndim)
print('      shape:', one_d_array.shape)
print('      size:', one_d_array.size)
print('      data type:', one_d_array.dtype)

I showed my room string interpolation (f-string) and I was just going to quickly explain multi-line strings here. It didn't take much time and I think both are good to know, especially f-string. In fact, I think the previous sessions should drop format and concatenation with + and teach f-string instead, it's just so much more succinct and less error prone (allows mixed types) – I think I've already suggested in the relevant session's issues.

luisesanmartin commented 1 year ago

@weilu @kbjarkefur are these comments still pending? I understand this was pre-session feedback that was probably addressed for the first edition of the course. If so, please close the issue so we know it's solved. Thanks

kbjarkefur commented 1 year ago

In the end I think it was a mix of pre-session feedback and during. But I will let @weilu decide if there is things she still want to incorporate into the session.

I agree it is good to close and not have lingering before the next iteration of this session

weilu commented 10 months ago

I think all the points have been considered and incorporated when they fit the flow and context of the session. Closing this as complete.