Open nickvazz opened 2 years ago
If you have issues, you can comment here, and it's pretty easy to make nice-looking text formatting with markdown: https://guides.github.com/features/mastering-markdown/
@nickvazz Hi Nick, I have some questions:
Could you check why I can't put images from the training set into a subplot like in the test set? Link
I understand the code where they prepare the data, scale the images, flatten images into 1D arrays, and convert class vectors to binary class matrices (like for the number 5 we have [0,0,0,0,0,1,0,0,0,0]). I learned a few things about the layers in the Sequential model, but not everything: the first layer needs to have an input shape; conv2d and maxpooling2d layers downsample the image feature maps; the dropout layer helps avoid overfitting the model; and changing the order of layers affects the end result. But I guess I need to read more about the Sequential model, like the one-hot vector and how it relates to the loss function categorical_crossentropy (if it does), how to choose loss functions, the meaning of each number in a layer's output shape, terms like batch size, epochs, activation, optimizer, and metrics, and how to create a validation set and then compare training vs. validation performance.
Can we put in our own image of a handwritten digit and let the model predict what digit we have?
Hey @truc-h-nguyen, no problem!
1) Looks like the test-set code has a typo that happens to make it work, and the train-set code doesn't have that typo but has a small, easily fixed error.

Basically `plt.show()` makes the plot. In the `test` case you never actually _called_ `plt.show`, since it requires the `()` to get called. In the `train` case you call it correctly, but inside the most nested loop, so a figure shows after each plot. Moving `plt.show()` outside the loops fixes both:
```python
index = 0
for nrows in range(3):
    for ncols in range(3):
        plt.subplot(3, 3, index + 1)
        plt.imshow(x_test[index], cmap="Pastel1")
        index += 1
plt.show()
```
and
```python
n_rows = 3
n_cols = 3
i = 0
for row in range(n_rows):
    for col in range(n_cols):
        plt.subplot(n_rows, n_cols, i + 1)
        plt.imshow(x_train[i], cmap='Pastel1')
        i += 1
plt.show()
```
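As a side note, a tidier way to build the same grid is `plt.subplots`, which hands you all the axes up front so there's no index bookkeeping and only one `show()` at the end. A sketch, using random data in place of the real `x_train`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x_train = np.random.rand(9, 28, 28)  # stand-in for the real MNIST images

# one figure with a 3x3 grid of axes; axes.flat walks them row by row
fig, axes = plt.subplots(3, 3)
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train[i], cmap="Pastel1")
plt.show()
```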
2) Yep, all those things are true.

`conv2d` doesn't always return a smaller shape, though. It can return the exact same shape if you change the `padding` argument when you create a `conv2d` layer. From the Keras docs:

> padding: one of "valid" or "same" (case-insensitive). "valid" means no padding. "same" results in padding with zeros evenly to the left/right or up/down of the input such that output has the same height/width dimension as the input.
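To make the shape difference concrete, here's a small sketch of the output-size arithmetic those two padding modes use. This is plain Python using the standard convolution formulas, not Keras itself:

```python
import math

def conv2d_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size along one dimension of a conv layer."""
    if padding == "valid":
        # no padding: the kernel must fit entirely inside the input
        return (n - kernel) // stride + 1
    # "same": zero-pad so the output size depends only on the stride
    return math.ceil(n / stride)

# a 28-pixel-wide MNIST image through a 3x3 convolution
print(conv2d_output_size(28, 3, padding="valid"))  # 26
print(conv2d_output_size(28, 3, padding="same"))   # 28
```

With `padding="same"` and `stride=1` the image stays 28 wide, which is why stacking several such layers doesn't shrink the feature map.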
`Dropout` helps with overfitting during training by randomly removing some node connections so the model doesn't become too dependent on any specific connection, but you have to remember that it's only active during training, not when you actually use the model for predictions.

One-hot encoding turns each digit label (`1,2,3,4,5,6,7,8,9,0`) into a binary vector because it wants to make it clear that `7` is not close to `8` or `6`. It's also commonly used to turn words into numbers for Natural Language Processing models.

As for `categorical cross-entropy`, it comes from Information Theory, and this page gives a pretty good quick understanding of what it's good for. You can also use `sklearn` to do lots of common preprocessing.
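A minimal NumPy sketch of both ideas (these are my own illustrative helpers, not the Keras or sklearn functions):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # turn integer digit labels into rows of a binary matrix
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # -sum(true * log(pred)) per sample; eps avoids log(0)
    return -np.sum(y_true * np.log(y_pred + eps), axis=1)

y = one_hot([5])
print(y[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

# a prediction that puts most probability on the right class gets a lower loss
good = np.array([[0.01] * 5 + [0.91] + [0.01] * 4])
uniform = np.array([[0.1] * 10])
print(categorical_crossentropy(y, good) < categorical_crossentropy(y, uniform))  # [ True]
```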
3) We totally can!
```python
from skimage.transform import resize
import matplotlib.image as mpimg

# this is a link to the image above
# you can copy and paste images into comments (while you're making them) and get a free link
img_path = "https://user-images.githubusercontent.com/8573886/138998248-b7753983-a85b-472b-b62d-892fe4be2a8c.png"
img = mpimg.imread(img_path)
img_resized = resize(img, (28, 28))
print(img_resized.shape, img.shape, x_test[0].shape)
```
will print

```
(28, 28, 4) (334, 496, 4) (28, 28, 1)
```
`img_resized[:,:,0]` grabs just the first color channel, as the first two `:` are for the rows and columns of the image. The model expects input shaped `(batch, rows, columns, colors)`, while `x_test[0].shape` is only `(28, 28, 1)`. The `-1` used in the front just tells the `np.array` to do whatever it needs to keep the other `28,28,1` shape correct. If we were to have some `np.array` that was 3920 items long (28x28x5), doing `.reshape(-1,28,28,1)` would result in a shape of `(5, 28, 28, 1)`. So:

```python
model.predict(img_resized[:,:,0].reshape(-1,28,28,1))
```
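The `-1` trick is easy to check directly with NumPy (just a sketch with dummy data):

```python
import numpy as np

flat = np.arange(28 * 28 * 5)        # 3920 items, like 5 flattened grayscale images
batch = flat.reshape(-1, 28, 28, 1)  # -1 lets NumPy infer the leading batch size
print(batch.shape)  # (5, 28, 28, 1)
```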
Hopefully all that helps!
Thank you for your detailed explanation! It really helps me understand the model and related information better. I'm learning more about convolutional layers, and this video, combined with what you said last Thursday, is making the topic a lot clearer for me. I'm still a bit confused about how the convolution kernels are chosen, especially when we have 2 or more convolutional layers, but I guess I'll talk more about my question over the phone.
good example of bbox issues: https://github.com/matterport/Mask_RCNN