msegala / Kaggle-National_Data_Science_Bowl


Kaggle - National Data Science Bowl

Code used for competing in the National Data Science Bowl. The final solution used convolutional neural networks (CNNs).

Generating the solution

Install the dependencies

Dependencies are listed in requirements.txt. I created a venv folder where these libraries are stored.

Generating Training and Testing images

Create Data_converted/train/ by running python

Create Data_converted/test/ by running python

Create Final Dataset

Within Fish Bowl.ipynb, run steps 1 & 2 to create the training and testing sets needed as inputs to the CNNs.

Pretrain the network

We can perform unsupervised pre-training by running the exact network used for training, but as a regression problem in which the labels are the same as the input features. The pre-training is run on the full test + train set, and the weights of the network are saved into a pickled object. These weights are then used to initialize the true training network.

python2.7 fit
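The pre-training idea described above can be sketched in a few lines. This is only an illustration, not the script used in the competition: the tiny two-layer network, the array shapes, the learning rate, and the names X_train, X_test, init_weights and fit_regression are all assumptions made for the example. It shows the three points from the text: train and test are stacked, the regression targets are the inputs themselves, and the pickled weights initialize the training network.

```python
import pickle
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the flattened train and test images.
X_train = rng.random((40, 16))
X_test = rng.random((20, 16))

# Pre-training uses train + test together; the targets are the inputs.
X_all = np.vstack([X_train, X_test])
y_all = X_all.copy()

def init_weights(n_in, n_hidden, n_out, rng):
    """Small random weights for a toy one-hidden-layer network (no biases)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
    }

def forward(w, X):
    h = np.tanh(X @ w["W1"])
    return h, h @ w["W2"]

def mse(w, X, y):
    return float(np.mean((forward(w, X)[1] - y) ** 2))

def fit_regression(w, X, y, lr=0.05, epochs=300):
    """Plain gradient descent on a squared-error loss."""
    for _ in range(epochs):
        h, out = forward(w, X)
        err = (out - y) / len(X)                  # d(loss)/d(out)
        grad_W2 = h.T @ err
        grad_W1 = X.T @ (err @ w["W2"].T * (1.0 - h ** 2))
        w["W2"] -= lr * grad_W2
        w["W1"] -= lr * grad_W1
    return w

pretrained = init_weights(16, 8, 16, rng)
loss_before = mse(pretrained, X_all, y_all)
pretrained = fit_regression(pretrained, X_all, y_all)
loss_after = mse(pretrained, X_all, y_all)

# Save the weights as a pickled object (a real run would write a file).
blob = pickle.dumps(pretrained)

# Initialize the training network from the pickle. Assumption: only layers
# whose shapes match are reused; the output layer is created fresh.
n_classes = 3  # hypothetical number of classes
restored = pickle.loads(blob)
train_w = init_weights(16, 8, n_classes, rng)
train_w["W1"] = restored["W1"].copy()
```

Reusing only the shape-compatible layers is a design choice of this sketch; the regression output has as many units as there are input features, so it cannot be copied into a classification output layer directly.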


Train the network

To train the best single model, run:

python2.7 fit

This will create a pickled object, net-specialists.pickle, which contains the weights necessary to create predictions.

Generate augmented predictions

To generate predictions which are averaged across multiple transformations of the input, run:

python2.7 predict

This will create a CSV file with predictions for each test set observation.
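Averaging predictions across transformations of the input can be sketched as follows. This is a minimal illustration that assumes square images, the four 90-degree rotations, and a model exposed as a callable returning class probabilities; predict_augmented and toy_model are hypothetical names, not functions from this repository.

```python
import numpy as np

def predict_augmented(predict_proba, images, quarter_turns=(0, 1, 2, 3)):
    """Average class probabilities over rotated copies of each image.

    images: array of shape (n_images, height, width), square images assumed.
    predict_proba: callable mapping such a batch to (n_images, n_classes).
    """
    preds = [
        predict_proba(np.rot90(images, k=k, axes=(1, 2)))
        for k in quarter_turns
    ]
    return np.mean(preds, axis=0)

def toy_model(batch):
    """Hypothetical stand-in for a trained network (orientation sensitive)."""
    top = batch[:, : batch.shape[1] // 2, :].mean(axis=(1, 2))
    left = batch[:, :, : batch.shape[2] // 2].mean(axis=(1, 2))
    logits = np.stack([top, left], axis=1)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
images = rng.random((5, 8, 8))
averaged = predict_augmented(toy_model, images)
```

Because the four rotations form a closed set, the averaged prediction in this sketch does not change when the input images are themselves rotated by 90 degrees, which is the property the averaging is meant to provide.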

Single Model predictions

To generate predictions for a single model, run step 3 within Fish Bowl.ipynb.

Blended augmented predictions

To generate predictions for multiple models averaged together, run step 4 within Fish Bowl.ipynb.
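Blending amounts to averaging the per-class probabilities of several models. The sketch below assumes each model's predictions are already available as an array of shape (n_samples, n_classes); blend_predictions and the optional weights argument are hypothetical and not taken from the notebook.

```python
import numpy as np

def blend_predictions(prob_matrices, weights=None):
    """Average several (n_samples, n_classes) probability matrices.

    weights: optional one-dimensional sequence with one weight per model;
    None gives a plain unweighted mean.
    """
    stacked = np.stack(prob_matrices)            # (n_models, n_samples, n_classes)
    blended = np.average(stacked, axis=0, weights=weights)
    # Renormalize so every row is still a probability distribution.
    return blended / blended.sum(axis=1, keepdims=True)

# Two hypothetical models scoring two test observations over two classes.
model_a = np.array([[0.8, 0.2], [0.4, 0.6]])
model_b = np.array([[0.6, 0.4], [0.2, 0.8]])
blended = blend_predictions([model_a, model_b])
```

With equal weights this is a simple mean, so the first observation ends up with probabilities 0.7 and 0.3.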

Train and Predict all models

In the end I trained 8 different models. To train and predict all of these at once, run ./ and ./

Lessons Learned

Throughout the competition I had repeated issues with data augmentation; I was only able to achieve good results with rotations of [0, 90, 180, 270] degrees. The background in the images is white (255), whereas opencv/scikit-image assume by default that it is black (0), so any area exposed by a transformation is filled with black by default and no longer matches the background. Therefore, we can invert the images with im = np.invert(im) when loading them.
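A small sketch of the inversion and the four rotations, using a hypothetical 4x4 image rather than the competition data. It assumes 8-bit grayscale arrays (dtype uint8), for which np.invert is equivalent to 255 - value; load_inverted and rotations are illustrative names.

```python
import numpy as np

def load_inverted(im):
    """Invert an 8-bit grayscale image so a white background becomes black."""
    return np.invert(im)

def rotations(im):
    """Return the image rotated by 0, 90, 180 and 270 degrees."""
    return [np.rot90(im, k) for k in range(4)]

# Hypothetical image: white background (255) with one dark pixel.
im = np.full((4, 4), 255, dtype=np.uint8)
im[1, 2] = 10

inverted = load_inverted(im)       # background is now 0, the dark pixel is 245
augmented = rotations(inverted)    # four copies, one per rotation
```

After inversion the background is 0, which matches the default fill value these libraries use, so other transformations would no longer introduce mismatched borders.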