This POC is using CNTK 2.1 to train model for multiclass classification of images. Our model is able to recognize specific objects (i.e. toilet, tap, sink, bed, lamp, pillow) connected with picture types we are looking for. It plays a big role in a process which will be used to classify pictures from different hotels and determine whether it's a picture of bathroom, bedroom, hotel front, swimming pool, bar, etc. That final classification will be made based on objects that were detected in those pictures.
What can you find inside:
If you would like to know how to use such model, you can check this project to find out how to write a simple RESTfull, Python-based web service and deploy it to Azure Web Apps with your own model.
Disclaimer: This POC and all the learnings you can find bellow is an outcome of close cooperation between Microsoft and Hotailors. Our combined team spent total of 3 days to prepare and label data, finetune parameters and train the model.
Due to limited time and human resources we decided to create this POC for just 2 of almost 20 different types of pictures we would like to classify in final product
Each type of picture (i.e. bedroom, bathroom, bar, lobby, hotel front, restaurant
) can consists of different objects (i.e. toilet, sink, tap, towell, bed, lamp, curtain, pillow
) which are strongly connected with that speciifc picture type.
For our POC we used 2 picture types with 4 objects/classes per each:
bedroom | bathroom |
---|---|
pillow | tap |
bed | sink |
curtain | towel |
lamp | toilet |
At this time we focused only on detecting those specific objects for each picture type. Outcomes of evaluation should later be analyzed either by some simple algorithm or another model to match an image with one of the picture types we are looking for
We wanted to be as close as possible to real world scenarios so our dataset consists of real pictures from different hotels all over the world. Images where provided by Hotailors team
In our POC we used images scalled to max of 1000px on the wide side
Every picture usually consists of multiple types of objects we are looking for
We used total of 113 images to train and test our model from which we used:
82 images in positive
set for training the model. We have about 50/50 split between bathroom
and bedroom
pictures
Bathroom positive sample | Bedroom positive sample |
---|---|
11 images in negative
set for training the model. Those images should not contain any objects that we are interested in detecting
Negative sample 1 | Negative sample 2 |
---|---|
20 images in testImages
set for testing and evaluating the model. We have about 50/50 split between bathroom
and bedroom
pictures
Bathroom test sample | Bedroom test sample |
---|---|
After we tagged all of the images from HotailorPOC2
dataset we analyzed them to verify how many tagged objects per each class we have. It is suggested to use about 20-30% of all data in dataset as test data. Looking at our numbers below we did quite ok but there's still some room for improvement
object/class name | # of tagged objects in positive/train set | # of tagged objects in test set | % of tagged objects in relation to all objects |
---|---|---|---|
sink | 46 | 10 | 18 |
pillow | 98 | 27 | 22 |
toilet | 34 | 7 | 17 |
lamp | 69 | 18 | 21 |
curtain | 78 | 16 | 17 |
towel | 30 | 14 | 32 |
tap | 44 | 9 | 17 |
bed | 53 | 12 | 18 |
After training and evaluating our model we achieved following results:
Evaluating Faster R-CNN model for 20 images.
Number of rois before non-maximum suppression: 550
Number of rois after non-maximum suppression: 87
AP for sink = 0.4429
AP for pillow = 0.1358
AP for toilet = 0.8095
AP for lamp = 0.5404
AP for curtain = 0.7183
AP for towel = 0.0000
AP for tap = 0.1111
AP for bed = 0.8333
Mean AP = 0.4489
As you can see above, some of the results are not too good. For example: pillow
and tap
average precision for test set is extremely low and for towel
it even shows 0.0000 which may indicate some problems with our dataset or tagged objects. We will definitely need to look into it and check if we are able to somehow improve those results
Even though the Mean Average Precision values are not perfect we still were able to get some decent results:
Some of the results include mistakes. But those clearly look like anomalies which should be fairly easy to catch in further classification of picture type
Picture below shows how our model classified single region (yellow) as bed
object although it's clearly not there:
Another picture shows how our model classified single region as towel
object although it's clearly not there:
Of course sometimes there are some really ugly results which may be hard to use for further classification:
Next picture shows our model wasn't able to find any objects. We need to verify if it's because of wrongly tagged data in HotailorPOC2 or is it some kind of issue with Region Proposal Network and it simply didn't find any regions of interest for further classification
Final model will be used in form of web service running on Azure and that's why I prepared a sample RESTful web service written with Python using Flask module. This web service makes use of our trained model and provides API which takes images as an input for evaluation and returns either a cloud of tags or tagged images. Project also describes how to easily deploy this web service to Azure Web Apps with custom Python environment and required dependencies.
You can find running web service hosted on Azure Web Apps here, and project with code and deployement scripts can be found on GitHub.
Sample request and response in Postman:
Download content of this repo
You can either clone this repo or just download it and unzip to some folder
Setup Python environment
In order for scripts to work you should have a proper Python environment. If you don't already have it setup then you should follow one of the online tutorials. To setup Python environment and all the dependencies required by CNTK on my local Windows machine, I used scripted setup tutorial for Windows. If you're using Linux then you might want to look into one of these tutorials. Just bear in mind that this project was developed and tested with CNTK 2.1 and it wasn't tested for any other version.
Even after setting up Python environment properly you might still witness some errors when running Python scripts. Most of those errors are related to missing modules or some 3rd party frameworks and tools (i.e. GraphViz). Missing modules can be easily pip installed and most of the required ones can be found in requirements.txt
files for each folder with Python scripts.
Please report if you'll find any errors or missing modules, thanks!
Download hotel pictures dataset (HotailorPOC2) and pretrained AlexNet model used for Transfer Learning
Go to Detection/FasterRCNN folder in the location were you unzipped this repo and run install_data_and_model.py
. It will automatically download the HotailorPOC2
dataset, pretrained AlexNet model and will generate mapping files required to train the model.
After you go through setup steps you can start training your model.
In order to do it you need to run FasterRCNN.py
script located in Detection/FasterRCNN.
I'm working on Windows 10 so I run the script from Anaconda Command Prompt which should be installed during setup steps.
Bear in mind that training the model might take a lot of time depending on the type of machine you are using for training and if you're using GPU or CPU.
python FasterRCNN.py
TIP: If you don't own any machine with heavy GPU you can use one of the ready to go Data Science Virtual Machine images in Azure.
When the training and evaluation will be completed, you should see something similar to this:
Evaluating Faster R-CNN model for 20 images.
Number of rois before non-maximum suppression: 550
Number of rois after non-maximum suppression: 87
AP for sink = 0.4429
AP for pillow = 0.1358
AP for toilet = 0.8095
AP for lamp = 0.5404
AP for curtain = 0.7183
AP for towel = 0.0000
AP for tap = 0.1111
AP for bed = 0.8333
Mean AP = 0.4489
Trained model, neural network topology and evaluated images (with plotted results) can later be found in Output
folder located in Detection/FasterRCNN
.
config.py - most of variables are set in this file
These variables are responsible for chosing a dataset that will be used to train the model. Most important variables here are :
__C.CNTK.DATASET = "HotailorPOC2"
[..]
if __C.CNTK.DATASET == "HotailorPOC2": #name of your dataset Must match the name set with property '__C.CNTK.DATASET'
__C.CNTK.MAP_FILE_PATH = "../../DataSets/HotailorPOC2" # dataset directory
__C.CNTK.NUM_TRAIN_IMAGES = 82 # number of images in 'positive' folder
__C.CNTK.NUM_TEST_IMAGES = 20 # number of images in 'testImages' folder
__C.CNTK.PROPOSAL_LAYER_PARAMS = "'feat_stride': 16\n'scales':\n - 4 \n - 8 \n - 12"
IMAGE_WIDTH
and IMAGE_HEIGHT
are used to determine the input size of images used for training and later on for evaluation:
__C.CNTK.IMAGE_WIDTH = 1000
__C.CNTK.IMAGE_HEIGHT = 1000
BASE_MODEL
defines which pretrained model should be used for transfer learning. Currently we used only AlexNet. In future we want to test it with VGG16 to check if we can get better results then with AlexNet
__C.CNTK.BASE_MODEL = "AlexNet" # "VGG16" or "AlexNet" or "VGG19"
It holds all the dependencies required by my scripts and CNTK libraries to work. It can be used with pip install
command to quickly install all the required dependencies (more here)
matplotlib==1.5.3
numpy==1.13.3
cntk==2.1
easydict==1.6
Pillow==4.3.0
utils==0.9.0
PyYAML==3.12
This script does 3 things:
Downloads pretrained model specified in config.py which will be later used for transfer learning:
#downloads pretrained model pointed out in config.py that will be used for transfer learning
sys.path.append(os.path.join(base_folder, "..", "..", "PretrainedModels"))
from models_util import download_model_by_name
download_model_by_name(cfg["CNTK"].BASE_MODEL)
Downloads and unzips our sample HotailorPOC2 dataset:
#downloads hotel pictures classificator dataset (HotailorPOC2)
#comment out lines bellow if you're using a custom dataset
sys.path.append(os.path.join(base_folder, "..", "..", "DataSets", "HotailorPOC2"))
from download_HotailorPOC2_dataset import download_dataset
download_dataset()
Creates mappings and metadata for dataset:
#generates metadata for dataset required by FasterRCNN.py script
print("Creating mapping files for data set..")
create_mappings(base_folder)
Although this project was prepared specifically for Hotailors case, it's based on one of the standard examples from original CNTK repository on GitHub and thus it can be easily reused in any other scenario. You just need to follow steps bellow:
Follow steps number 1 and 2 from setup instructions.
Think what type of objects you would like to classify and prepare some images with those objects. The more the better but usually u should get some decent results even with 30-40+ samples per object. Remember that single image can have multiple objects (it was exactly like that in our case)
Make sure to use only good quality images in specific resolution
Resolution we used for our project was 1000x1000 px but you can easily lower it depending on your scenario and needs. Just make sure to scale your images to this one specific resolution you will be working with. In our case the original images where much larger then 1000x1000 px but we scalled it down to match the longer side of image to 1000 px
It's not recommended to go beyond 1000x1000 px
Create a new folder in Datasets
directory and name it with whatever your datasets name is and inside that newly created folder create 3 another folders for your images:
negative
Here you must add images which don't include any of the objects you will be looking for. The more the better but don't get crazy here, 10 to 20 images should more then enought. Those images will be used during training to show our model what is not interesting for us and should be treated as a background
positive
Here you must add images that will be used to teach our model what kind of objects it should look for. The more the better but we should be able to see some results with 30-40+ images per class/object we would like to detect. Just bear in mind that one image can have more then one object/class.
testImages
Those images will be used for testing of your trained model and to evaluate AP (Average Precission) percentage for each class. Just take 20-30 percent of images from positive
folder and put them here. It's very important though to not duplicate any images between positive
and testImages
folders as it may corrupt the results
In order to make your custom dataset ready to be used for training you will need to create some metadata with coordinates of objects and their names (classes)
Currently the best tool for tagging images is Visual object Taging Tool but for this project I used simple Python scripts that can be found in the original CNTK 2.1 github repository (mine were fine tuned a bit):
C1_DrawBboxesOnImages.py - allows you to draw bounding boxes for all the objects which are interesting to you (present objects you wish to recognize).
There is one variable you will need to change before running this script:
#change it to your images directory. Run this script separately for each folder
imgDir = "../../DataSets/HotailorPOC2/testImages"
Important thing to mention here is to run this script only for positive
and testImages
. You don't need to do it for negative
because there's actually nothing to tag there.
After successfully running the script you should see something like that:
Now just use your mouse to draw bounding boxes for every object. Some keyboard shortcuts should be helpful here:
"u" - will erase last bounding box you draw
"n" - will move you to next image in current folder
"s" - will skip current image and delete all the bounding boxes for that image
C2_AssignLabelsToBboxes.py - allows to review every bounding box you've marked with C1 script and label it with proper class name.
Before running this script change those 2 variables:
#change it to your images directory. Run this script separately for each folder
imgDir = "../../DataSets/HotailorPOC2/testImages"
#change it to your classes names
classes = ["curtain", "pillow", "bed", "lamp", "toilet", "sink", "tap", "towel"]
Again, same as in C1, run this script only for positive
and testImages
.
C3_VisualizeBboxes.py - I made this script based on C2 just to visualize bounding boxes for each image in dataset. It's very helpful when you are looking for mistakes within your dataset.
Be sure to change imgDir
variable to your directory:
#change it to your images directory. Run this script separately for each folder
imgDir = "../../DataSets/HotailorPOC2/testImages"
Running C3 script will visualize bounding boxes for every image in directory and you should be able to see if everything is marked correctly:
In order to train the model we use transfer learning and we need to have a pretrained model for that. For this sample we use AlexNet model.
To download the model and create class and files mappings you can use install_data_and_model.py script and simply follow these steps:
Make sure to change variables in your config.py file and make sure you set __C.CNTK.MAP_FILE_PATH
variable to a proper directory:
if __C.CNTK.DATASET == "HotailorPOC2": #name of your dataset. Must match the name set with property '__C.CNTK.DATASET'
__C.CNTK.MAP_FILE_PATH = "../../DataSets/HotailorPOC2" # your dataset directory
__C.CNTK.NUM_TRAIN_IMAGES = 82 # number of images in 'positive' folder
__C.CNTK.NUM_TEST_IMAGES = 20 # number of images in 'testImages' folder
__C.CNTK.PROPOSAL_LAYER_PARAMS = "'feat_stride': 16\n'scales':\n - 4 \n - 8 \n - 12"
Open install_data_and_model.py script and comment out those lines:
#downloads hotel pictures classificator dataset (HotailorPOC2)
#comment out lines bellow if you're using a custom dataset
sys.path.append(os.path.join(base_folder, "..", "..", "DataSets", "HotailorPOC2"))
from download_HotailorPOC2_dataset import download_dataset
download_dataset()
Run install_data_and_model.py script. Bear in mind that downloading the pretrained model may take few minutes or even more depending on your internet connection.
At this point your custom dataset should be ready for training.
Edit config.py script and change following variables:
Change value of __C.CNTK.DATASET
:
# set it to your custom dataset name
__C.CNTK.DATASET = "HotailorPOC2"
Change values of __C.CNTK.IMAGE_WIDTH
and __C.CNTK.IMAGE_HEIGHT
to much your custom dataset images resolution:
# set it to your custom datasets images resolution
__C.CNTK.IMAGE_WIDTH = 1000
__C.CNTK.IMAGE_HEIGHT = 1000
Change values in following code to match your dataset name, your datasets directory location and to match your custom dataset images resolution:
if __C.CNTK.DATASET == "HotailorPOC2": #name of your dataset. Must match the name set with property '__C.CNTK.DATASET'
__C.CNTK.MAP_FILE_PATH = "../../DataSets/HotailorPOC2" # your dataset directory
__C.CNTK.NUM_TRAIN_IMAGES = 82 # number of images in 'positive' folder
__C.CNTK.NUM_TEST_IMAGES = 20 # number of images in 'testImages' folder
Train and test your model with FasterRCNN.py script
Run FasterRCNN.py
script and wait till the training and testing finishes.
Training may take even couple hours depending on your hardware setup. It's is best to use high performing GPU's for that kind of purposes.
TIP: If you don't own any machine with heavy GPU you can use one of the ready to go Data Science Virtual Machine images in Azure.
If you won't be satisfied with training results then try fine tunning the variables and cleaning your dataset if necessary and then rerun the training.
When you will find yourself satisfied with your model and you would like to get to know how to use it with RESTful Python web service and deploy it to Azure Web Apps, then check out this repository.