Description
Existing re-identification (ReID) models rely on the specific cameras present in the training set. When such models are applied to new cameras with a different angle to the ground or a different color balance, re-identification accuracy drops drastically. The scalability of such models is therefore poor: they cannot adapt to new circumstances. For this reason a camera-invariant representation, i.e. a representation invariant to camera angle and color perturbations, is required to succeed at ReID.
Background
There have been many attempts to force neural networks to learn features invariant to camera angle, handwriting style, and object pose. These attempts produced classification networks, generative models with disentangled representations, capsule networks, and others. Unfortunately, these models still need to be specialized for particular data, which means the model must still be adapted to new data. For a car re-identification service this means we need a model that works on a distributed network of cameras and allows us to (re-)identify cars across distinct camera angles and lighting conditions.
Reference Papers
Vehicle Re-Identification: an Efficient Baseline Using Triplet Embedding (https://arxiv.org/abs/1901.01015). This paper presents an approach to train a straightforward model that re-identifies cars. The model is based on the well-known triplet loss (a minimal sketch is given after this list) and uses only identity-level annotations. This is a baseline model, and it produces the smallest embedding for fast inference. Several recent person re-identification models take a similar approach.
Group-Sensitive Triplet Embedding for Vehicle Reidentification (https://ieeexplore.ieee.org/document/8265213). This paper studies intra-class variance in embeddings of vehicle images and proposes a group-sensitive triplet loss. The authors report that their model outperforms the state-of-the-art approaches in vehicle ReID.
Survey on Deep Learning Techniques for Person Re-Identification Task (https://arxiv.org/pdf/1807.05284.pdf). The paper summarizes the last decade's progress in person re-identification, with an emphasis on deep models for this task. Evaluation results are reported and a brief description of each model is given.
Fast vehicle identification via ranked semantic sampling based embedding (https://www.ijcai.org/proceedings/2018/0514.pdf). The authors suggest treating re-identification as a retrieval problem. They also present a sampling method to binarize the network's embedding and a ranking loss that places relevant images closer to the query image.
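For concreteness, here is a minimal sketch of the triplet loss mentioned for reference paper 1. It is an illustration under assumed names and an assumed margin value, not the paper's exact implementation.

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.3):
        """Pull the anchor towards a same-identity embedding (positive) and push
        it away from a different-identity embedding (negative) by at least `margin`.
        All three arguments are 1-D embedding vectors; `margin` is an assumption."""
        d_pos = np.linalg.norm(anchor - positive)   # distance to same identity
        d_neg = np.linalg.norm(anchor - negative)   # distance to other identity
        return max(0.0, d_pos - d_neg + margin)     # zero once the gap exceeds the margin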
Training Datasets
Vehicle-1M (http://www.nlpr.ia.ac.cn/iva/homepage/jqwang/Vehicle1M.htm). The Vehicle-1M dataset was constructed by the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (NLPR, CASIA). The dataset contains vehicle images captured during day and night, from the front or the rear, by multiple surveillance cameras installed in several cities in China.
VeRi-776 (https://github.com/VehicleReId/VeRidataset). A large-scale benchmark dataset for vehicle ReID in a real-world urban surveillance scenario. It contains over 50,000 images of 776 vehicles captured by 20 cameras covering a 1.0 km^2 area over 24 hours, which makes the dataset suitable for vehicle ReID and other related research.
PKU-VehicleID (https://pkuml.org/resources/pku-vehicleid.html). The dataset contains data captured during daytime by multiple real-world surveillance cameras distributed across a small city in China. There are 26,267 vehicles (221,763 images in total) in the entire dataset. Each image carries an ID label corresponding to the vehicle's real-world identity. In addition, 10,319 vehicles (90,196 images in total) are manually labeled with vehicle model information (e.g. "MINI Cooper", "Audi A6L", and "BMW 1 Series").
Acceptance Criteria
The results should include a model that can identify a given car across cameras, assuming that each camera detects several cars. The deliverable should define an interface that allows selecting a car ID on one camera and obtaining the <camera-id, car-id, bounding-box> tuples where the car appears again, as sketched below.
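A hypothetical sketch of such an interface follows; all names here (Detection, ReIDService, find_reappearances) are illustrative assumptions, since the specification only fixes the <camera-id, car-id, bounding-box> output format.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Detection:
        camera_id: str                             # camera where the car re-appears
        car_id: str                                # identity assigned by the model
        bounding_box: Tuple[int, int, int, int]    # assumed (x, y, width, height) in pixels

    class ReIDService:
        def find_reappearances(self, camera_id: str, car_id: str) -> List[Detection]:
            """Select a car detected on one camera and return every
            <camera-id, car-id, bounding-box> where the same car appears again."""
            raise NotImplementedError  # to be provided by the submitted model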
METRICS:
Classification accuracy (used during training), which is the fraction of correct predictions among all predictions.
ReID accuracy (rank-1, rank-5). Here we use the latent code (z) produced by the network. Obtain z for all examples in the gallery (the dataset of vehicle images) except the one you want to match (the query), then obtain z for the query and compute a distance metric (for example, Euclidean distance) between the gallery codes and the query code. Finally, find the gallery code closest to the query code and compare the match's class with the query's class; this is analogous to classification accuracy (rank-1). Alternatively, take the 5 closest codes, and if any of them matches, count the query as a hit (rank-5).
mAP. Obtain z for all gallery examples except the query, then obtain z for the query and compute the distance metric between the gallery and query codes. Rank the gallery by distance to the query, compute the average precision of this ranking for each query, and take the mean over all queries.
Different metrics reflect different aspects of the model. While the accuracies (metrics 1 and 2) only show that we have a match for our query, a good mAP shows that our model places relevant images closer to each other (this is useful for image retrieval, and for ReID framed as an image-retrieval problem). A minimal sketch of the rank-k and mAP computations follows.
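Below is a minimal numpy sketch of the evaluation protocol described above. Array names and shapes (gallery_z and query_z as [n, d] embedding matrices, gallery_ids and query_ids as [n] identity arrays) are assumptions for illustration.

    import numpy as np

    def rank_k_accuracy(gallery_z, gallery_ids, query_z, query_ids, k=5):
        """Fraction of queries whose k nearest gallery codes contain a matching identity."""
        hits = 0
        for z, qid in zip(query_z, query_ids):
            dists = np.linalg.norm(gallery_z - z, axis=1)   # Euclidean distance to every gallery code
            top_k = np.argsort(dists)[:k]                   # indices of the k closest codes
            hits += int(np.any(gallery_ids[top_k] == qid))  # hit if any of them matches the query identity
        return hits / len(query_ids)

    def mean_average_precision(gallery_z, gallery_ids, query_z, query_ids):
        """Mean over all queries of the average precision of the distance-ranked gallery."""
        aps = []
        for z, qid in zip(query_z, query_ids):
            dists = np.linalg.norm(gallery_z - z, axis=1)
            order = np.argsort(dists)                       # gallery ranked by distance to the query
            relevant = (gallery_ids[order] == qid).astype(float)
            if relevant.sum() == 0:
                continue                                    # no ground-truth matches for this query
            precision_at_i = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
            aps.append((precision_at_i * relevant).sum() / relevant.sum())
        return float(np.mean(aps))

rank_k_accuracy with k=1 gives the rank-1 metric; with k=5 it gives rank-5.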
MINIMUM TARGET METRIC
Achieve at least state-of-the-art results on the VeRi dataset (mAP: 67.55 [ref. paper 1], rank-5 accuracy: 98.97 [ref. paper 2], rank-1 accuracy: 96.24 [ref. paper 2]). Also reduce the gap between mAP and accuracy: accuracy is a simpler metric than mAP, and moreover it is what gets optimized during training, so bringing mAP closer to accuracy counts as success, and such a model will work better in a real-world environment than a model with lower mAP.
Non-functional Requirements
The model should also be evaluated by experts in a real-world environment (office/pedestrians/campus).
Open-source software is mandatory.
Author: Artem Yashenko
Reward Amount: 1000 tokens
Expiration Date: 10 April 2020