Relationship Visualization

Identify a visual relationship in a given image
This is based on the Kaggle Visual Relationship Track.

Foreword

This project is an addendum to a larger work carried out in collaboration with others. However, the published code is entirely mine, and nothing shared in this repository compromises that research. Every proposal discussed here is public-domain knowledge, and the actual model implementing it has been withheld.

Further, no model weights have been published, so that no harm comes to this research.

Once the research is complete, and with the team's consent, I will publish the models and weights as well, because the deep learning community grows exponentially when published research is shared in the public domain.

Dataset

The training dataset is derived from Open Images Dataset V5 and contains 329 relationship triplets across 375k training samples. It covers human-object relationships (e.g. "woman playing guitar", "man holding microphone"), object-object relationships (e.g. "beer on table", "dog inside car"), and object-attribute relationships (e.g. "handbag is made of leather", "bench is wooden").

The features of this dataset are as follows:

~3.8 lakh (roughly 380,000) training image samples
57 unique classes (labels)
10 relationships (such as 'is', 'on', 'under', 'at')
5 label attributes (such as 'transparent', 'made of leather', 'made of plastic', 'wooden')
329 valid relationship triplets

The following types of relationships can be inferred from an image:

(Figure: Relationship Samples)

Approach

Given the nature of the training data, each relationship can be decomposed into one of the following types:

Subject Label (L_s) -> Relation (R_o) -> Object Label (L_o) : Backpack at Table
Label (L_o) -> is -> Attribute (A_l) : Table is Wooden

Hence, when composing a full description, the following structure is obtained:

((L_s) (A_l1)) -> (R_o) -> ((L_o) (A_l2))
e.g. Backpack made of Fabric at Table which is Transparent

or, in a simplified version:

((A_l1) (L_s)) -> (R_o) -> ((A_l2) (L_o))
e.g. Transparent Bottle on Wooden Table
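
The simplified structure can be illustrated with a small helper. This is only a sketch of the string composition, not the withheld model code; the function name `describe` and its parameters are hypothetical.

```python
from typing import Optional


def describe(subject: str, relation: str, obj: str,
             subject_attr: Optional[str] = None,
             object_attr: Optional[str] = None) -> str:
    """Compose ((A_l1) (L_s)) -> (R_o) -> ((A_l2) (L_o)) as plain text."""

    def phrase(attr: Optional[str], label: str) -> str:
        # An attribute is optional; when present it prefixes the label.
        return f"{attr} {label}" if attr else label

    return f"{phrase(subject_attr, subject)} {relation} {phrase(object_attr, obj)}"


print(describe("Bottle", "on", "Table", "Transparent", "Wooden"))
# Transparent Bottle on Wooden Table
```

The prefix form reads naturally for adjective-like attributes such as 'transparent' or 'wooden'; attributes such as 'made of leather' fit the longer, postfix structure better.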

Proposal

Three separate models are proposed:

The input image is given to the Object Detection model. Its output is a list of labels, bounding boxes and confidence scores.
This output is sent to the Attribute Prediction model, which returns a list of 'n' attributes for 'n' labels.
The same detection output is sent to the Relationship Prediction model, which returns a list of n(n-1)/2 predicted relations, one per pair of labels.
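
To make the size of the relationship model's output concrete, the sketch below enumerates candidate pairs from a list of detections. It assumes the count is n(n-1)/2, i.e. one prediction per unordered pair of detected labels; the helper name `candidate_pairs` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def candidate_pairs(detections: Sequence[str]) -> List[Tuple[str, str]]:
    """Return every unordered pair of detections: n(n-1)/2 pairs for n items."""
    return list(combinations(detections, 2))


labels = ["Woman", "Guitar", "Table", "Bottle"]
pairs = candidate_pairs(labels)
print(len(pairs))  # 6, which is 4 * 3 / 2
```

If subject and object order matters at prediction time, `itertools.permutations(detections, 2)` would yield n(n-1) ordered pairs instead.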

Model 1 : Object Detection

An out-of-the-box object detection model (YOLOv3) is used, retrained via transfer learning on the 57 labels.

Model 2 : Attribute Prediction

Features generated from labels, bounding boxes and label ROIs are used to predict attributes.

(Figure: Attribute Model)

Model 3 : Relationship Triplet Prediction

Features generated from the embeddings of a pair of labels and their bounding boxes are used to predict relation triplets.

(Figure: Attribute Model)
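
The actual feature set is withheld, so the following is only an illustrative guess at what bounding-box features for a pair might look like: overlap (IoU), centre offsets and an area ratio. All names here (`Box`, `box_area`, `iou`, `pair_features`) and the choice of features are assumptions, not the project's implementation.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = box_area(inter)
    union = box_area(a) + box_area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def pair_features(subject_box: Box, object_box: Box) -> Dict[str, float]:
    """Hypothetical geometric features for a (subject, object) pair."""
    s_cx = (subject_box[0] + subject_box[2]) / 2
    s_cy = (subject_box[1] + subject_box[3]) / 2
    o_cx = (object_box[0] + object_box[2]) / 2
    o_cy = (object_box[1] + object_box[3]) / 2
    object_area = box_area(object_box)
    return {
        "iou": iou(subject_box, object_box),
        "dx": o_cx - s_cx,  # horizontal offset between box centres
        "dy": o_cy - s_cy,  # vertical offset (image y grows downwards)
        "area_ratio": box_area(subject_box) / object_area if object_area > 0 else 0.0,
    }


bottle = (40.0, 10.0, 60.0, 50.0)
table = (0.0, 45.0, 100.0, 100.0)
print(pair_features(bottle, table))
```

Features like these would typically be concatenated with the label embeddings before being passed to a classifier over the 10 relations.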

Installing / Getting started

This is a Python project and should run on Python 3 or later.

Install Python 3.x

Create a virtual environment for Python

pip3 install virtualenv
mkdir ~/.virtualenvs

pip3 install virtualenvwrapper

# Add the following to ~/.bash_profile
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3
export VIRTUALENVWRAPPER_VIRTUALENV=/usr/local/bin/virtualenv

# Then reload the profile in the current shell
source ~/.bash_profile

source /usr/local/bin/virtualenvwrapper.sh

workon
mkvirtualenv visual_relations

This sets up a new virtualenv called visual_relations.

Install the required libraries for this project
```
pip3 install -r requirements.txt
```
Install MongoDB and configure it in conf/config.yaml

Initial Configuration

Set the correct MongoDB URL in conf/config.yaml, or provide it through environment variables in .env.
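
One way such a lookup could work is sketched below. The environment variable name `MONGO_URL`, the `mongo.url` config key and the rule that the environment takes precedence are all assumptions for illustration; check the project's own config loader for the real names. The config is passed in as a plain dict, as it might look after parsing conf/config.yaml.

```python
import os
from typing import Any, Mapping, Optional


def resolve_mongo_url(config: Mapping[str, Any],
                      env: Optional[Mapping[str, str]] = None) -> str:
    """Return the MongoDB URL, preferring the environment over the config file."""
    env = os.environ if env is None else env
    # Hypothetical names: MONGO_URL in the environment, mongo.url in the config.
    url = env.get("MONGO_URL") or config.get("mongo", {}).get("url")
    if not url:
        raise ValueError("MongoDB URL is not set in the environment or in the config")
    return url


# Hypothetical contents of conf/config.yaml after parsing.
config = {"mongo": {"url": "mongodb://localhost:27017/visual_relations"}}
print(resolve_mongo_url(config, env={}))
# mongodb://localhost:27017/visual_relations
```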

Developing

To work on this further, clone the repository:

git clone git@github.com:usriva2405/visual-relationship-detection-api.git
cd visual-relationship-detection-api/

Running Code Directly (Non Docker)

There are 3 ways to run this directly (locally).

Use Python to run the controller directly
```
python app/controller/flask_controller.py
curl http://127.0.0.1:5002      # prints Welcome to Visual Relationship Prediction!
```
If the project has been set up correctly, the curl command prints Welcome to Visual Relationship Prediction! on the console.

Using WSGI Server for running app (without config)

You can also run the app with gunicorn:

gunicorn -b localhost:5002 -w 1 app.controller.flask_controller:app
curl http://127.0.0.1:5002      # prints Welcome to Visual Relationship Prediction!

The app will be accessible at http://127.0.0.1:5002

Using WSGI Server for running app (with config)

Run the app with one of the following gunicorn configurations (the default one or the Heroku one):

gunicorn -c conf/gunicorn.conf.py --log-level=debug app.controller.flask_controller:app
gunicorn -c conf/heroku-gunicorn.conf.py --log-level=debug app.controller.flask_controller:app
curl http://127.0.0.1:5002      # prints Welcome to Visual Relationship Prediction!

The app will be accessible at http://0.0.0.0:5002

Deploying / Publishing

Docker

To build the project, run:

docker build --no-cache -t visual-relationship:latest .

To deploy the project, run:

DEV
docker run -d -p 5002:5002 --name visual-relationship -e ENVIRONMENT_VAR=DEV visual-relationship:latest

Open localhost:5002 in a browser to access the project.

Configuration

Optional: have MongoDB running and accessible at the URL given in config.yaml.

Sample Request-Response

Image POST as form-data

We can pass images as form-data (local folder uploads) for verification

URL localhost:5002/detectobjects
TYPE POST (form_data)
HEADER Content-Type : multipart/form-data
SAMPLE request key-value pairs

base_image : <<multipart form based image>>
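
As a rough client-side sketch, the request above can be built with only the Python standard library. The helper name `build_form_request`, the default filename and the image content type are hypothetical; the endpoint and the `base_image` field come from the description above. The response format is not documented here, so the example does not assume one.

```python
import uuid
import urllib.request

# Endpoint taken from the request description above.
DETECT_URL = "http://localhost:5002/detectobjects"


def build_form_request(image_bytes: bytes, filename: str = "sample.jpg",
                       content_type: str = "image/jpeg",
                       url: str = DETECT_URL) -> urllib.request.Request:
    """Encode one image as the multipart/form-data field `base_image`."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode("ascii"),
        ('Content-Disposition: form-data; name="base_image"; '
         f'filename="{filename}"').encode("utf-8"),
        f"Content-Type: {content_type}".encode("ascii"),
        b"",  # blank line separates the part headers from the payload
        image_bytes,
        f"--{boundary}--".encode("ascii"),
        b"",  # trailing CRLF after the closing boundary
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_form_request(b"<raw image bytes>")
print(request.get_method(), request.full_url)

# To actually send it (requires the service to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

With the third-party `requests` library installed, `requests.post(url, files={"base_image": file_handle})` produces an equivalent multipart request in one call.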

Image POST as json

We can also pass images as URLs (s3-bucket URLs) for verification

URL localhost:5002/detectobjectsjson
TYPE POST
HEADER Content-Type : "application/json"
SAMPLE request json

{
    "base_image": "<<image_url>>"
}
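
A minimal client sketch for the JSON variant, again using only the standard library. The helper name `build_json_request` and the example image URL are placeholders; the endpoint and the `base_image` key come from the description above.

```python
import json
import urllib.request

# Endpoint taken from the request description above.
DETECT_JSON_URL = "http://localhost:5002/detectobjectsjson"


def build_json_request(image_url: str,
                       endpoint: str = DETECT_JSON_URL) -> urllib.request.Request:
    """Wrap an image URL in the JSON body expected by the endpoint."""
    payload = json.dumps({"base_image": image_url}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(endpoint, data=payload, headers=headers, method="POST")


# Placeholder URL; substitute a reachable image (for example an S3 object URL).
request = build_json_request("https://example.com/sample.jpg")
print(request.data.decode("utf-8"))
# {"base_image": "https://example.com/sample.jpg"}

# To actually send it (requires the service to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```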

Heroku deployment