Project 1 – Estimation of the apparent motion
Visual motion perception from a moving observer is the case most often encountered in real-life situations. It is a complex and challenging problem, but solving it can enable new applications.
1) Implement two traditional optical flow techniques, (local) Lucas-Kanade and (global) Horn-Schunck, each with multichannel and multiresolution (with refinement) approaches.
2) Consider the following metrics to assess the quality of your implementation:
- Average angular error (AAE) and standard deviation
- Average end-point error (EPE) and standard deviation
3) Discuss the results, taking into consideration the following paper:
Andry et al. (2013), Revisiting Lucas-Kanade and Horn-Schunck, Journal of Computer Engineering and Informatics, Apr. 2013, Vol. 1 Iss. 2, PP. 23-29.
Results must be provided as a table.
4) Consider the image sequences for this project and estimate the optical flow using these two techniques. Produce two videos per image sequence showing the magnitude and orientation of the flow using the color scheme presented in the lectures. Discuss the results obtained.
Image sequences for this project:
- Forest_15_3b_Videvo
- Forest_15_4_Videvo
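As a point of reference for the metrics in item 2, the following is a minimal sketch, assuming NumPy, a ground-truth flow for comparison, and flow fields stored as arrays of shape (H, W, 2) with the horizontal and vertical components in the last axis. The function name `flow_errors` and the array layout are illustrative assumptions, not part of the assignment. The angular error uses the common definition based on the angle between the space-time vectors (u, v, 1) of the estimated and reference flows.

```python
import numpy as np

def flow_errors(flow_est, flow_gt):
    """Return (AAE mean, AAE std, EPE mean, EPE std); angles in degrees."""
    u, v = flow_est[..., 0], flow_est[..., 1]
    ug, vg = flow_gt[..., 0], flow_gt[..., 1]

    # End-point error: Euclidean distance between the two flow vectors.
    epe = np.sqrt((u - ug) ** 2 + (v - vg) ** 2)

    # Angular error between the 3D vectors (u, v, 1) and (ug, vg, 1).
    num = u * ug + v * vg + 1.0
    den = np.sqrt(u ** 2 + v ** 2 + 1.0) * np.sqrt(ug ** 2 + vg ** 2 + 1.0)
    # Clip guards against rounding slightly outside the arccos domain.
    ang = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))

    return ang.mean(), ang.std(), epe.mean(), epe.std()
```

The four returned values map directly onto one row of the results table (one row per technique and sequence).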
Project 2 – Denoising a video sequence.
The presence of noise in videos affects subsequent image processing phases, such as
three-dimensional reconstruction, registration, classification of objects, motion segmentation
and analysis, tracking, and identification and recognition of humans. Denoising is therefore
an extremely important pre-processing phase that improves the perceptual appearance of
images; however, noise reduction must be traded off against data preservation so that the
image characteristics relevant to high-level algorithms are retained.
1) Implement the robust bilateral and temporal filter (RBLT) for denoising a video sequence.
Spatial and temporal components are incorporated into the filter formulation, which increases
the filter's ability to remove strong noise components. Consider the Geman-McClure or the
Charbonnier function as the error norm for the M-estimators.
2) Consider the following evaluation metrics to assess the quality of your implementation:
- SSIM
- PSNR
The original image (distortion-free, or reference) must be compared to the distorted image
using these two evaluation metrics. The distorted image is obtained by corrupting the original
image with a distinct noise configuration (salt-and-pepper and Gaussian noise); the image
sample is then filtered by each filter individually. The level of noise to add to each
original image is a standard deviation of 20 to 40 for Gaussian noise and 10 to 30% for
salt-and-pepper noise. Results must be provided graphically.
3) Discuss the results, taking into consideration the median and Gaussian filters. You can also
consider the following paper: Andry et al. (2013), Enhancing dynamic videos for surveillance and
robotic applications: The robust bilateral and temporal filter, Signal Processing: Image
Communication, Elsevier, 2014.
4) Consider the image sequences for this project and estimate the optical flow using these two
techniques. Produce two videos per image sequence showing the magnitude and orientation of
the flow using the color scheme presented in the lectures. Discuss the results obtained.
Image sequences for this project:
- mlky_6
- 210329_06A_Bali_4k_004
- Saint_Barthelemy_2
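The noise corruption and the PSNR metric from item 2 can be sketched as follows. This is a minimal illustration, assuming NumPy, 8-bit images, and that the Gaussian standard deviation of 20 to 40 is expressed in grey levels on the 0 to 255 scale (the sheet does not state the unit). The function names are hypothetical. SSIM is not implemented here; an existing implementation such as `structural_similarity` from scikit-image could be used instead.

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` (grey levels)."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_pepper_noise(img, fraction, rng):
    """Set roughly `fraction` of the pixels to 0 or 255 with equal probability."""
    noisy = img.copy()
    r = rng.random(img.shape[:2])  # one draw per pixel, shared across channels
    noisy[r < fraction / 2] = 0
    noisy[(r >= fraction / 2) & (r < fraction)] = 255
    return noisy

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

A typical use would create `rng = np.random.default_rng(seed)` once, corrupt each frame at several noise levels, filter it, and plot `psnr(original, filtered)` against the noise level for each filter.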
Project 3 – Captcha decoding.
A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is
a commonly used feature in web applications to block non-human access. Its purpose is to
prevent abuse of websites, such as promotional spam, registration spam, and data scraping;
bots are less likely to spam websites that use CAPTCHAs. Many websites use CAPTCHAs to
prevent bot raids, and they work effectively. CAPTCHAs are designed so that humans can
complete them, while most bots cannot.
1) This project aims to develop a CNN with the ability to decode CAPTCHA images considering
4 and 5 encoders. The CNN model needs to be designed, implemented and trained
(no fine-tuning approaches should be applied);
2) Consider the following metrics:
a. Train and test accuracy;
b. Confusion matrix;
c. Other evaluation methodologies (e.g., confusion matrix, histograms).
3) Discuss the results of your approach, in particular its limitations;
4) Consider the provided CAPTCHA dataset, whose images contain 4 to 5 digits.
a. The soft dataset is formed by simpler CAPTCHAs. Students must start
the project with this dataset.
b. The hard dataset is formed by CAPTCHAs with strange elements added to make
identification more difficult.
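The metrics in item 2 can be computed per CAPTCHA and per character. The sketch below is a minimal illustration using only the Python standard library; the function name `captcha_metrics` is hypothetical, and it assumes that each prediction is a string of the same length as its label (a model that can output the wrong number of characters would need a different comparison).

```python
from collections import Counter

def captcha_metrics(true_labels, predicted_labels):
    """Return (string accuracy, character accuracy, confusion counts)."""
    confusion = Counter()  # maps (true_char, predicted_char) -> count
    exact = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if len(truth) != len(pred):
            raise ValueError("label and prediction lengths differ")
        exact += truth == pred  # whole CAPTCHA decoded correctly
        confusion.update(zip(truth, pred))
    total_chars = sum(confusion.values())
    correct_chars = sum(n for (t, p), n in confusion.items() if t == p)
    return exact / len(true_labels), correct_chars / total_chars, confusion
```

The `confusion` counter can be laid out as a matrix with one row per true character and one column per predicted character. Reporting both accuracies is useful because a single wrong character makes the whole CAPTCHA wrong.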
Project 4 – Open Project
Students can develop a project in computer vision (CV) that is related to their MSc thesis.
Teams should therefore send a project proposal by the 14th of April, 2023, covering the following topics:
- Motivation
- Objectives
- Problem statement (e.g., classification, regression, etc.)
- Dataset