tradle / KYCDeepFace

KYC face matching project.
GNU Affero General Public License v3.0

Precompute GLINT model #5

Open martinheidegger opened 2 years ago

martinheidegger commented 2 years ago

Currently the GLINT360k input data is prepared on the fly, during training.

It turns out that a big part (40%?) of the training process on the smallest AWS machine is the resizing/preparing of the input training data. Any machine can do this work, and it can be applied to the training data in advance using spare computing resources. There is no need to run it on the expensive machine or to treat it as a required part of the training process.
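To turn the "40%?" estimate into a measured number, one option is to time how long the training loop waits for each batch compared to how long the training step itself takes. The following is a minimal sketch, not code from this repository: `loading_share`, `batches` and `train_step` are hypothetical names. It also assumes batches are loaded synchronously. With prefetching workers or asynchronous GPU execution, the numbers only show wait time as seen from the loop.

```python
import time
from typing import Any, Callable, Iterable


def loading_share(batches: Iterable[Any], train_step: Callable[[Any], None]) -> float:
    """Return the fraction of wall time spent waiting for batches.

    `batches` is any iterable that yields training batches and
    `train_step` is the function that consumes one batch. Both are
    placeholders for whatever the real training loop uses.
    """
    load_time = 0.0
    step_time = 0.0
    iterator = iter(batches)
    while True:
        started = time.perf_counter()
        try:
            batch = next(iterator)  # image loading/resizing happens here
        except StopIteration:
            break
        loaded = time.perf_counter()
        train_step(batch)  # forward/backward pass happens here
        finished = time.perf_counter()
        load_time += loaded - started
        step_time += finished - loaded
    total = load_time + step_time
    return load_time / total if total else 0.0
```

Running this over a few hundred batches, before and after moving the preparation offline, would show how much of the expensive machine's time is actually saved.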

Details

While loading the images during training...

https://github.com/tradle/KYCDeepFace/blob/233f33d62daa6dd4eeae9433cab58822f8208a0d/train.py#L146

... the GLINT360k loader processes each image:

https://github.com/tradle/KYCDeepFace/blob/233f33d62daa6dd4eeae9433cab58822f8208a0d/dataloader/GLINT_loader.py#L74-L84

This is a relatively slow process that does not require a strong CPU and can easily be done in advance. We should isolate the pre-processing step and store the processed images separately. As each image would become smaller, the total data size of GLINT360k would also be reduced significantly, and runtime memory use would probably drop as well.
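An isolated pre-processing step could look roughly like the sketch below. It is not the project's implementation: `precompute_dataset` and its parameters are hypothetical, and the per-image work is passed in as a `transform` callable (bytes in, bytes out) that would wrap whatever `GLINT_loader.py` does today. The sketch mirrors the source directory layout into a separate output directory and skips files that already exist, so the job can be interrupted and resumed on spare machines.

```python
import os
from pathlib import Path
from typing import Callable, Union


def precompute_dataset(
    src_root: Union[str, Path],
    dst_root: Union[str, Path],
    transform: Callable[[bytes], bytes],
    overwrite: bool = False,
) -> int:
    """Apply `transform` to every file under `src_root` once, offline.

    Results are written under `dst_root` using the same relative paths.
    Returns the number of files written in this run.
    `transform` is an assumption: it stands in for the resize/prepare
    logic that currently runs inside the data loader.
    """
    src_root = Path(src_root)
    dst_root = Path(dst_root)
    written = 0
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        if dst.exists() and not overwrite:
            continue  # already prepared in an earlier run
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file first so an interrupted run never
        # leaves a half-written image at the final path.
        tmp = dst.with_suffix(dst.suffix + ".tmp")
        tmp.write_bytes(transform(src.read_bytes()))
        os.replace(tmp, dst)
        written += 1
    return written
```

The training loader would then read from the prepared directory and skip the resize step entirely. Keeping the transform as a parameter means the same job can be reused for other datasets.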

This should be the standard approach for every kind of input data.