Open salamanders opened 5 years ago
I have an HD image (4k by 3k) and would like to get all faces in it. The faces are very small in the distance, so I assume I have to tile the image into overlapping tiles of a reasonable size (500x500?), run the detector on each tile, handle overlaps, and recombine the results into one superset.
Has anyone else already done this? Or are there better ways to handle larger images?
I think your approach is reasonable. If you have very small faces (relative to the image size) then SSD Mobilenetv1 will likely perform better than the TinyFaceDetector, but it won't necessarily detect all faces if you simply pass in your HD image without tiling it.
Are there any rough thresholds (upper bound on image size, lower bound on face size in pixels) beyond which tiling is likely to be needed?
Not sure actually; it depends on the size of the faces appearing in an image relative to the image size, since the images will be scaled down to the network input size.
You would have to try out what works for you.
All the "take an existing detection and adjust it" are in terms of scaling, there doesn't seem to be an offset function. Would you suggest
Sorry, I didn't get the question. Do you mean shifting a Rect? You could simply create a new one: new faceapi.Rect(oldRect.x + offsetX, oldRect.y + offsetY, oldRect.width, oldRect.height).
Or create a PR / issue at tfjs-image-recognition-base to implement a simple shift method in the Box class.
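For illustration, a minimal sketch of such a helper (this is not a library function; it only relies on the faceapi.Rect constructor shown above):

function shiftRect(rect, offsetX, offsetY) {
  // Build a new Rect moved by (offsetX, offsetY); the original stays untouched.
  return new faceapi.Rect(
    rect.x + offsetX,
    rect.y + offsetY,
    rect.width,
    rect.height
  );
}

// Example: move a box from tile coordinates into full-image coordinates.
// const shifted = shiftRect(someDetection.box, tileLocation.x, tileLocation.y);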
Clarification of my question: given that I have a const fullFaceDescriptions: Array<FullFaceDescription> and need to shift every det inside fullFaceDescriptions by some (x, y) amount - is there an easy way to do it?
There is the example's const detectionsForSize = detections.map(det => det.forSize(input.width, input.height)) that scales all the detections; I'm assuming a shift would be similar.
Where I get a bit tripped up is what I need to shift vs. what is already relative, because the landmarks code mentions "unshiftedLandmarks", and IFaceLandmarks has shift: Point and shiftBy - which makes me think some parts of it are relative, or shift-able, and that I'd be reinventing the wheel if I stumbled in blindly and tried to write a new det.forShift().
Ahh I see. So the face landmark positions are relative to the bounding box, since they are predicted on the image patch extracted from that bounding box. Thus I introduced a shift for the landmark classes for ease of drawing: they have to be shifted by the bounding box position in order to retrieve their positions in the source image.
So what you would actually have to do is calculate the correct bounding box for each fd.detection, i.e. shift it by the (x, y) offset of the image patch it was detected in. Afterwards you would simply shift all landmarks by the shift of their bounding box (note that shiftBy(x, y) should actually be named "setShift", since it sets an offset for the relative landmarks rather than adding an offset on every call).
Unfortunately, the utility to easily shift instances of FaceDetection and the related classes is currently missing, so you would have to reconstruct these objects manually.
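For example, a rough sketch of that manual reconstruction using plain objects instead of the library classes (tileLocation here is a property attached by hand, as in the loop further below; detection.box, detection.score, landmarks.positions and descriptor are assumed to be available as described above):

// Express one FullFaceDescription in full-image coordinates as a plain object.
// Assumes ffd.tileLocation = {x, y} was attached manually and that
// landmarks.positions are already shifted into tile coordinates (see above).
function toImageCoords(ffd) {
  const dx = ffd.tileLocation.x;
  const dy = ffd.tileLocation.y;
  const box = ffd.detection.box;
  return {
    score: ffd.detection.score,
    box: { x: box.x + dx, y: box.y + dy, width: box.width, height: box.height },
    landmarks: ffd.landmarks.positions.map(p => ({ x: p.x + dx, y: p.y + dy })),
    descriptor: ffd.descriptor
  };
}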
Understood. I may hack my local script and toss in a "tileLocation=(x,y)" for every detection, because I'd rather spend the time figuring out how to deduplicate across tiles than how to correctly shift detections.
for (let imgNum = 0; imgNum < IMAGES.length; imgNum++) {
  const imgName = IMAGES[imgNum];
  console.groupCollapsed(`Image:${imgName}`);
  // Load the image into `input` (awaited, assuming asyncSrc returns a Promise).
  await asyncSrc(input, `images/${imgName}`);
  const detections = [];
  // Slide a TILE_DIMENSION window across the image with 50% overlap, so a face
  // split by one tile boundary appears whole in a neighboring tile.
  for (let x = 0; x < input.width - (TILE_DIMENSION / 2); x += (TILE_DIMENSION / 2)) {
    for (let y = 0; y < input.height - (TILE_DIMENSION / 2); y += (TILE_DIMENSION / 2)) {
      const tileLocation = { x, y };
      // Copy the current tile onto a small canvas and run detection on it.
      drawTile(input, smallCanvas, tileLocation.x, tileLocation.y, TILE_DIMENSION, TILE_DIMENSION);
      const fullFaceDescriptions = await faceapi.detectAllFaces(
        smallCanvas,
        new faceapi.SsdMobilenetv1Options({ maxResults: 100 })
      ).withFaceLandmarks().withFaceDescriptors();
      console.log(`${imgName} (${tileLocation.x},${tileLocation.y}) detected faces:${fullFaceDescriptions.length}`);
      // *** THIS PART: remember which tile each detection came from, so its
      // coordinates can later be shifted back into full-image space.
      fullFaceDescriptions.forEach(ffd => ffd.tileLocation = tileLocation);
      detections.push(...fullFaceDescriptions);
    }
  }
  console.log(`${imgName} saving ${detections.length} detections.`);
  // Persist the detections, keyed by the image file name.
  await set(input.src.substring(input.src.lastIndexOf('/') + 1), detections);
  console.groupEnd();
}
I'm now knee-deep in deduping across tiles. In the case where two faces overlap, what is the best way to tell which one to keep? I'm assuming score. I was also thinking I could base it on other things, like which detection has the highest max over all possible expressions - but AFAIK score is my best bet for "we really think this is the best face box."
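A standard technique for this (not something settled in this thread, just a common approach) is greedy non-maximum suppression: sort by score, keep the best box, and drop any remaining box whose overlap with an already-kept box (intersection over union) exceeds a threshold. A minimal sketch, assuming the plain {score, box} objects from the toImageCoords sketch above:

// Intersection-over-union of two axis-aligned boxes {x, y, width, height}.
function iou(a, b) {
  const ix = Math.max(0, Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x));
  const iy = Math.max(0, Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y));
  const inter = ix * iy;
  return inter / (a.width * a.height + b.width * b.height - inter);
}

// Greedy non-maximum suppression: highest score wins, overlapping losers drop.
function dedupe(detections, iouThreshold = 0.4) {
  const sorted = [...detections].sort((a, b) => b.score - a.score);
  const kept = [];
  for (const det of sorted) {
    if (kept.every(k => iou(k.box, det.box) < iouThreshold)) {
      kept.push(det);
    }
  }
  return kept;
}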