Closed jalil-salame closed 1 year ago
Hello there! Previously i successfully parallelize predictions using this crate (i didn't use rayon, but tokio, but i believe that wouldn't be a problem), but i believe we tried different approaches. Do you have any code snippet to show us so we can figure out if anything went wrong? I'm unsure if you're trying to use rayon to speed up a process inside this crate, or a high-level understanding of a prediction.
Thanks for bringing this up :D
I am running the face detector and landmark predictor inside of rayon.
This is the code (without rayon), it breaks if you do into_par_iter
instead of into_iter
This is the code that calls into dlib:
Using gdb I find the crash on the memory allocator from dlib (C++):
#0 0x00007ffff7b6d009 in __memset_avx2_unaligned_erms () from /nix/store/ayg065nw0xi1zsyi8glfh5pn4sfqd8xg-glibc-2.37-8/lib/libc.so.6
#1 0x0000555555fd89d1 in void dlib::impl_fhog::init_hog<float, dlib::memory_manager_stateless_kernel_1<char>, dlib::memory_manager_stateless_kernel_1<char> >(dlib::array<dlib::array2d<float, dlib::memory_manager_stateless_kernel_1<char> >, dlib::memory_manager_stateless_kernel_1<char> >&, int, int, int, int) ()
#2 0x0000555555fe2189 in void dlib::impl_fhog::impl_extract_fhog_features<dlib::matrix<dlib::rgb_pixel, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, dlib::row_major_layout>, dlib::array<dlib::array2d<float, dlib::memory_manager_stateless_kernel_1<char> >, dlib::memory_manager_stateless_kernel_1<char> > >(dlib::matrix<dlib::rgb_pixel, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, dlib::row_major_layout> const&, dlib::array<dlib::array2d<float, dlib::memory_manager_stateless_kernel_1<char> >, dlib::memory_manager_stateless_kernel_1<char> >&, int, int, int) ()
#3 0x0000555555ff0de9 in void dlib::object_detector<dlib::scan_fhog_pyramid<dlib::pyramid_down<6u>, dlib::default_fhog_feature_extractor> >::operator()<dlib::matrix<dlib::rgb_pixel, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, dlib::row_major_layout> >(dlib::matrix<dlib::rgb_pixel, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, dlib::row_major_layout> const&, std::vector<dlib::rect_detection, std::allocator<dlib::rect_detection> >&, double) [clone .constprop.0] ()
#4 0x0000555555ff3dfc in __cpp_closure_9405994709588434887 ()
#5 0x0000555555fc9277 in <dlib_face_recognition::face_detection::hog::FaceDetector as dlib_face_recognition::face_detection::base::FaceDetectorTrait>::face_locations::h13a9c7e3fb1a1f33 ()
#6 0x00005555556d6a50 in landmark_extractor::extract_landmarks::h19c74ed3caa2461f ()
#7 0x00005555556c9016 in <rayon::iter::map::MapFolder<C,F> as rayon::iter::plumbing::Folder<T>>::consume::h29a4ebbc12e017fa ()
#8 0x000055555568f708 in rayon::iter::plumbing::Folder::consume_iter::h22c3cde534b77f5d ()
#9 0x00005555556c01d7 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#10 0x00005555556cabaf in rayon_core::join::join_context::{{closure}}::hd8c172b9dc1ce682 ()
#11 0x00005555556cb0ea in rayon_core::registry::in_worker::heda56fb9e44646f1 ()
#12 0x00005555556c0340 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#13 0x00005555556cabaf in rayon_core::join::join_context::{{closure}}::hd8c172b9dc1ce682 ()
#14 0x00005555556cb0ea in rayon_core::registry::in_worker::heda56fb9e44646f1 ()
#15 0x00005555556c0340 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#16 0x00005555556cabaf in rayon_core::join::join_context::{{closure}}::hd8c172b9dc1ce682 ()
#17 0x00005555556cb0ea in rayon_core::registry::in_worker::heda56fb9e44646f1 ()
#18 0x00005555556c0340 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#19 0x00005555556cabaf in rayon_core::join::join_context::{{closure}}::hd8c172b9dc1ce682 ()
#20 0x00005555556cb0ea in rayon_core::registry::in_worker::heda56fb9e44646f1 ()
#21 0x00005555556c0340 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#22 0x00005555556d5797 in <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h7e18d7c6e4f01ef3 ()
#23 0x0000555555680323 in rayon_core::registry::WorkerThread::wait_until_cold::h7f249316ded4d7d9 ()
#24 0x00005555561456c8 in rayon_core::registry::ThreadBuilder::run::h9dd80136da8f4724 ()
#25 0x000055555614962a in std::sys_common::backtrace::__rust_begin_short_backtrace::haffca54ca4ae1e6b ()
#26 0x0000555556149c15 in core::ops::function::FnOnce::call_once{{vtable.shim}}::h7a25643fd4689018 ()
#27 0x00005555561abb35 in std::sys::unix::thread::Thread::new::thread_start::hfffad9b0b70f2f00 ()
#28 0x00007ffff7a9fdd4 in start_thread () from /nix/store/ayg065nw0xi1zsyi8glfh5pn4sfqd8xg-glibc-2.37-8/lib/libc.so.6
#29 0x00007ffff7b219b0 in clone3 () from /nix/store/ayg065nw0xi1zsyi8glfh5pn4sfqd8xg-glibc-2.37-8/lib/libc.so.6
It is flaky (seems to work about every other time).
You can run it like face-stabilizer extract-features /path/to/dir/with/images/of/faces
I am running it on master
so that I can use your C++14 fix
Hello! Sorry for the late reply, i took some time to try and understand your code and what you wanted to achieve.
Please do correct me if i`m wrong, but i guess you wanted to use rayon to find rectangles in multiple images in an asynchronous way.
If that's correct, i believe there may be some mistake in your code sample.
I did a small reproducible code to test this:
use dlib_face_recognition::{FaceDetector, FaceDetectorTrait, ImageMatrix};
use rayon::prelude::*;
use serde::{Deserialize, Serialize};
use std::{collections::HashMap, fs::read_dir};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SerializableRectangle {
pub left: i64,
pub top: i64,
pub right: i64,
pub bottom: i64,
}
fn find_rectangles(image: &String) -> Vec<SerializableRectangle> {
println!("{image}");
let mut rect_results: Vec<SerializableRectangle> = vec![];
let image = image::open(image).unwrap().to_rgb8();
let matrix = ImageMatrix::from_image(&image);
let detector = FaceDetector::default();
let face_locations = detector.face_locations(&matrix);
for found_rectangle in face_locations.iter() {
let serializable_rectangle = SerializableRectangle {
left: found_rectangle.left,
right: found_rectangle.right,
top: found_rectangle.top,
bottom: found_rectangle.bottom,
};
rect_results.push(serializable_rectangle);
}
rect_results
}
fn main() {
let Ok(paths) = read_dir("./images") else {
panic!("Unable to read images folder!")
};
let mut images: HashMap<String, Vec<SerializableRectangle>> = HashMap::new();
for path in paths.flatten() {
let string = path.path().display().to_string();
images.insert(string, vec![]);
}
images.par_iter_mut().for_each(|entry| {
let rects = find_rectangles(entry.0);
for rect in rects {
entry.1.push(rect);
}
});
for entry in images {
println!("{:?}", entry.1)
}
}
This works without any errors, proving that the structs in this crate are indeed thread-safe (at least, with this code approach). In another possibility, it could be something about trying to load the detector files in such a way, i have this happen to me before, but at this point, i'm unsure about that direction.
I really hope to have been able to help or guess your code objective correctly.
Please do try this out, and keep us informed!
That is basically what I do, with a few minor tweaks:
So your code would look like this (might not compile, I haven't tested it):
use dlib_face_recognition::{FaceDetector, FaceDetectorTrait, ImageMatrix, LandmarkPredictor, LandmarkPredictorTrait};
use rayon::prelude::*;
use serde::{Deserialize, Serialize};
use std::{collections::HashMap, fs::read_dir};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SerializableRectangle {
pub left: i64,
pub top: i64,
pub right: i64,
pub bottom: i64,
}
fn find_rectangles(image: &String, detector: &FaceDetector, predictor: &LandmarkPredictor) -> Vec<SerializableRectangle> {
println!("{image}");
let mut rect_results: Vec<SerializableRectangle> = vec![];
let image = image::open(image).unwrap().to_rgb8();
let matrix = ImageMatrix::from_image(&image);
let face_locations = detector.face_locations(&matrix);
for found_rectangle in face_locations.iter() {
let serializable_rectangle = SerializableRectangle {
left: found_rectangle.left,
right: found_rectangle.right,
top: found_rectangle.top,
bottom: found_rectangle.bottom,
};
rect_results.push(serializable_rectangle);
// Also save the facial landmarks here
}
rect_results
}
fn main() {
let Ok(paths) = read_dir("./images") else {
panic!("Unable to read images folder!")
};
let mut images: HashMap<String, Vec<SerializableRectangle>> = HashMap::new();
for path in paths.flatten() {
let string = path.path().display().to_string();
images.insert(string, vec![]);
}
let detector = FaceDetector::default();
let predictor = LandmarkPredictor::new("/path/to/model");
images.par_iter_mut().for_each(|entry| {
let rects = find_rectangles(entry.0, &detector, &predictor);
for rect in rects {
entry.1.push(rect);
}
});
for entry in images {
println!("{:?}", entry.1)
}
}
I'll try to write a minimal example next weekend when I have time
Hello! After some more investigation, i did reach the same segfault as @jalil-salame, and can confirm that truly the structs should be marked for !Sync and !Send, except the LandmarkPredictor, which is thread-safe as per my previous extensive-testing with a personal project.
The problem is: negative impls are a nightly locked feature of the language, meaning we cannot do this PR and still use this crate in a stable Rust compiler toolchain. This would also force whatever projects are using this crate to also be forced to run Nightly Rust.
The negative impls RFC is here: https://github.com/rust-lang/rust/issues/68318.
The code/PR itself is pretty easy by itself, so not a problem except that. I do find this important, and would vouch for this to be implemented at some point, but the question of the nightly toolchain still remains.
Ultimately, the decision should be by @kerryeon.
The problem is: negative impls are a nightly locked feature of the language, meaning we cannot do this PR and still use this crate in a stable Rust compiler toolchain. This would also force whatever projects are using this crate to also be forced to run Nightly Rust.
Ultimately, the decision should be by @kerryeon.
You can work around that limitation by adding some phantom data to them (specifically !Sync/!Send phantom data, like UnsafeCell)
If I remember correctly
The problem is: negative impls are a nightly locked feature of the language, meaning we cannot do this PR and still use this crate in a stable Rust compiler toolchain. This would also force whatever projects are using this crate to also be forced to run Nightly Rust.
Ultimately, the decision should be by @kerryeon.
You can work around that limitation by adding some phantom data to them (specifically !Sync/!Send phantom data, like UnsafeCell)
If I remember correctly
PhantomData
will work, using PhantomData<UnsafeCell<()>>
won't, as UnsafeCell
will be Send
.
The right choice should be adding a field with PhatomData<*const ()>
as raw pointers are neither Send
nor Sync
.
I don't know how the dlib types work, if they have interior mutability, they should be !Sync
, if they have thread local data, they should be !Send
.
Having them be !Send
and !Sync
would be the safe choice.
The problem is: negative impls are a nightly locked feature of the language, meaning we cannot do this PR and still use this crate in a stable Rust compiler toolchain. This would also force whatever projects are using this crate to also be forced to run Nightly Rust. Ultimately, the decision should be by @kerryeon.
You can work around that limitation by adding some phantom data to them (specifically !Sync/!Send phantom data, like UnsafeCell) If I remember correctly
PhantomData
will work, usingPhantomData<UnsafeCell<()>>
won't, asUnsafeCell
will beSend
.The right choice should be adding a field with
PhatomData<*const ()>
as raw pointers are neitherSend
norSync
.I don't know how the dlib types work, if they have interior mutability, they should be
!Sync
, if they have thread local data, they should be!Send
.Having them be
!Send
and!Sync
would be the safe choice.
Just did some code testing, PhantomData solution seems to work, albeit a little jank, but it avoids the nightly problem. Will take some time later on tonight to write a PR for that + documentation on the negative impl issue.
Thanks for all the info and cooperation!
Hi, I read the issue and confirmed that dlib
is not thread safe as the following comment: https://github.com/davisking/dlib/issues/340#issuecomment-271841880
And it seems that both Send
and Sync
features are not supported.
Trying to speed up predictions by using
rayon
causes a core dump within dlib's allocator, some of the structures should be marked as either!Sync
or!Send
(or both).