opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0

connectedcomponents submission #135

Closed nevion closed 11 years ago

nevion commented 11 years ago

Hi,

This is a git pull request for the following issue (which has gone on way too long!). I've addressed Dustin's complaints... let's get this code out there!

http://code.opencv.org/issues/1236

There are COUNTLESS threads on how to do connected components, with almost monthly submissions to the mailing list. Stack Overflow alone must have 20+ threads on this. Most end up using cvBlobsLib, which is often broken and too slow to boot. There are some other implementations too, and it's possible to do it in OpenCV with findContours, but it's really a bad situation at the moment. People are still rolling their own (naive) implementations.

Anyway, this presents an easy-to-use single function call which gives the labeled image and, optionally, the statistics one often wants with connected components (bounding box, centroid, area), and does so with better worst-case and average-case performance than all the other implementations.
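For readers landing here from those threads, the classic two-pass labeling scheme this submission is based on (nevion later describes a provisional-label pass followed by a relabeling pass that accumulates statistics) can be sketched in pure Python. This is an illustrative toy with invented names, not the submitted templated C++:

```python
def label_components(img, connectivity=8):
    """Two-pass connected-components labeling with per-component stats.

    img: 2D list of 0/1 values. Returns (labels, n_labels, stats), where
    n_labels counts the background and stats[l] holds bbox/area/coordinate
    sums for each foreground label l >= 1. Illustrative sketch only.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # union-find over provisional labels; 0 = background

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # pass 1: assign provisional labels, record label equivalences
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            neigh = [(y, x - 1), (y - 1, x)]
            if connectivity == 8:
                neigh += [(y - 1, x - 1), (y - 1, x + 1)]
            ls = [labels[ny][nx] for ny, nx in neigh
                  if 0 <= ny < h and 0 <= nx < w and labels[ny][nx]]
            if ls:
                labels[y][x] = min(ls)
                for l in ls:
                    union(labels[y][x], l)
            else:
                parent.append(len(parent))  # fresh provisional label
                labels[y][x] = len(parent) - 1

    # pass 2: compact the labels and accumulate stats on the fly
    remap, stats = {0: 0}, {}
    for y in range(h):
        for x in range(w):
            r = find(labels[y][x])
            if r not in remap:
                remap[r] = len(remap)
            l = remap[r]
            labels[y][x] = l
            if l:
                s = stats.setdefault(l, dict(left=x, top=y, right=x,
                                             bottom=y, area=0, sx=0, sy=0))
                s["left"] = min(s["left"], x); s["right"] = max(s["right"], x)
                s["top"] = min(s["top"], y); s["bottom"] = max(s["bottom"], y)
                s["area"] += 1; s["sx"] += x; s["sy"] += y
    return labels, len(remap), stats

labels, n, stats = label_components([[1, 1, 0, 0],
                                     [0, 1, 0, 1],
                                     [0, 0, 0, 1]])
# two foreground components plus the background
assert n == 3 and stats[1]["area"] == 3 and stats[2]["area"] == 2
```

The centroid falls out of the accumulated sums as `(sx/area, sy/area)`, which is exactly the integral-vs-fixed-point question debated further down the thread.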

I hope I'll have better luck at submission than through the issue tracker.

vpisarev commented 11 years ago

ok, let's finally put it in. However, I would like you to modify the interface significantly and partly modify the implementation. Since the current trend in OpenCV is to introduce a very stable API and not change it for a long time, I want the API to be future-proof, and not only at the source code level, but also at the binary level.

Why did we not integrate the patch earlier in the first place? Because connected components are quite a generic thing. They can be extracted from binary (bi-level) images, and they can be extracted from grayscale or even multi-channel images if we define a predicate close(p, q) for neighboring pixels. On the output side, the representation of connected components can be very different too: they can be represented as contours, as sets of horizontal/vertical line segments, etc. The statistics computed in your implementation may be sufficient for some users but insufficient for others; e.g. J. Matas et al. in "Real-time scene text localization and recognition" consider different, in my opinion very useful, characteristics like the perimeter, number of holes, number of zero-crossings, number of inflections, etc.

So, we cannot just "occupy" the quite generic name "connected components" for something rather specific that does not cover this whole area.

But I admit that something is better than nothing, and designing an ideal connected component extraction function may take a very long time.

So, let's put the current code in, but make the API more generic and more wrapper-friendly:

vvvvvvvvvvvvvvvvvvvvvvvv

CV_EXPORTS_W int connectedComponents(InputArray image, OutputArray labels, int connectivity = 8, int flags=0);

enum { CC_STAT_LEFT=0, CC_STAT_TOP=1, CC_STAT_WIDTH=2, CC_STAT_HEIGHT=3, CC_STAT_CX=4, CC_STAT_CY=5, CC_STAT_AREA=6, CC_STAT_INTEGRAL_X=7, CC_STAT_INTEGRAL_Y=8, ... };

CV_EXPORTS_W int connectedComponentsWithStats(InputArray image, OutputArray labels, OutputArray stats, int connectivity = 8, int flags=0);

^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First of all, the use of uint64 should be eliminated, since it's a weird type and its use here is not justified for any reasonable image size (2 billion pixels is quite a big number). Then, the input image should be made the first parameter, by our common convention, and should be made InputArray, which was introduced in 2.4.0 ("const Mat&" is implicitly converted to InputArray). The output label image should be made OutputArray ("Mat&" is implicitly converted to OutputArray). The output vector of cc statistics should also be wrapped into OutputArray. The structure ConnectedComponentStats should be removed as too specific and not wrapper-friendly. For Python you provided wrappers, but what about Java? Also, quite a few people want a Matlab & Octave interface for OpenCV, and each extra structure means that we would have to provide such a conversion function. Instead, I suggest packing all the statistics into an array of integers. The centroid can be represented in fixed-point format.

Mat labels, stats;
connectedComponents(my_image, labels, stats);
Point2f center_i(stats.at<int>(i, CC_STAT_CX)/256.f, stats.at<int>(i, CC_STAT_CY)/256.f);
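As a standalone illustration of the 8.8 fixed-point encoding suggested above (plain Python, hypothetical centroid values):

```python
# 8.8 fixed point: store centroid * 256 as an integer in the int stats row,
# recover it on the user's side by dividing by 256.0
cx, cy = 37.62, 104.05            # hypothetical centroid of one component
cx_fixed = int(round(cx * 256))   # what would sit in the CC_STAT_CX slot
cy_fixed = int(round(cy * 256))
cx_back, cy_back = cx_fixed / 256.0, cy_fixed / 256.0
# with rounding, the quantization error is bounded by half a step (1/512)
assert abs(cx_back - cx) <= 1 / 512 and abs(cy_back - cy) <= 1 / 512
```

This is the precision trade Jason objects to below: the sub-pixel position survives only to 1/256 of a pixel.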

I would also add the label of the component in the "default" statistics, because if we remove (filter off) some of the connected components, e.g. too small ones, the ordering will be lost.

The flags parameter will regulate what exactly needs to be computed: 0 means your statistics, while e.g. CC_STAT_MATAS_MODE could mean adding some more stuff, etc.


Now, on the implementation. It's a lot of code in OpenCV already, and we try to do our best to keep functionality/footprint ratio reasonable (when possible, we try to increase it).

The proposed patch implements connected component extraction for binary images, which are 8-bit single-channel (CV_8UC1) in 99% of cases. If not, the input can be converted to this format with a single line of code:

connectedComponents((my_32f_image > eps), labels, stats);

So, it's enough to support CV_8UC1 input. Similarly, it's enough to constrain the output image of labels to the 32-bit format (CV_32S). We support only this format in our labeled distance transform and watershed transform functions and have not heard a single complaint. So, let's bring the code size down and make the function non-template 8u->32s code. I'm sure we will find a better use for the space (especially on mobile devices).


One final point. For code that can be tested (i.e. everything except camera capture and GUI stuff) we now demand some tests, at least smoke tests. The sample you prepared is great, but we cannot run it nightly in batch mode. So we will need some tests for these connected component extraction functions. You can look at https://github.com/Itseez/opencv/blob/master/modules/imgproc/test/test_watershed.cpp for a relevant test. It just reads an image, runs the function, and compares the output with a previously stored one.

What do you think? If you are fine with the proposed changes and willing to do that and update the pull request, I will merge it in.

nevion commented 11 years ago

This is indeed a lot of changes... see my responses inline.

On Thu, Nov 22, 2012 at 10:52 AM, Vadim Pisarevsky <notifications@github.com> wrote:

ok, let's finally put it in. However, I would like you to make a few significant modifications to the interface and partly modify the implementation. Since the current trend in OpenCV is to introduce a very stable API and not change it for a long time, I want the API to be future-proof, and not only at the source code level, but also at the binary level.

Why we did not integrate the patch earlier in the first place? Because connected components is quite generic thing. They can be extracted from binary (bi-level) images, they can be extracted from grayscale or even multi-channel images if we define the predicate (close(p, q)) for the neighbor pixels. On the output side, representation of connected components can be very different too, they can be represented as contours, as sets of horizontal/vertical line segments etc. The statistics that is computed in your implementation may be sufficient for some users, but insufficient for others, e.g. J. Matas et al in "Real-time scene text localization and recognition" consider different, in my opinion very useful characteristics like the perimeter, number of holes, number of zero-crossings, number of inflections etc.

Managing a fast implementation of all of the statistics is a hard thing to do. With templates, as I do in the code, you can redefine the functor used for statistics computation and pay only for what you use. But this is OpenCV, and it is not common to make so much of the implementation available in headers. Instead I just added the by far most common statistics... sure, there will be people who want more... unfortunately, given the difficulty/slowness of providing that, they will just have to do a 3rd pass over the image, which is all the statistics computation saves you from doing. Or make the template implementation available.

So, we can not just "occupy" quite generic name "connected components" for something rather specific and not covering this area.

But I admit that something is better than nothing, and designing an ideal connected component extraction function may take a very long time.

I consider the dense matrix representation (i.e. an image) to be the fastest and most useful representation of the result. I also believe that the required input is binary in nature (but type independent) and that, beyond optimization, it does not make sense to define the operation for multi-channel images. However, I think it is possible to add multi-channel support at the expense of complicating the implementation further.

So, let's put the current code in, but make the API more generic and more wrapper-friendly:

vvvvvvvvvvvvvvvvvvvvvvvv

CV_EXPORTS_W int connectedComponents(InputArray image, OutputArray labels, int connectivity = 8, int flags=0);

enum { CC_STAT_LEFT=0, CC_STAT_TOP=1, CC_STAT_WIDTH=2, CC_STAT_HEIGHT=3, CC_STAT_CX=4, CC_STAT_CY=5, CC_STAT_AREA=6, CC_STAT_INTEGRAL_X=7, CC_STAT_INTEGRAL_Y=8, ... };

CV_EXPORTS_W int connectedComponentsWithStats(InputArray image, OutputArray labels, OutputArray stats, int connectivity = 8, int flags=0);

^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I wasn't aware of this; it's fine, though at this point it's an annoying change. Kudos for having a good/easy way to support all those other languages/environments.

First of all, the use of uint64 should be eliminated since it's a weird type and its use here is not justified for any reasonable image size (2 billion pixels is quite a big number)

You may scoff, but I've gotten uncomfortably close to the limits of uint32 for labels. I've also been surprised before by Xlib only supporting images of 2^15 or 2^16 in dimension. Leave it and be future-proof for all those NASA or panoramic types, hm? 64GB of RAM is only a few hundred dollars these days.

Then, input image should be made the first parameter, by our common convention and should be made InputArray, which was introduced in 2.4.0. "const Mat&" is implicitly converted to InputArray. The output label image should be made OutputArray. "Mat&" is implicitly converted to OutputArray. The output vector of cc statistics should be also wrapped into OutputArray. The structure ConnectedComponentStats should be removed as too specific and not wrapper-friendly. For Python you provided wrappers, but what about Java? Also, for example, quite a few people want Matlab & Octave interface for OpenCV, and each extra structure means that we will have to provide such a conversion function. Instead, I suggest to pack all the statistics into array of integers. The centroid can be represented in fixed-point format.

The centroid computations are cheap, since the number of divisions is the number of components. If you're going to convert them to floats at some point, you might as well just do it once?

Mat labels, stats;
connectedComponents(my_image, labels, stats);
Point2f center_i(stats.at<int>(i, CC_STAT_CX)/256.f, stats.at<int>(i, CC_STAT_CY)/256.f);

I would also add the label of the component in the "default" statistics, because if we remove (filter off) some of the connected components, e.g. too small ones, the ordering will be lost.

Not needed. Their position in the array tells you the label. If you reduce the array by filtering, you should keep track of it yourself with a tuple/pair object or an extra column/vector. This is something I encounter all over the place in unrelated contexts and code, and that's the way I've dealt with it. I don't believe that approach will slow one down by any meaningful amount of time.
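Jason's keep-track-of-it-yourself approach is cheap in practice; a minimal sketch (plain Python, hypothetical per-label areas where the row index is the label):

```python
# one area per label; the row index IS the label, as in the proposed stats array
areas = [100000, 5, 340, 12, 78]   # index 0 is the background component
min_area = 50

# carry each survivor's original label along as a (label, area) pair,
# so filtering never loses the mapping back into the label image
survivors = [(label, area) for label, area in enumerate(areas)
             if label != 0 and area >= min_area]
assert survivors == [(2, 340), (4, 78)]
```

This is why nevion argues the extra "label" column in the default stats is redundant: the pairing is one `enumerate` away.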

The flags parameter will regulate what exactly needs to be computed: 0 means your statistics, while e.g. CC_STAT_MATAS_MODE could mean adding some more stuff, etc.

Predefined groups of statistics are the only way to really manage the complexity of the implementation, though it still leaves custom statistics out in the cold for that 3rd pass. Having thought about Matas's statistics, I'm not sure if they could or should be implemented in the same way though... What I currently compute is done in the relabeling step (2nd pass), and at that point the final result is not actually finished yet (the pixel after the current one used in the statistics computation is not yet labeled with the final label). So some of these statistics may best run in a 3rd pass if they depend on a neighborhood or are otherwise not causal. Either way, this is looking to be a relatively large add-on effort. These things are probably best put into an auxiliary function using a 3rd pass, whereas the statistics I have computed can be computed efficiently in the 2nd pass.


Now, on the implementation. It's a lot of code in OpenCV already, and we try to do our best to keep functionality/footprint ratio reasonable (when possible, we try to increase it).

The proposed patch implements connected components extraction for binary image, which is 8-bit single-channel image (CV_8UC1) in 99% cases. If not, it can be converted to this format with a single line of code:

connectedComponents((my_32f_image > eps), labels, stats);

So, it's enough to support CV_8UC1 input. Similarly, it's enough to constrain the output image of labels to 32-bit format (CV_32S). We support only this format in our labeled distance transform and watershed transforms functions and did not hear a single complain. So, let's bring the code size down and make the function non-template 8u->32s code. I'm sure we will find a better use of the space (especially on mobile devices).

Input is bi-level, that is established. However, the output type depends on the size or characteristics of the input, and the proper output choice matters for speed. For some images uint8 is enough, for others uint16, for others still uint32... and finally, only because it is the next largest size, uint64 for those working on extremely large images (signed types make no sense here).
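The output-type argument can be made concrete with a small helper (an illustration only; the function and type names are generic, not OpenCV constants):

```python
def smallest_label_type(num_labels):
    """Pick the smallest unsigned type able to hold labels 0..num_labels-1.

    Sketches the per-image output-type choice Jason describes; smaller label
    images mean less memory traffic and less cache thrashing.
    """
    for bits in (8, 16, 32, 64):
        if num_labels <= 2 ** bits:
            return "uint%d" % bits
    raise ValueError("too many labels for any integer type")

assert smallest_label_type(200) == "uint8"
assert smallest_label_type(70000) == "uint32"
```

Vadim's counterpoint is that one fixed 32-bit output keeps the binary small; this helper shows what that choice gives up for tiny or enormous label counts.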

Further, in-place thresholding a float without constructing and setting a new image is still faster... that's what this algorithm/implementation is all about: getting it as fast as possible, giving the user enough power and flexibility to shoot themselves in the foot (output types), but also shaving time off, getting real-time interactive output, and reducing memory requirements / cache thrashing.

So I think it is unfortunate, and true, that there is some code unneeded with respect to any individual program, but it's not worth the sacrifice. Too bad strip --strip-unneeded doesn't work here, right?


One final point. For code that can be tested (i.e. everything except camera capture and GUI stuff) we now demand some tests, at least smoke tests. The sample you prepared is great, but we cannot run it nightly in batch mode. So we will need some tests for these connected component extraction functions. You can look at https://github.com/Itseez/opencv/blob/master/modules/imgproc/test/test_watershed.cpp for a similar function. It just reads an image, runs the function, and compares the output with a previously stored one.

I'll do that.

What do you think? If you are fine with the proposed changes and willing to do that and update the pull request, I will merge it in.

I'm extremely happy someone finally talked with me, and your email was thorough. I wish it had come earlier, and maybe in smaller chunks... I don't have much time for FOSS contributions these days. I've been doing extensive in-the-field (air) testing of vision systems every month or so for the last 5 months, so it's been crunch time and then some, leaving me with little time for anything else.

I should be able to get most of these done, hopefully before the weekend is over, but no promises... I was already working on occupational projects. The things we have disagreed on will need re-evaluation though.

Thank you and best regards, Jason

Reply to this email directly or view it on GitHub: https://github.com/Itseez/opencv/pull/135#issuecomment-10642632.

nevion commented 11 years ago

Actually, I'll concede on the return type of uint64. I don't like using a signed type for the return type given a large enough image, though. One can recover the actual number by casting back to unsigned, but this should be an unsigned type, as it is the count of something. Any final thoughts on this? I don't suppose OpenCV will ever add a uint32...

After seeing the 360k gcc output for this binary after stripping, I took the size reduction a little more seriously. I cut the input variations down to just uint8, which reduced it to 90k, a bit more on par with the other functions in imgproc. I don't like doing this that much, but at least it's only a few lines to add support for those types back in... at the expense of a private build.


vpisarev commented 11 years ago

Jason, no need to implement the Matas et al. props; we just need to leave space for them in the API. Actually, I now think of going even further and making it an "Algorithm":

class CComp : public Algorithm
{
public:
    ...
    int label(InputArray img, OutputArray labels) const;
    int labelAndGetStat(InputArray img, OutputArray labels, OutputArray stats) const;
};

and everything else can be made properties, so it's super-extendable:

Ptr<Algorithm> ccomp = Algorithm::create("connected-component-extractor" /* or use some better name? */);
// change parameters if the default values are not good enough
ccomp->setInt("connectivity", 4);
ccomp->setString("mode", "Matas");
ccomp->setInt("outputFormat", CV_64F);
// run the algorithm
ccomp->label(img, labels, stats);
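The extensibility argument is that new knobs become new property names rather than new function signatures, so the binary interface never changes. A toy sketch of that pattern in Python (all names invented; this is not the cv::Algorithm API):

```python
class CComp:
    """Toy property-bag algorithm: options live in a dict, not the signature."""

    def __init__(self):
        # adding a future option ("minArea", "mode", ...) only extends this
        # dict; every existing call site keeps compiling and linking
        self._props = {"connectivity": 8, "mode": "default", "minArea": 0}

    def set(self, name, value):
        if name not in self._props:
            raise KeyError("unknown property: " + name)
        self._props[name] = value

    def get(self, name):
        return self._props[name]

ccomp = CComp()
ccomp.set("connectivity", 4)
ccomp.set("minArea", 10)
assert ccomp.get("connectivity") == 4 and ccomp.get("minArea") == 10
```

The cost, as Jason notes below, is bloat and indirection compared with a plain free function.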

I understand that: 1) you may not have enough time to modify the code, and those InputArray/OutputArray/Algorithm concepts are something extra that you do not want to spend time on; 2) whatever changes we propose, the end result should work for you, otherwise it would be a very silly situation: contribute something, spend time on it, and get an unusable result.

So I suggest we do it together. If you make a branch in your repository and give me access to it, I can help adjust the code. Or I can submit a pull request to your repository at github. Is that fine?

On the various data type support. uint32 is actually needed very rarely, as we have learnt from our experience and users' feedback. If int32 is not enough (we prefer to call it int, since on any 32-bit or 64-bit OS known to us sizeof(int)==4), we found double to be the best alternative. It's supported in hardware everywhere nowadays, from most low-power ARMs to the modern Intel chips with AVX instructions (which can process 4 doubles at once). A double can represent any integer <= 2^53 exactly, it takes the same space as [u]int64, and it's usually faster to process than 64-bit integers on 32-bit CPUs and about as fast on 64-bit CPUs (actually, it's faster there too if we take into account SSE2/AVX, which have very limited support for 64-bit integers). So uint32 and int64/uint64 are useless types in my opinion. (BTW, if "int" as the return type is not enough for the connectedComponents functions, I would suggest using "double".) And if we had some spare resources to add another datatype to OpenCV's Mat, I would immediately choose float16 (half).
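The 2^53 claim is easy to check directly, since Python floats are IEEE-754 doubles:

```python
# a double has a 53-bit significand, so every integer up to 2**53 is exact
assert float(2 ** 53) == 2 ** 53
assert float(2 ** 53 - 1) == 2 ** 53 - 1
# one past that, the nearest representable double rounds back to 2**53
assert float(2 ** 53 + 1) == 2 ** 53
```

So for label counts and pixel statistics (far below 2^53), a double genuinely loses nothing relative to int64.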

vpisarev commented 11 years ago

Oh, I forgot to add that another reason to make it an Algorithm is to add [later, if not now] some other useful properties, e.g. ccomp->setInt("minArea", 10); if a connected component is too small, the algorithm could wipe it out (or just not store the statistics for it, but for that we need the "label" component in each "stat" row). Filtering out tiny connected components is a very useful feature; sometimes it's the only purpose of using a connected components function (like in our stereo correspondence algorithms).

nevion commented 11 years ago

Hi Vadim,

I've made a lot of the changes we've discussed... I'll go through the checklist again later, but the last thing before the introduction of the algorithm functor is the autobuild-friendly test.

On the stats thing, I made it an OutputArray of ints (using the types as unsigned int internally). This doesn't really fit the centroid, though, even though the centroid is easily computable from these ints. I don't think fixed point makes it any easier to deal with, and that it throws away precision is just more mud in the face.

So:

1) The fixed-point way:

Point2f center_i(stats.at<int>(l, CC_STAT_CX)/256.f, stats.at<int>(l, CC_STAT_CY)/256.f);

2) The compute-as-needed way:

Point2f center_i(stats.at<int>(l, CC_STAT_INTEGRAL_X) / float(stats.at<int>(l, CC_STAT_AREA)), stats.at<int>(l, CC_STAT_INTEGRAL_Y) / float(stats.at<int>(l, CC_STAT_AREA)));

OR

Point2f center = Point2f(Vec2f(stats.at<int>(l, CC_STAT_INTEGRAL_X), stats.at<int>(l, CC_STAT_INTEGRAL_Y))) / float(stats.at<int>(l, CC_STAT_AREA));

3) The doubles (slower) approach:

Point2f center_i(stats.at<double>(l, CC_STAT_CX), stats.at<double>(l, CC_STAT_CY));

I think the compute-as-needed way (2) is the best approach, but it would be often repeated and is a fairly long line. It also assumes users will recognize that the integrals and areas yield centroids. Any suggestions on dealing with this so it is more friendly to users? Perhaps a mention/example in the documentation is the only way to go?

I'm also a little worried about the integrals. Imagine a trivial all-1 bi-level image of 8192x8192: this would have an integral on x and y equal to (2^13)*(2^13-1)/2 * 2^13, or about 2^38, well exceeding an int's capabilities. The only solutions here are:
1a) Don't export the integral; use a private workspace for it (with uint64 or double) and export the results as fixed-point integers
1b) Same as 1a, but export it as an additional OutputArray, as double, for the centroids
2) Use a struct as originally done
3) Use all doubles
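A quick back-of-the-envelope check of that overflow (plain Python):

```python
# all-foreground 8192x8192 image, one component: its x-integral is
# (number of rows) * (sum of x over one row) = 2**13 * (2**13 * (2**13 - 1) // 2)
n = 8192
integral_x = n * (n * (n - 1) // 2)
assert integral_x > 2 ** 31 - 1          # overflows a signed 32-bit int
assert 2 ** 37 < integral_x < 2 ** 38    # roughly 2**38, as estimated above
assert float(integral_x) == integral_x   # still exact in a double (< 2**53)
```

The last line is why options 1a/1b/3 (uint64 or double workspace) all close the hole; only the int-typed export does not.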

Please let me know your thoughts on the best solution. I'm thinking that if we want to guard against these perfectly possible issues, either internal integrals of uint64 or structs are the way to go. The former would still require either fixed point or an additional matrix for the centroids.

I suppose I wouldn't mind seeing the following signature, as a user:

CV_EXPORTS_W int connectedComponentsWithStats(InputArray image, OutputArray labels, OutputArray stats, OutputArray centroids, int connectivity = 8);


On the topic of the functor... I'm a little worried about bloat. I'm not saying such an object wouldn't be useful to quickly chain a vision stack together but it seems like it could get to be pretty large rather than the simple stand-alone utility I originally envisioned.

Perhaps both should be provided... and maybe not just for this function, but for all of these constructs of image processing. I would say they are beyond my scope of use though, personally. Its a little reminiscent of ITK (although they force their iterator concepts upon everyone and are a heavily templated library).

On Fri, Nov 23, 2012 at 12:44 AM, Vadim Pisarevsky <notifications@github.com

wrote:

oh, I forgot to add that another reason to make it an Algorithm is to add [later, if not now] some other useful properties, e.g. ccomp->setInt("minArea", 10); if a connected component is too small, the algorithm could wipe it out (or just not store the statistics for it, but for that we need the "label" component in each "stat" row). Filtering out tiny connected components is a very useful feature; sometimes it's the only purpose of using a connected component function (like in our stereo correspondence algorithms)

— Reply to this email directly or view it on GitHubhttps://github.com/Itseez/opencv/pull/135#issuecomment-10653087.

vpisarev commented 11 years ago

Hi Jason! good progress, it's converging. Some comments:

1) First of all, the output arrays are not created in your code; instead, you use the existing type to dispatch to the right function. This does not conform to the current OpenCV guidelines. The output arrays can be empty or have the wrong size or type, and the function should work correctly nevertheless. In fact, it should ignore the current size and type of the output image. If the output image type can be different, it should be explicitly passed to the function. Also, do we really need the 8u output type for labels? It's 1/3 of the code size and seems quite impractical. Still, if you want to support 8u, 16u and 32u/s, you need to add another output parameter:

int connectedComponents(InputArray _img, OutputArray _labels, int connectivity=8, int ltype=CV_32S)
{
    Mat img = _img.getMat();
    _labels.create(img.size(), CV_MAT_TYPE(ltype));
    Mat labels = _labels.getMat();
    // ... call internal functions.
}

2) the related note - the internal functions should probably take [const] Mat& instead of InputArrays, it's more efficient and safe.

3) even though you already refactored your code, I would suggest to declare the ConnectedCompStat structure (or however you call it) in the .cpp and make the template functions return vector<ConnectedCompStat>. The transformation vector<ConnectedCompStat> => OutputArray stats can be moved to a separate function, which can be non-template (it does not depend on the type of the labels image, right?). Also, this transformation will help to keep the code more elegant and make it easier to handle various issues with accuracy etc. - they are postponed till the very end.

4) on CC_STAT_CX, CC_STAT_CY - I did not quite understand what you mean, the formatting in your comment is broken.

5) on CC_STAT_INTEGRAL_* - I did not quite understand it either, but for a different reason. I thought the characteristics depend on the area and shape of the connected component, not on its position, and therefore a 32-bit value is enough. Can you provide some information on what CC_STAT_INTEGRAL_X/Y is exactly?

nevion commented 11 years ago

Vadim,

Inlined comments.

On Wed, Nov 28, 2012 at 11:36 AM, Vadim Pisarevsky <notifications@github.com

wrote:

Hi Jason! good progress, it's converging. Some comments:

1) First of all, the output arrays are not created in your code, instead, you use the existing type to dispatch to the right function. This does not conform to the current OpenCV guidelines. The output arrays can empty or have wrong size or type, and the function should work correctly nevertheless.

Fair enough. The reason it is like that is to allow reuse without realloc of the output array. Does the allocator for output arrays optimize the case of an existing sufficiently sized buffer? I also liked that it internally passed the label type in. But guidelines are guidelines, so this will change.

In fact, it should ignore the current size and type of the output image. If the output image type can be different, it should be explicitly passed to the function. Also, do we really need 8u output type for labels? It's 1/3 of the code size and seem quite impractical. Still, if you want to support 8u, 16u and 32u/s, you need to add another output parameter:

Yeah, that 8u is a funny one... it's too low for my use, but I've seen/heard of people using it in QR/barcode readers and in situations where it is not safe (label overflow), knowingly, because overflow is low probability and they want the speed. I won't fight for it, so I'll leave the decision up to you.

int connectedComponents(InputArray _img, OutputArray _labels, int connectivity=8, int ltype=CV_32S)
{
    Mat img = _img.getMat();
    _labels.create(img.size(), CV_MAT_TYPE(ltype));
    Mat labels = _labels.getMat();
    // ... call internal functions.
}

2) the related note - the internal functions should probably take [const] Mat& instead of InputArrays, it's more efficient and safe.

Ok.

3) even though you already refactored your code, I would suggest to declare the ConnectedCompStat structure (or however you call it) in the .cpp and make the template functions return vector<ConnectedCompStat>. The transformation vector<ConnectedCompStat> => OutputArray stats can be moved to a separate function, which can be non-template (it does not depend on the type of labels image, right?). Also, this transformation will help to keep the code more elegant and easier to handle various issues with accuracy etc. - they are postponed till the very end.

Ok. This also means I won't output the integral stats; they are normally useless anyway and have range issues with everything smaller than double/int64. I may also only do this with the integrals if the stats output array stays int.

4) on CC_STAT_CX, CC_STAT_CY - I did not quite understand what you mean, the formatting in your comment is broken.

Hm, maybe because I compose on gmail? It boils down to centroids not being OutputArray-friendly with the other parameters unless everything is doubles or centroids are a separate OutputArray. I think a separate array is fastest in computation time (next to fixed point) and easiest to use (no need to stick a strange divisor everywhere when using it). Otherwise, fixed point and one output array with the divisor, though I can see the newbies' and academics' frustrated faces already.

5) on CC_STAT_INTEGRAL_* - I did not quite understand it too, but for a different reason. I thought, the characteristics depends on the area and shape of the connected component, not on its position, and therefore 32-bit value is enough. Can you provide some information on what is CC_STAT_INTEGRAL_X/Y exactly?

The integral is computed for centroid computation. It is the sum of the positions at which a component is present, and it grows cubically for a square image where n is the square's dimension. It overflows easily as a result. Divide this 2-vector by the area and you get the centroid (average position). If the image is effectively one giant connected component (or close to it), you can see that this would easily give you an unexpected result.

Reply to this email directly or view it on GitHubhttps://github.com/Itseez/opencv/pull/135#issuecomment-10817621.

vpisarev commented 11 years ago

ok, so if the ConnectedComp structure is used for intermediate representation (where integrals can be encoded as doubles or 64-bit integers), then the output array of statistics can be without them, because they do not make much sense for the users.

On 8-bit output label array - let's throw it away and keep just CV_16U and CV_32S options.

CX and CY representation - well, need to think. Personally, I do not think they should be stored with very high accuracy - any noise pixel added to or removed from the component may affect them. I would say 1/256 should be enough, but I admit that I do not know all the possible use cases. Splitting the output array into 2 - I would say this is the least convenient option.

Hey, what about yet another option - encode each of CX and CY as a pair of integers? stat[i][CC_STAT_CX] would store the rounded-to-nearest x-coordinate of the centroid and stat[i][CC_STAT_FX] would store the signed fractional part, multiplied by (2**31)? We can add a constant:

#define CV_CC_STAT_CENTROID_SCALE 4.656612873077393e-10 /* 1/(2**31) */

so that the actual centroid can be computed as stat[i][CC_STAT_CX]+stat[i][CC_STAT_FX]*CV_CC_STAT_CENTROID_SCALE.

It will give significantly better accuracy than single-precision floating-point number. We can add inline function

inline Point2f getCCompCentroid(const int* stat)
{
    return Point2f(stat[CC_STAT_CX] + ..., stat[CC_STAT_CY] + ...);
}

which can be used as

vector<Vec<int, 9> > ccomp;
Mat img, labels;
getConnectedComponentsAndStats(img, labels, ccomp, 8);
for( size_t i = 0; i < ccomp.size(); i++ )
{
    Point2f c = getCCompCentroid(&ccomp[i][0]);
}

those who only need very rough position of the centroid can just use Point(stat[CC_STAT_CX], stat[CC_STAT_CY]).

What do you think?

nevion commented 11 years ago

On Sat, Dec 1, 2012 at 9:16 AM, Vadim Pisarevsky notifications@github.comwrote:

ok, so if the ConnectedComp structure is used for intermediate representation (where integrals can be encoded as double's or 64-bit integers) then the output array of statistics can be without them, because they do not much sense for the users.

On 8-bit output label array - let's throw it away and keep just CV_16U and CV_32S options.

Ok.

CX and CY representation - well, need to think. Personally I do not think that they should be stored with very high accuracy - any noise pixel added to or removed from the component may affect them. I would say, 1/256 should be enough, but I admit that I do not know all the possible use cases. Splitting the output array into 2 - I would say, this is the least convenient option.

Hey, what about yet another option - encode each of CX and CY as a pair of integers? stat[i][CC_STAT_CX] would store the rounded-to-nearest x-coordinate of the centroid and stat[i][CC_STAT_FX] would store the signed fractional part, multiplied by (2**31)? We can add a constant:

#define CV_CC_STAT_CENTROID_SCALE 4.656612873077393e-10 /* 1/(2**31) */

so that the actual centroid can be computed as stat[i][CC_STAT_CX]+stat[i][CC_STAT_FX]*CV_CC_STAT_CENTROID_SCALE.

It will give significantly better accuracy than single-precision floating-point number. We can add inline function

inline Point2f getCCompCentroid(const int* stat) { return Point2f(stat[CC_STAT_CX]+..., stat[CC_STAT_CY]+...); }

which can be used as

vector<Vec<int, 9> > ccomp;
Mat img, labels;
getConnectedComponentsAndStats(img, labels, ccomp, 8);
for( size_t i = 0; i < ccomp.size(); i++ )
{
    Point2f c = getCCompCentroid(&ccomp[i][0]);
}

those who only need very rough position of the centroid can just use Point(stat[CC_STAT_CX], stat[CC_STAT_CY]).

What do you think?

I think fixed point is uglier than a separate array with floating point (double precision). The macro wouldn't be usable in other languages either, unless it's exported; the function would be. One more alternative is to split the double into lower and upper halves and stick the bits in integer memory. I've heard that some architectures don't like floats using arbitrary memory, or something to that effect. In other languages, though, this could again be a large annoyance in interpreting the memory correctly; however, your function solution still works and hides whatever complexity internally.

Having said that, I think the stand-alone function is just as ugly as another output array... if not uglier. Please mull it over a little more.

-Jason

— Reply to this email directly or view it on GitHubhttps://github.com/Itseez/opencv/pull/135#issuecomment-10919393.

vpisarev commented 11 years ago

ok, let's output the connected component centroid in a separate output array of 2-channel floating-point type (CV_32FC2 or CV_64FC2 - at your choice).

nevion commented 11 years ago

Vadim,

I believe I've got everything you mentioned taken care of, including the test case which I have also forked from your branch and updated with my input/expected output for regression testing.

Let me know if there's anything missing still.

-Jason

vpisarev commented 11 years ago

looks good now. One thing to fix, though, is the compile errors on Windows: http://pullrequest.opencv.org (I have to apologize for the scrambled error messages; we are aware of this problem and working on it), and then we can probably integrate it and do incremental improvements

alekcac commented 11 years ago

Vadim, any changes?

nevion commented 11 years ago

I'll have the change(s) incorporated soon; I'm very busy with work at the moment... even for this small change.

On Wed, Dec 12, 2012 at 1:52 AM, Alexander Shishkov < notifications@github.com> wrote:

Vadim, any changes?

— Reply to this email directly or view it on GitHubhttps://github.com/Itseez/opencv/pull/135#issuecomment-11282732.

nevion commented 11 years ago

Hm, regarding these types I've added to types_c.h... common programming lore states that the type char is not guaranteed to be 8 bits, though... and neither is int to be 32 (I've used one of these - a DSP platform with 16-bit ints, still using gcc 2.95 or some such). I'm not sure what to propose for older code, but C++11/C99 guarantee stdint, which provides integer types with guaranteed bit-depths.

With that said, I made the change reusing existing typedefs, so int8 - uint32 are now also defined rather than just int64, and updated connectedcomponents afterwards.

vpisarev commented 11 years ago

Hi Jason, such a change in types_c.h can potentially cause conflicts. I'm afraid we cannot accept such a change, sorry. If you need such definitions in a .cpp, please put the declarations there.

Otherwise, the patch looks fine. With one little note - please, take a look at http://pullrequest.opencv.org, your code produces some warnings for android; the code can only be merged if it builds without warnings. You can fix it or I can submit pull request to your repository to fix that, as you wish.

nevion commented 11 years ago

Vadim - yeah, I thought it was pretty controversial, but somebody (joshdoe?) suggested adding such typedefs there. If I may ask, where is the conflict you are concerned about, and is this a reasonable judgment? I have received no compilation errors in regards to it, and it just replicates the naming convention of u/int64, which is a sane and useful thing IMO. Or is this why they were originally missing? I can move them back, but I do consider this a small step backwards.

I have been monitoring the builds since you originally provided me the URL; I figured there were "no merge unless..." conditions in play. Re the android failure, I wasn't able to discern why, and I don't have an android environment set up... if it's not too much trouble, could you submit a patch to fix the error there?

joshdoe commented 11 years ago

It was my suggestion. I was a bit surprised to see OpenCV didn't have fixed-width standard typedefs. However, Vadim does have somewhat of a point, since your simple typedefs aren't robust enough to work across all platforms. Leaving each module or even each .cpp file to implement these on their own is asking for trouble. Perhaps a better way is to use stdint.h, though it would mean including some compiler-specific files (e.g. msinttypes).

nevion commented 11 years ago

joshdoe - sorry I didn't realize github attached the comment to the diff and hid it all together.

I think this is the best approach if OpenCV is going to stay stuck on C++98 - use the compiler-specific options or just manually coded statements... I think the latter is a little easier; see lines 74-90 here http://code.google.com/p/msinttypes/source/browse/trunk/stdint.h for all the windows support you will need. Looks like that covers Borland too.

For glibc using platforms, stdint.h has been there since circa 2005.

Small amendment: boost has a single file solution for this: http://www.boost.org/doc/libs/1_52_0/boost/cstdint.hpp

Second amendment: it is almost a single file... it relies on boost/config.hpp and in turn the 23 files under boost/config/compiler to deal with int64.

vpisarev commented 11 years ago

Hi Jason,

if you want to contribute some little (yet useful) piece of code to opencv, please try to make as few modifications in the external headers as possible, and follow our conventions. I'm 100% sure every big project has a similar policy.

In OpenCV the conventions are the following:

uchar == unsigned char == uint8_t
schar == signed char == int8_t
ushort == unsigned short == uint16_t
short == short == int16_t
int == int32_t
unsigned == uint32_t
int64 == long long == int64_t
uint64 == unsigned long long == uint64_t
float == float32_t
double == float64_t

That's it. No arguments about that. Please do not mention C++ standards; this is a constant pain for us. These are just our rules, and they have worked fine for us and our users during the 12 years of OpenCV's existence.

If you want to have more freedom, that's perfectly fine, and starting from 2.5 we will have alternative option for such contributions - make a 3rd-party module. Here is the short guide, a copy of my e-mail to another contributor. You should replace "color_tracker" below with "connected_component" or something like that. As soon as we finalize some more details, we will properly format the guide and publish it somewhere.

[start of e-mail]

In order to make OpenCV more solid and at the same time to make the OpenCV community more open, where code can be shared not only via the very small core team, we decided to change our policy with respect to contributed code. While bug fixes and very minor additions can be included right into OpenCV, new algorithms should be shaped as modules. For example, if you download OpenCV 2.4.3 and look at the modules subdirectory, you will see ~20 modules, from quite large basic modules like core and imgproc to higher-level specific modules like photo and videostab.

We are in the process of designing all the necessary web infrastructure, but if you wish, you can already do all the necessary work at you side.

  1. (optional) register at github, fork opencv (master branch). If you do not want to register at github, just use git capabilities to clone the repository.
  2. make the branch with a sensible name, like "color_tracker".
  3. within modules directory create a subdirectory (e.g. color_tracker). the easiest way would be to copy a small module, like photo and rename all the headers.
  4. in modules/color_tracker/include/opencv2/color_tracker/color_tracker.hpp define a separate namespace and your classes, put the internal headers and .cpp files to src directory.
  5. modify modules/color_tracker/CMakeLists.txt - specify name of your module (color_tracker) and its dependencies.
  6. optionally, add some tests, documentation, performance tests etc. (test, doc, perf subdirectories).

run CMake, make everything compile.

That's it, the module is ready. Now everyone can download a .zip file of the color_tracker subdirectory, place it into modules and get your functionality. For now we can just publish the list of 3rd-party modules on the code.opencv.org wiki, but later we plan to have something like Perl's CPAN or Python's PIP system.

Then, if your module appears to be useful, after some staging period (like 1+ year) we can put it into the core OpenCV, if you or someone else agrees to support it further for several years.

[end of e-mail]

please, consider both options and if you choose the second one, let me know, I'll close this pull request and will gladly add the link to your module at code.opencv.org wiki.

nevion commented 11 years ago

Hi Vadim,

I will get this to work in opencv without the types_c.h modification - I wanted this module in mainline so that cvblobslib et al. doesn't happen again and all those email/stackoverflow posts stop.

I did consider it a long shot to get the type decls in... however, given joshdoe's comment I figured it would perhaps come to fruition, and I guess I wanted to at least get it brought up. There have been a lot of changes in OpenCV in the last 2 years, and some have been fairly radical: the new project layout, a lot more C++ than C, and the switch to Qt are the things off the top of my mind, but I guess also the use of CUDA and OpenCL... so from my POV I figured it was possible you guys would introduce something like those types. But it's rejected, and that's OK. For the record, I do think it is less error-prone (in terms of range considerations) to use bit-width-encoded types, and there is an inconsistency in using the opencv conventions alongside the CV_(float|S|U) types... I apologize again if you've heard this all before again and again, but it is weird for a library like opencv to do this IMO. And that's about all I can say on it, so I'm done on that topic.

My one question to you regarding android: can you still fling me a patch to fix that build error? I wasn't able to figure it out from the error message. This is very close to being ready.

vpisarev commented 11 years ago

Jason, thanks for understanding and for the prompt reaction. I will try to fix the warnings.

vpisarev commented 11 years ago

Jason, here is the patch that fixes build problems with VS and probably Android; let's see if it works; For some reason, I could not submit a pull request to your repository, so you need to apply the patch manually:

https://github.com/vpisarev/opencv/commit/1eae455acb9e3275657bd6ec8e36a92c30e2ccbd

basically, you can merge https://github.com/vpisarev/opencv.git, branch cc, to your master.

nevion commented 11 years ago

Vadim - I added your patch. Still build errors on android... something the compiler is having trouble with on Input/OutputArrays. On mac, the regression test also fails with invalid test data... this only happens when the input matrix (loaded from a png file) is empty... perhaps the dataset repo isn't up to date in the mac test environment? This is just a warning, however.

Btw, can you take references of Input/OutputArrays? I noticed you changed the CCStatsOp fields to be pointers but thought it was funny you didn't use references instead...

vpisarev commented 11 years ago

Hi Jason,

1) regarding the tests - can you please send me the two files you use for the regression tests? I will put them into opencv_extra. Without the test data the tests will always pass (unless the test crashes somewhere inside the function). 2) on the android build - I'm still trying to figure out why it fails. 3) on Input/OutputArray - using references as members of C++ classes is bad, because it issues the correct warning that the assignment operator and copy constructor cannot be generated. Instead of suppressing the warning or ignoring it, I always choose to fix it properly.

vpisarev commented 11 years ago

ok; I fixed the android build, please check the cc branch. Basically, you can just copy the implementation of the external cv::connectedComponents[WithStats]() functions to your .cpp file.

The test data is still needed; with failing tests we can not accept the code

nevion commented 11 years ago

Vadim - the test data is here (pushed a week ago): https://github.com/nevion/opencv_extra

I pushed your patch too.

vpisarev commented 11 years ago

ok, I added test data to our copy of opencv_extra; hopefully, now the tests will pass

nevion commented 11 years ago

Vadim - not sure if it's been rebuilt since you added the test data. Can you force a rebuild or should I make a dummy commit?

vpisarev commented 11 years ago

I can and I did. Looks like, this is the problem of our buildbot. Still, we have to wait a bit until it's solved.

vpisarev commented 11 years ago

Hi Jason!

2 more things: 1) can you, please, update the documentation to reflect the API? E.g. the structure description should be removed.

2) we will integrate your pull request early next week, because this week we are preparing the 2.4.3.x release, and we want to minimize the possibility of conflicts during the after-release 2.4 -> master merges. Is that fine with you?

nevion commented 11 years ago

I'll get on 1) sometime today. Re 2), we're tantalizingly close and the core code is stable and fairly well tested... I'd hope we could get it into the next release that allows such things in, whenever that is. I also hope we're not missing one, though that seems like a bug-fix release. Btw, come January I'm going to be on a long set of work-related field tests, so it works best for me to finish up before that... or after, though I'd have to fight the distraction.

vpisarev commented 11 years ago

Jason, don't worry, as long as it's integrated (and it's only the documentation that remains to be fixed), we will put it into the nearest OpenCV release.

Putting it into 2.4.3.x is too late; it's a very long release process, going through multiple QA steps, and it's already on the way. So the earliest release we can put it in is 2.4.4, which is scheduled for Feb 2013.

Regards, Vadim

kirill-korniakov commented 11 years ago

Vadim, actually the earliest possible release is 2.5, since this is new functionality and it is targeted for the "master" branch. But I don't think that will cause any issues, since after the merge to master this will be a part of OpenCV!

vpisarev commented 11 years ago

oh, that's right; the pull request should actually be retargeted for 2.4 branch in order to be put into 2.4.4

kirill-korniakov commented 11 years ago

But I thought that 2.4 should accept only bug-fixes and optimizations =) Is it a "must have" for 2.4.4?

vpisarev commented 11 years ago

there are "should" and "could". 2.4.4 can accept anything that does not break 2.4 API :) So, we can potentially add this functionality, no probs

nevion commented 11 years ago

Ok I've pushed an update to the docs. Re 2.4, should I rebase off 2.4 and submit a pull request to that branch?

vpisarev commented 11 years ago

Hi Jason,

Let's put it into master to keep things going. We can still move it to 2.4 later.

nevion commented 11 years ago

Looks like the android build bot is freaking out again. Other than that, I see no warnings relevant to connected components, just histogram equalization and a cascade classifier. Not sure what to interpret from the docs, but the topic of interest isn't in its logs.

vpisarev commented 11 years ago

:+1:

vpisarev commented 11 years ago

:shipit: