QR Code detector fails for integer messages longer than 3 digits

simon-staal commented 1 year ago

There appears to be a bug in the QR-code detection which causes it to fail when a numeric value with 4 or more digits is encoded. The QR codes are successfully decoded when using openCV's decoder implementation. This was discovered when benchmarking the performance of QR code detection. I'm assuming this is a problem with BoofCV, but opening an issue here just in case.

Sample code

import pyboof as pb
import segno
from time import time
from random import randint
# uninstall Java JDK after testing
TEST_FILE = 'qr_test.png'
ATTEMPTS = 5000

qr_pyboof = pb.FactoryFiducial(np.uint8).qrcode()
total_pyboof = 0

for i in range(1000, ATTEMPTS): # Only testing values > 999
    x = i
    qrcode = segno.make(x, version=1)
    assert qrcode.error == 'H'
    qrcode.save(TEST_FILE, scale = 5)

    img = cv2.imread(TEST_FILE)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, (88, 88))

    start = time()
    image = pb.ndarray_to_boof(img)
    qr_pyboof.detect(image)
    end = time()
    assert len(qr_pyboof.detections) == 0, f"Successfully detected {len(qr_pyboof.detections)} qr codes (input = {x})"
    total_pyboof += (end - start) * 1000

print(f'Took an average of {(total_pyboof / ATTEMPTS):.2f}ms for {ATTEMPTS} attempts')

lessthanoptimal commented 1 year ago

Hi Simon,

So far I've NOT been able to reproduce this issue and it seems to run just fine. Could you share an image that you have problems with? Here's my dump from venv "pip freeze > requirements.txt"

numpy==1.24.3 opencv-python==4.7.0.72 py4j==0.10.9.7 PyBoof==0.41 segno==1.5.2 six==1.16.0 transforms3d==0.4.1

I did have to modify the example above because it was missing imports and wouldn't run:

import pyboof as pb import segno import numpy as np << new import cv2 << new from time import time from random import randint

FYI on my system I get 7.41 ms

Cheers,

Peter

On Wed, May 31, 2023 at 10:00 AM Simon Staal @.***> wrote:

There appears to be a bug in the QR-code detection which causes it to fail when a numeric value with 4 or more digits is encoded. The QR codes are successfully decoded when using openCV's decoder implementation. This was discovered when benchmarking the performance of QR code detection. I'm assuming this is a problem with BoofCV, but opening an issue here just in case.

Sample code

import pyboof as pbimport segnofrom time import timefrom random import randint# uninstall Java JDK after testingTEST_FILE = 'qr_test.png'ATTEMPTS = 5000 qr_pyboof = pb.FactoryFiducial(np.uint8).qrcode()total_pyboof = 0 for i in range(1000, ATTEMPTS): # Only testing values > 999 x = i qrcode = segno.make(x, version=1) assert qrcode.error == 'H' qrcode.save(TEST_FILE, scale = 5)
img = cv2.imread(TEST_FILE)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.resize(img, (88, 88))

start = time()
image = pb.ndarray_to_boof(img)
qr_pyboof.detect(image)
end = time()
assert len(qr_pyboof.detections) == 0, f"Successfully detected {len(qr_pyboof.detections)} qr codes (input = {x})"
total_pyboof += (end - start) * 1000
print(f'Took an average of {(total_pyboof / ATTEMPTS):.2f}ms for {ATTEMPTS} attempts')

— Reply to this email directly, view it on GitHub https://github.com/lessthanoptimal/PyBoof/issues/23, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFUOV5FX4K3ZQC6WJQVUCLXI52KRANCNFSM6AAAAAAYVYAQTU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- "Now, now my good man, this is no time for making enemies." — Voltaire (1694-1778), on his deathbed in response to a priest asking that he renounce Satan.

lessthanoptimal commented 1 year ago

Realized that the code you posted only runs if the detector fails. Knowing that I ran the QR through a diagnostic application. it's failing because it claims padding bytes are malformed.

Now the issue is trying to figure out if the QR is really malformed or not. If it is malformed, should that just be ignored since most [1] other detectors ignore that problem? QR codes are a bit chaotic since the standard is treated as a suggestion that's often ignored.

[1] Sample size of 3. I tried a few readers I had on my phone.

On Wed, May 31, 2023 at 2:27 PM Peter A @.***> wrote:

Hi Simon,

So far I've NOT been able to reproduce this issue and it seems to run just fine. Could you share an image that you have problems with? Here's my dump from venv "pip freeze > requirements.txt"

numpy==1.24.3 opencv-python==4.7.0.72 py4j==0.10.9.7 PyBoof==0.41 segno==1.5.2 six==1.16.0 transforms3d==0.4.1

I did have to modify the example above because it was missing imports and wouldn't run:

import pyboof as pb import segno import numpy as np << new import cv2 << new from time import time from random import randint

FYI on my system I get 7.41 ms

Cheers,

Peter

On Wed, May 31, 2023 at 10:00 AM Simon Staal @.***> wrote:
There appears to be a bug in the QR-code detection which causes it to fail when a numeric value with 4 or more digits is encoded. The QR codes are successfully decoded when using openCV's decoder implementation. This was discovered when benchmarking the performance of QR code detection. I'm assuming this is a problem with BoofCV, but opening an issue here just in case.

Sample code

import pyboof as pbimport segnofrom time import timefrom random import randint# uninstall Java JDK after testingTEST_FILE = 'qr_test.png'ATTEMPTS = 5000 qr_pyboof = pb.FactoryFiducial(np.uint8).qrcode()total_pyboof = 0 for i in range(1000, ATTEMPTS): # Only testing values > 999 x = i qrcode = segno.make(x, version=1) assert qrcode.error == 'H' qrcode.save(TEST_FILE, scale = 5)
img = cv2.imread(TEST_FILE)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.resize(img, (88, 88))

start = time()
image = pb.ndarray_to_boof(img)
qr_pyboof.detect(image)
end = time()
assert len(qr_pyboof.detections) == 0, f"Successfully detected {len(qr_pyboof.detections)} qr codes (input = {x})"
total_pyboof += (end - start) * 1000
print(f'Took an average of {(total_pyboof / ATTEMPTS):.2f}ms for {ATTEMPTS} attempts')

— Reply to this email directly, view it on GitHub https://github.com/lessthanoptimal/PyBoof/issues/23, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFUOV5FX4K3ZQC6WJQVUCLXI52KRANCNFSM6AAAAAAYVYAQTU . You are receiving this because you are subscribed to this thread.Message ID: @.***>
-- "Now, now my good man, this is no time for making enemies." — Voltaire (1694-1778), on his deathbed in response to a priest asking that he renounce Satan.

-- "Now, now my good man, this is no time for making enemies." — Voltaire (1694-1778), on his deathbed in response to a priest asking that he renounce Satan.

simon-staal commented 1 year ago

Sorry about the missing imports (and the lack of clarity for the code snippet above). I've tested the QR codes in the [1000-9999] range (generated in the same method as above) with the following libraries. For the most part they seem to work:

OpenCV (cv2.QRCodeDetector()): all work with the exception of 9133 (some larger 5 digit values also fail)
pyzbar (pyzbar.decode()): all work
pyzxing (pyzxing.BarCodeReader()): doesn't work for 2189, 4705, perhaps more (haven't been able to test exhaustively because it's quite slow)

In terms of environment, I've been testing this on Windows 10, WSL2 (Ubuntu 22) and a Raspberry Pi 3B+. Here's my venv dump for windows / WSL2:

joblib==1.2.0
numpy==1.24.3
opencv-python==4.7.0.72
py4j==0.10.9.7
PyBoof==0.41
pyzbar==0.1.9
pyzxing==1.0.2
segno==1.5.2
six==1.16.0
transforms3d==0.4.1

And for the pi:


joblib==1.2.0
numpy==1.24.3
opencv-python==4.6.0.66 // Different version of OpenCV
py4j==0.10.9.7
PyBoof==0.41
pyzbar==0.1.9
pyzxing==1.0.2
segno==1.5.2
six==1.16.0
transforms3d==0.4.1

lessthanoptimal commented 1 year ago

As a follow up to my last message. I'm fairly confident that this is a bug in segno. It's incorrectly selecting the first byte to add padding to when the number of bits is a multiple of 8. Since padding bits don't contain the message and bugs are plentiful in encoders most decoders ignore the padding bytes. That's actually the "fix" I'm planning on adding in.

How it was encoded: firstPaddingByte = location/8 + 1

How it should have been encoded firstPaddingByte = location/8 + (location%8 == 0 ? 0: 1)

In this case the message is 32 bytes, which is encoded in 4 bytes. First index of padding should be byte 4 not byte 5, which is how it's encoded.

A bit surprised that it took this long to run into this "problem"

On Wed, May 31, 2023 at 3:15 PM Simon Staal @.***> wrote:

Sorry about the missing imports (and the lack of clarity for the code snippet above). I've tested the QR codes in the [1000-9999] range (generated in the same method as above) with the following libraries. For the most part they seem to work:

OpenCV (cv2.QRCodeDetector()): all work with the exception of 9133 (some larger 5 digit values also fail)

pyzbar (pyzbar.decode()): all work

pyzxing (pyzxing.BarCodeReader()): doesn't work for 2189, 4705, perhaps more (haven't been able to test exhaustively because it's quite slow)

In terms of environment, I've been testing this on Windows 10, WSL2 (Ubuntu 22) and a Raspberry Pi 3B+. Here's my venv dump for windows / WSL2:

joblib==1.2.0 numpy==1.24.3 opencv-python==4.7.0.72 py4j==0.10.9.7 PyBoof==0.41 pyzbar==0.1.9 pyzxing==1.0.2 segno==1.5.2 six==1.16.0 transforms3d==0.4.1

And for the pi:

joblib==1.2.0 numpy==1.24.3 opencv-python==4.6.0.66 // Different version of OpenCV py4j==0.10.9.7 PyBoof==0.41 pyzbar==0.1.9 pyzxing==1.0.2 segno==1.5.2 six==1.16.0 transforms3d==0.4.1

— Reply to this email directly, view it on GitHub https://github.com/lessthanoptimal/PyBoof/issues/23#issuecomment-1571036033, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFUOV7PM5IJIWLXWDIOYK3XI67H7ANCNFSM6AAAAAAYVYAQTU . You are receiving this because you commented.Message ID: @.***>

-- "Now, now my good man, this is no time for making enemies." — Voltaire (1694-1778), on his deathbed in response to a priest asking that he renounce Satan.

simon-staal commented 1 year ago

Thanks for looking into it, I'll open an issue on segno regarding this! The following 5-digit messages also didn't work, are these also due to some kind of padding issue?


10110, 10427, 11207, 11303, 11674, 12389, 14244, 14515

lessthanoptimal commented 1 year ago

I looked at 10110. That's a BoofCV issue. There's a false positive for the finder pattern (square within a square) that's messing everything up. Good test case though!

On Wed, May 31, 2023 at 3:40 PM Simon Staal @.***> wrote:

Thanks for looking into it, I'll open an issue on segno regarding this! The following 5-digit messages also didn't work, are these also due to some kind of padding issue?

10110, 10427, 11207, 11303, 11674, 12389, 14244, 14515

— Reply to this email directly, view it on GitHub https://github.com/lessthanoptimal/PyBoof/issues/23#issuecomment-1571057739, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFUOVZ3ZSKJXVHRDLFGODDXI7CHBANCNFSM6AAAAAAYVYAQTU . You are receiving this because you commented.Message ID: @.***>

-- "Now, now my good man, this is no time for making enemies." — Voltaire (1694-1778), on his deathbed in response to a priest asking that he renounce Satan.

lessthanoptimal commented 1 year ago

Verified that other libraries are completely ignoring padding bytes by removing them from some QR codes and they had no issues reading the mutilated QR. An update will be out soon since there's a NPE that also needs to be fixed.

On Wed, May 31, 2023 at 3:46 PM Peter A @.***> wrote:

I looked at 10110. That's a BoofCV issue. There's a false positive for the finder pattern (square within a square) that's messing everything up. Good test case though!

On Wed, May 31, 2023 at 3:40 PM Simon Staal @.***> wrote:

Thanks for looking into it, I'll open an issue on segno regarding this! The following 5-digit messages also didn't work, are these also due to some kind of padding issue?

10110, 10427, 11207, 11303, 11674, 12389, 14244, 14515

— Reply to this email directly, view it on GitHub https://github.com/lessthanoptimal/PyBoof/issues/23#issuecomment-1571057739, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFUOVZ3ZSKJXVHRDLFGODDXI7CHBANCNFSM6AAAAAAYVYAQTU . You are receiving this because you commented.Message ID: @.***>

-- "Now, now my good man, this is no time for making enemies." — Voltaire (1694-1778), on his deathbed in response to a priest asking that he renounce Satan.

-- "Now, now my good man, this is no time for making enemies." — Voltaire (1694-1778), on his deathbed in response to a priest asking that he renounce Satan.

lessthanoptimal commented 1 year ago

Can you try the new release and see if it fixes the 4-digit issue

simon-staal commented 1 year ago

It no longer breaks for all of them, so I'm assuming the 4-digit padding issue is solved, although it still fails in the following cases (exhaustively tested from [1000,9999]):

2474, 4916, 6079, 7243, 7531, 7763, 8138, 8698, 9158, 9372

I'm assuming these are due to similar issues as those found in the 5-digit numbers?

lessthanoptimal commented 1 year ago

Yeah those aren't going to be fixed right now. I'm going to create a new ticket referring this conversation. What the image above shows is that there's a false positive that's messing it up. In all the cases I checked it should be very clear that this is a bad decision to make.

I'm planning on adding a similar test to what you did to the regression test to ensure that this gets fixed. Thanks for finding this issue!

lessthanoptimal commented 1 year ago

Out of curiously, how does the speed compare to the other libraries? Also are you testing an application where you scan documents and have perfect QR codes?

simon-staal commented 1 year ago

Out of curiously, how does the speed compare to the other libraries? Also are you testing an application where you scan documents and have perfect QR codes?

Here are the benchmark results for detecting a single QR code, which looked very similar to the sample code above. At each iteration, a random QR code was generated, in the range [0-99999] (except for BoofCV where the [0-999] range was used due to the now solved padding issue), and then the time taken to detect the QR code was measured. All tests were run on a Raspberry Pi 3B+ using the Python wrapper of these libraries:	Detector	Mean (ms)
OpenCV	219.298	8.581
BoofCV	155.192	50.998
ZBar	4.274	0.270
ZXing	1953.751	57.966

I was surprised to see BoofCV had such a large variance, not sure if you could provide any insight as to why?

As for the intended use case, it's a bit more complicated than scanning documents; I'm developing an intelligent scrabble board, which contains an embedded camera and raspberry pi inside to detect QR codes on the bottom of the tiles. Obviously this benchmark has the limitation that the QR codes used don't have imperfections from being captured with a camera, but much easier to get a rough idea this way - although if you have any recommendations on how to improve this feel free to let me know! Images are always taken directly below the QR codes, and de-noised + binarized before being passed to the QR code detector anyways.

I'm also running a benchmark now which looks at the performance of these libraries in detecting QR codes for the entire board segment, where I'm testing a couple of different methods (segmenting the image into squares and detecting each one individually vs detecting all squares in the image and determining their position via bounding boxes). If you're interested in those results I'll post them when they're ready 😄.

lessthanoptimal commented 1 year ago

~Which detector in OpenCV are you using? I think there are at least 2 now with drastically different behavior.~

Noticed that you said which one you were using above.

lessthanoptimal commented 1 year ago

As for the performance, I'm a bit surprised by the speed. It's very different from what I saw several years ago in this benchmark. I suspect that when you process larger images the profile will change drastically. With the large standard deviation I would need to look into it. For example, years ago Raspberry PI's official distribution included what was effectively a broken JVM that made all things java run at a glacial speed. I kinda suspect that ZBar is doing so well because it's doing a more naive threshold that's advantageous in this specific use case.

simon-staal commented 1 year ago

Out of interest did I use the "correct" one for this purpose? I also found a cpp implementation of ZXing which also had a python wrapper linked here which I tested on the same initial benchmark above, which actually performed better than any other implementations - see updated table below:

Detector	Mean (ms)	Std. Deviation (ms)
OpenCV	219.298	8.581
BoofCV	155.192	50.998
ZBar	4.274	0.270
ZXing	1953.751	57.966
ZXing-cpp	2.323	0.039

The implementation of the cpp version has since diverged substantially from the original ZXing, so the difference in performance could be due to a different algorithm, but possible helps support your broken JVM hypothesis?

Edit: As for the more advanced benchmark, I'll try to upload the document with the results on this issue in the next week or so

Edit2: As for your benchmark, I read through it before when doing research, and I was also surprised with how difference performance was to what you measured. Something I found particularly weird was when running tests on the full board, comparing an iterative approach in which the image is first segmented into tiles, then the detector is run on each tile individually, vs running the detector once on the whole image and identifying the positions of the tiles based on their bounding boxes - essentially corresponding to your "lots" case. Surprisingly, BoofCV performed really badly with this simultaneous approach - I've attached the plot of the results below (with 95% confidence intervals drawn): BoofCV_performance

lessthanoptimal commented 1 year ago

Hmm one thing that could be done is to run the benchmark I linked to above using an updated version of the libraries. That way we can see if there has been some drastic improvement or not. What tended to kill performance for many scenes with complex background. You can see that when you look at the individual categories.

BTW it's possible to "tune" BoofCV to just use a global threshold and you will get large speed gain, but it will be much less robust when you encounter complex lighting situations.

lessthanoptimal commented 1 year ago

The opencv + boofcv comparison talks about the broken JVM issue briefly. https://boofcv.org/index.php?title=Performance:OpenCV:BoofCV

lessthanoptimal commented 1 year ago

essentially corresponding to your "lots" case. Surprisingly, BoofCV performed really badly with this

Can you post an example image from that test? It does seem like something weird is going on. Almost like it's hitting a memory limit. The other library developers might have seen that benchmark and improved their code in the past 4 years since I ran the test, I think a bunch of these libraries are effectively in maintenance mode with only minor tweaks.

simon-staal commented 1 year ago

Sure, here's an example containing images for some of the different tile percentages tested (note the QR codes are still 88 by 88, generated through the same method as above, then stacked together to form the board)

lessthanoptimal commented 1 year ago

Got a link to a full resolution image?

simon-staal commented 1 year ago

Here's one:

bad_board

I can also upload the code to generate these boards if that would be helpful?

lessthanoptimal / PyBoof

QR Code detector fails for integer messages longer than 3 digits #23