serratus / quaggaJS

An advanced barcode-scanner written in JavaScript
https://serratus.github.io/quaggaJS/
MIT License
Limiting false positives #237

Open iFlash opened 7 years ago

iFlash commented 7 years ago

During extensive testing, I found that I get about 5 to 10 percent false positives. I also found out, and I do not think this is documented, that each entry in decodedCodes carries an error value. Rejecting results whose error exceeds a certain margin brought my hit rate to 100 percent.

This is the code I use. I take the average error margin. If it is below 0.1, it is fair to assume the code was detected correctly.

So far I have had no false positives at all, and detection is still very fast.

var countDecodedCodes=0, err=0;
$.each(result.codeResult.decodedCodes, function(id,error){
    if (error.error!=undefined) {
        countDecodedCodes++;
        err+=parseFloat(error.error);
    }
});
if (err/countDecodedCodes < 0.1) {
    // correct code detected
} else {
    // probably wrong code
}

Hope this helps!

braindigitalis commented 6 years ago

Hi,

I too am struggling with error rates. I have tried your code and unfortunately still get errors; the longer the code, the more likely an error becomes.

For example, the attached code sometimes scans correctly and at other times returns garbage within the string, such as "6e&n Connery":

[Image: sean-connery-barcode]

My own sanity check on the barcodes is to dictate a valid format for the returned string, e.g. reject any result that contains non-alphanumeric characters such as &, #, @, etc. This may work for you, but if you actually want to accept strings containing these characters, this workaround is not a valid solution.

I've combined your solution with my own, which has reduced my error rate to zero in my use case:

         Quagga.onDetected(function(result) {
                var code = result.codeResult.code;

                if (App.lastResult !== code) {
                        App.lastResult = code;

                        var countDecodedCodes=0, err=0;
                        $.each(result.codeResult.decodedCodes, function(id,error) {
                                if (error.error!=undefined) {
                                        countDecodedCodes++;
                                        err += parseFloat(error.error);
                                }
                        });
                        if (err / countDecodedCodes < 0.1 && sanityCheck(code)) {
                                Quagga.stop();
                                $("#scanModal").modal("hide");
                                $(linked_input).val(code);
                                border_pulse(linked_input);
                        }
                }
        });
        function sanityCheck(s) {
                return s.toUpperCase().match(/^[0-9A-Z\s\-\.\/]+$/);
        }

Hope this helps!

iFlash commented 6 years ago

My routine was done for UPC and EAN-13 codes. It might not work with other codes as nicely.

You should inspect the object codeResult.decodedCodes closely, especially its error field. Have them logged in the console to see the range and adjust your threshold accordingly.

OR: Change the line

if (err/countDecodedCodes < 0.1)

To a lower value like 0.08. The lower the value, the higher the chance the code was interpreted correctly. But also it might take longer to correctly identify the code.
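
To pick a threshold, it can help to log a short summary of the error values instead of the raw array. The following is only a sketch: `summarizeErrors` is a hypothetical helper name, and the sample array is made up to resemble `result.codeResult.decodedCodes` (entries without an `error` field are skipped, as in the code above).

```javascript
// Summarize the per-entry error values of one result.
// `summarizeErrors` is a hypothetical helper, not part of Quagga.
function summarizeErrors(decodedCodes) {
    const errors = decodedCodes
        .map((entry) => entry.error)
        .filter((error) => error !== undefined)
        .map(Number);
    if (errors.length === 0) {
        return null; // no error values to measure
    }
    const sum = errors.reduce((a, b) => a + b, 0);
    return {
        count: errors.length,
        min: Math.min(...errors),
        max: Math.max(...errors),
        mean: sum / errors.length,
    };
}

// Made-up sample; the first entry has no error field (assumption).
const sample = [{ start: 0, end: 10 }, { error: 0.05 }, { error: 0.07 }, { error: 0.3 }];
console.log(summarizeErrors(sample));
```

Calling this inside the onDetected handler for a number of known-good and known-bad scans should show where the two groups separate.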

agusdutra commented 6 years ago

@serratus Could we add this feature to the source by adding a property to the reader, say errorLimit?: number, and checking this error limit before publishing to onDetected?

It's a nice validation, since a lot of codes have an error > 0.1 and are false positives.

I'm not quite sure where it would be right to add this validation, but I would be eager to do it if I could get some guidance.
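
Until something like that exists in the library, the same effect can be approximated in user code by wrapping the handler. This is a sketch under assumptions: `withErrorLimit` and its `errorLimit` parameter are hypothetical names taken from the proposal above, not an existing Quagga option, and the average error is used as the measure.

```javascript
// Wrap a detection handler so it only fires when the average error
// of the decoded entries is below errorLimit.
// `withErrorLimit` is a hypothetical helper, not part of Quagga.
function withErrorLimit(handler, errorLimit) {
    return function (result) {
        const errors = result.codeResult.decodedCodes
            .filter((entry) => entry.error !== undefined)
            .map((entry) => Number(entry.error));
        if (errors.length === 0) {
            return; // nothing to judge, treat as not detected
        }
        const mean = errors.reduce((a, b) => a + b, 0) / errors.length;
        if (mean < errorLimit) {
            handler(result);
        }
    };
}

// Intended usage (in the browser, with Quagga loaded):
// Quagga.onDetected(withErrorLimit(myHandler, 0.1));
```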

sam-lex commented 6 years ago

Based on @iFlash's answer, I implemented it using the median instead of the average.

private _getMedian(arr: number[]): number {
  arr.sort((a, b) => a - b);
  const half = Math.floor( arr.length / 2 );
  if (arr.length % 2 === 1) // Odd length
    return arr[ half ];
  return (arr[half - 1] + arr[half]) / 2.0;
}

// Initializers
private _initDetectedHandler() {
  this.onDetectedHandler = (result) => {
    const errors: number[] = result.codeResult.decodedCodes
      .filter(_ => _.error !== undefined)
      .map(_ => _.error);
    const median = this._getMedian( errors );
    if (median < 0.10) {
      // probably correct
    } else {
      // probably wrong
    }
  };

  Quagga.onDetected( this.onDetectedHandler );
}

During my tests (built-in webcam), I noticed that many reads were correct but had averages above 0.1, because some of the errors have much higher values like 0.3 to 0.4 while others are 0.05 to 0.07, thus "pulling" the average up.

The median represents most of the dataset a bit better. That said, I still get false positives occasionally. The hit rate is probably 7-8 out of 10, but matches come faster than with averages.
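
A small worked example of that effect, using made-up error values in the ranges mentioned above (these are illustrative numbers, not measurements):

```javascript
// Mean and median of the same list of error values.
function mean(values) {
    return values.reduce((a, b) => a + b, 0) / values.length;
}

function median(values) {
    const sorted = [...values].sort((a, b) => a - b);
    const half = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1
        ? sorted[half]
        : (sorted[half - 1] + sorted[half]) / 2;
}

// Two outliers (0.35, 0.4) among otherwise low errors.
const sampleErrors = [0.05, 0.06, 0.07, 0.35, 0.4, 0.05, 0.06];

console.log(mean(sampleErrors) < 0.1);   // false: the outliers pull the mean up
console.log(median(sampleErrors) < 0.1); // true: the median ignores them
```

With a 0.1 threshold, the average rejects this read while the median accepts it.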

fffx commented 4 years ago

If you are using this for ISBN, you can make use of the check digit.
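
For reference, a minimal sketch of an ISBN-13 check digit test. The digits are weighted 1, 3, 1, 3, ... from the left, and the weighted sum of all 13 digits must be a multiple of 10. `isValidIsbn13` is a hypothetical helper name.

```javascript
// Validate the check digit of a 13-digit ISBN given as a string.
// `isValidIsbn13` is a hypothetical helper, not part of Quagga.
function isValidIsbn13(code) {
    if (!/^\d{13}$/.test(code)) {
        return false; // must be exactly 13 digits
    }
    let sum = 0;
    for (let i = 0; i < 13; i++) {
        const digit = Number(code[i]);
        // Weights alternate 1, 3, 1, 3, ... starting at the first digit
        sum += i % 2 === 0 ? digit : digit * 3;
    }
    return sum % 10 === 0;
}

console.log(isValidIsbn13("9780306406157")); // true
console.log(isValidIsbn13("9780306406158")); // false (last digit changed)
```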

ericblade commented 4 years ago

fwiw, check digits alone don't seem to work well in practice. At least with UPC, it seems quite possible to get a significantly erroneous read, and still have the check digit come out matching the erroneous read. As well, I've run into quite a few UPCs that are stamped on actual product packages, but don't pass check digit validation. Mostly older stuff, though, it's probably improved quite a bit in the last several years on newer items.

I've validated this with both my own validation library, as well as other online validation resources, just to make sure that my library worked correctly.

So, a strategy that I'm wanting to put together soon is something along the lines of "if check digit validates and error rate < 50%, -or- if error rate < 20%" or something like that. tweak the numbers some to see what works.. but allow both to pass through to my app.
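
Written out, that two-tier rule might look like the sketch below. `shouldAccept` is a hypothetical name, the 0.5 and 0.2 defaults are the rough numbers from the paragraph above, and `errorRate` is assumed to be a single figure derived from decodedCodes (for example the average or median error).

```javascript
// Accept a read if the error is low on its own, or if it is moderate
// and the check digit also validates.
// `shouldAccept` and its default thresholds are illustrative assumptions.
function shouldAccept(errorRate, checkDigitValid, strictLimit = 0.2, relaxedLimit = 0.5) {
    if (errorRate < strictLimit) {
        return true;
    }
    return checkDigitValid && errorRate < relaxedLimit;
}

console.log(shouldAccept(0.15, false)); // true: low error alone is enough
console.log(shouldAccept(0.35, true));  // true: moderate error plus valid check digit
console.log(shouldAccept(0.35, false)); // false
```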

fffx commented 4 years ago

@ericblade Thanks. I realized the barcode (ISBN) was too small, so I set a [zoom value](https://github.com/serratus/quaggaJS/issues/307), and the accuracy has improved dramatically.

ericblade commented 4 years ago

Yeah, that is also something that I think I'd like to start playing with. I just noticed it on my last run through the source code (it's amazing how many times you can go through all this stuff and not notice certain things), and it does seem like something that could be useful.

reon777 commented 4 years ago

This works well for me.

let codes = []
function _onDetected(result) {
    codes.push(result.codeResult.code)
    // wait until three reads have been collected
    if (codes.length < 3) return
    if (!codes.every(v => v === codes[0])) {
        // the reads disagree: drop the oldest one and keep scanning
        codes.shift()
        return
    }
    // all collected reads agree: codes[0] can be treated as the detected code
}

julienboulay commented 3 years ago

@ericblade Based on @sam-lex's answer, I get good results (near 100%) by validating the errors against two thresholds: the median and the maximum value.

function isValid(result) {
  const errors: number[] = result.codeResult.decodedCodes
    .filter(_ => _.error !== undefined)
    .map(_ => _.error);

  const median = this._getMedian(errors);

  // Good result for code_128: median <= 0.08 and max error <= 0.1
  return !(median > 0.08 || errors.some(err => err > 0.1))
}

ericblade commented 3 years ago

0.08 :O that seems very low. out of curiosity, do you have any idea how long it takes you to get a result with that, normally?

that's usually part of the tradeoffs -- taking long to get a good result, and the battery lifetime involved in doing so

ericblade commented 3 years ago

I am now playing with something resembling this

function getMedian(arr) {
    const sorted = [...arr].sort((a, b) => a - b);
    const half = Math.floor(sorted.length / 2);
    // read from the sorted copy, not the original array
    if (sorted.length % 2 === 1) {
        return sorted[half];
    }
    return (sorted[half - 1] + sorted[half]) / 2;
}

function getMedianOfCodeErrors(decodedCodes) {
    const errors = decodedCodes.filter((x) => x.error !== undefined).map((y) => y.error); // TODO: use reduce
    const median = getMedian(errors);
    return { probablyValid: !(median > 0.10 || errors.some((err) => err > 0.1)), median };
}

...
        const err = getMedianOfCodeErrors(result.codeResult.decodedCodes);
        const validated = barcodeValidator(result.codeResult.code);
        console.warn('* errorCheck', result.codeResult.code, err, validated);
        if (err.probablyValid || (err.median < 0.25 && validated.valid === true && validated.type === 'upc')) {
            onDetected(result);
        }

barcodeValidator is from https://github.com/ericblade/barcode-validator

primeKal commented 3 years ago

You can try this: collect the results in an array and select the mode (the most frequent value).

function mode(array) {
    if (array.length == 0)
        return null;
    var modeMap = {};
    var maxEl = array[0], maxCount = 1;
    for (var i = 0; i < array.length; i++) {
        var el = array[i];
        if (modeMap[el] == null)
            modeMap[el] = 1;
        else
            modeMap[el]++;
        if (modeMap[el] > maxCount) {
            maxEl = el;
            maxCount = modeMap[el];
        }
    }
    return maxEl;
}

var last_result = [];

Quagga.onDetected(function (result) {
    var last_code = result.codeResult.code;
    last_result.push(last_code);
    if (last_result.length > 20) {
        console.log(last_result);
        // once enough reads are collected, take the most repeated one as the correct code
        var code = mode(last_result);
        console.log(code + " Is the most valid one");
    }
});

dansleboby commented 2 years ago

Great list of REGEX depending on barcode type: https://www.neodynamic.com/Products/Help/BarcodeWinControl2.5/working_barcode_symbologies.htm

ericblade commented 2 years ago

> Great list of REGEX depending on barcode type: https://www.neodynamic.com/Products/Help/BarcodeWinControl2.5/working_barcode_symbologies.htm

Perhaps useful, but it's worth noting that regex comparisons only tell you that a code might be valid, not that it is valid. I doubt that a decoder is going to return something that wouldn't pass a regex, but it can easily, due to read errors, return something that doesn't pass a checksum test.
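
To illustrate the difference with UPC-A: the sketch below uses a simple 12-digit pattern (an assumption for this example, not taken from the linked list) and the standard UPC-A check digit rule, where the digits in odd positions are weighted 3. Both helper names are hypothetical.

```javascript
// A format check only says the string looks like a UPC-A code.
const UPC_A_PATTERN = /^\d{12}$/;

// The check digit test says the 12 digits are internally consistent.
// `hasValidUpcCheckDigit` is a hypothetical helper, not part of Quagga.
function hasValidUpcCheckDigit(code) {
    if (!UPC_A_PATTERN.test(code)) {
        return false;
    }
    let sum = 0;
    for (let i = 0; i < 11; i++) {
        const digit = Number(code[i]);
        // Positions 1, 3, 5, ... (index 0, 2, 4, ...) carry weight 3
        sum += i % 2 === 0 ? digit * 3 : digit;
    }
    const expected = (10 - (sum % 10)) % 10;
    return expected === Number(code[11]);
}

const goodRead = "036000291452";
const misread = "036000291453"; // one digit off

console.log(UPC_A_PATTERN.test(misread));      // true: the regex still passes
console.log(hasValidUpcCheckDigit(misread));   // false: the checksum catches it
console.log(hasValidUpcCheckDigit(goodRead));  // true
```

As noted earlier in the thread, a passing check digit is still not proof of a correct read, so this works best combined with an error threshold.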