googlesamples / mlkit

A collection of sample apps to demonstrate how to use Google's ML Kit APIs on Android and iOS
Apache License 2.0

Barcode scanning fails with "Unknown encoding" for ISO-8859-1 encoded data matrix #218

Open dspoeri opened 3 years ago

dspoeri commented 3 years ago

The official German medication plan data matrix ("BMP", Bundeseinheitlicher Medikationsplan) expects data to be encoded with ISO-8859-1. If the data contains a German umlaut, Google Vision barcode scanning fails with an "Unknown encoding" error.

Scanning the following data matrix reproduces the bug: [image: barcode]

This bug sadly renders Google Vision barcode scanning useless for the mentioned use case.

Two suggested solutions:

GarryKelly commented 3 years ago

Just saw this; it is similar to an issue reported last year (see my comment at https://github.com/googlesamples/mlkit/issues/44#issuecomment-632303060). Unfortunately, it meant the library just didn't work for our use cases. It's a shame, as it's an excellent library otherwise.

I agree with the suggested solutions. It would be wonderful for the library either to support the ISO-8859-1 character set as an option, or to provide access to the scanned data as a byte array without any character set conversion. Either option would allow reading all barcodes.

I noticed that a new version, com.google.firebase:firebase-ml-vision-barcode-model:16.1.2, was released later in 2020, but I haven't had time to check whether it provides that access.

ivan200 commented 3 years ago

At least com.google.mlkit:barcode-scanning:16.1.0 contains barcode.rawBytes

Returns raw bytes as it was encoded in the barcode. Returns null if the raw bytes can not be determined.

So I think you can return String(barcode.rawBytes, StandardCharsets.ISO_8859_1).
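A minimal Java sketch of that suggestion, assuming the scanner really does hand back the payload bytes unchanged. The class name Latin1Payload and the sample byte array are hypothetical stand-ins for the value of barcode.rawBytes; nothing here calls ML Kit itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes bytes taken from a barcode as ISO-8859-1.
class Latin1Payload {
    // Returns the decoded text, or null when no raw bytes are available
    // (the documentation quoted above says rawBytes may be null).
    static String decode(byte[] rawBytes) {
        if (rawBytes == null) {
            return null;
        }
        return new String(rawBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Mädchen" in ISO-8859-1: the umlaut 'ä' is the single byte 0xE4.
        byte[] sample = {0x4D, (byte) 0xE4, 0x64, 0x63, 0x68, 0x65, 0x6E};
        System.out.println(decode(sample));
    }
}
```

ISO-8859-1 maps every byte value to exactly one character, so this decoding never fails; whether the result is meaningful depends entirely on the bytes being the real payload.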

dspoeri commented 3 years ago

> At least com.google.mlkit:barcode-scanning:16.1.0 contains barcode.rawBytes
>
> Returns raw bytes as it was encoded in the barcode. Returns null if the raw bytes can not be determined.
>
> So I think you can return String(barcode.rawBytes, StandardCharsets.ISO_8859_1).

It doesn't help: rawBytes returns a 16-byte array representing the string "Unknown encoding".
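Until a fix lands, a caller can at least detect this case instead of showing the placeholder text to users. The following Java sketch assumes the behavior described above (the returned bytes are exactly the ASCII text "Unknown encoding"); the class and method names are hypothetical, and it does not recover the real payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical guard around the raw bytes reported by the scanner.
class UnknownEncodingCheck {
    // The 16 ASCII bytes of the literal string "Unknown encoding".
    private static final byte[] PLACEHOLDER =
            "Unknown encoding".getBytes(StandardCharsets.US_ASCII);

    // True when the bytes are exactly the placeholder text.
    static boolean isPlaceholder(byte[] rawBytes) {
        return rawBytes != null && Arrays.equals(rawBytes, PLACEHOLDER);
    }

    // Decodes as ISO-8859-1, or returns null when the bytes are missing
    // or are the placeholder, so the caller can report a scan failure.
    static String decodeOrNull(byte[] rawBytes) {
        if (rawBytes == null || isPlaceholder(rawBytes)) {
            return null;
        }
        return new String(rawBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] failed = "Unknown encoding".getBytes(StandardCharsets.US_ASCII);
        System.out.println(failed.length);          // 16
        System.out.println(isPlaceholder(failed));  // true
    }
}
```

One caveat: a barcode whose genuine content is the text "Unknown encoding" would also be rejected by this check.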

cs-googler commented 3 years ago

Hi, we are working on a fix internally.

pke commented 1 year ago

So, @cs-googler, how is the internal fix going? How about letting the user specify the encoding via BarcodeScannerOptions?