jodal / biip

📦 Biip interprets the data in barcodes.
https://biip.readthedocs.io
Apache License 2.0

Compressed UPC-E are not supported #77

Closed hurlenko closed 3 years ago

hurlenko commented 3 years ago

Hi @jodal,

First of all, thanks for your awesome library. I would like to know if you have plans to support compressed UPC-E barcodes. As far as I know, the compressed version is not a valid GTIN.

I might be wrong, but it seems that the algorithm for calculating the check digit is either different from the one used for GTINs or does not exist at all. When uncompressed, a UPC-E becomes a valid GTIN-12 (that is, a UPC-A barcode). When compressed, it is not a valid GTIN-8 and must be expanded to become a GTIN-12, though it might look valid, as with 02345673, which can be treated as a GTIN-8.
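To make the check digit question concrete, here is a minimal sketch of the standard GS1 check digit calculation (weights 3, 1, 3, ... starting from the rightmost payload digit). The function names are illustrative and are not part of Biip. It shows that 02345673 happens to pass as a GTIN-8, while 02349036 only passes after expansion to 12 digits.

```python
def gtin_check_digit(payload: str) -> int:
    """Return the GS1 check digit for a string of payload digits."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost payload digit.
    for position, char in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def has_valid_check_digit(value: str) -> bool:
    """Check whether the last digit matches the computed check digit."""
    if not value.isdigit() or len(value) < 2:
        return False
    return gtin_check_digit(value[:-1]) == int(value[-1])


print(has_valid_check_digit("02345673"))      # True: passes as a GTIN-8
print(has_valid_check_digit("02349036"))      # False: fails as a GTIN-8
print(has_valid_check_digit("023400000906"))  # True: expanded form passes as a GTIN-12
```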

You can check GTIN compatibility here: https://www.barcodefaq.com/1d/upc-ean/#GTIN_Compliance. Note that in the given example they expanded 02349036 to 00023400000900, which I believe is incorrect because the check digit must be 6.

Here's sample code for expanding UPC-E to UPC-A: https://code.activestate.com/recipes/528911-barcodes-convert-upc-e-to-upc-a. I tested it on 02349036 and it gives 023400000906, which is a valid UPC-A/GTIN-12. And here's how to compress UPC-A back to UPC-E: https://gist.github.com/corpit/8204456. It also works as expected.
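For reference, the expansion can be sketched in a few lines. This is not the code from the linked recipe and not Biip's implementation, just a minimal version of the commonly documented zero-suppression rules; `expand_upc_e` is a hypothetical name. The check digit is carried over unchanged, which matches the 02349036 to 023400000906 result above.

```python
def expand_upc_e(upc_e: str) -> str:
    """Expand an 8-digit UPC-E (number system + 6 digits + check digit) to UPC-A."""
    if len(upc_e) != 8 or not upc_e.isdigit():
        raise ValueError(f"Expected 8 digits, got {upc_e!r}")
    number_system, body, check_digit = upc_e[0], upc_e[1:7], upc_e[7]
    if number_system not in "01":
        raise ValueError("UPC-E number system digit must be 0 or 1")

    # The last of the six body digits selects where the suppressed zeros go.
    last = body[5]
    if last in "012":
        expanded = body[:2] + last + "0000" + body[2:5]
    elif last == "3":
        expanded = body[:3] + "00000" + body[3:5]
    elif last == "4":
        expanded = body[:4] + "00000" + body[4]
    else:  # last is 5-9
        expanded = body[:5] + "0000" + last

    return number_system + expanded + check_digit


print(expand_upc_e("02349036"))  # 023400000906
```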

I think the biggest problem is distinguishing EAN-8 from UPC-E, so that you know when to decompress.

jodal commented 3 years ago

I'm interested in supporting this use case. As I'm based in Europe, I'm not too exposed to UPC barcodes, but I've seen them on a few products here as well.

Is there such a thing as "regular" UPC-E barcodes, or are all UPC-E barcodes "compressed"?

My initial idea for how to robustly differentiate EAN-8 from UPC-E is by enabling Symbology Identifiers on the barcode scanner. EAN-8 seems to use the Symbology Identifier ]E4, while I believe UPC-E might use the same Symbology Identifier as EAN-13 and UPC-A, ]E0? If you could provide some photos of UPC-A and UPC-E barcodes, it would be handy for testing this.

In cases where you cannot enable Symbology Identifiers for some reason, a possible strategy is to specify in what order the parsers should be tried when calling biip.parse(). That way, Europeans could prioritize EAN-8 while Americans could prioritize UPC-E.
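As a rough illustration of these two strategies combined, the sketch below routes a scan by its Symbology Identifier when one is present and otherwise falls back to a caller-supplied preference. Everything here is hypothetical: `pick_interpretation`, the label strings, and the `fallback_order` parameter are not part of biip.parse(), and the mapping of ]E0 to UPC-E is only the assumption stated above, not a verified scanner behavior.

```python
from typing import Sequence, Tuple


def pick_interpretation(
    scan: str, fallback_order: Sequence[str] = ("ean8", "upce")
) -> Tuple[str, str]:
    """Return (interpretation label, data without Symbology Identifier)."""
    if scan.startswith("]E4"):
        # ]E4 is the Symbology Identifier mentioned above for EAN-8.
        return "ean8", scan[3:]
    if scan.startswith("]E0"):
        # Assumption: EAN-13, UPC-A, and possibly UPC-E share ]E0.
        return "ean13_or_upc", scan[3:]
    # No identifier: the value is ambiguous, so the caller's preference wins.
    return fallback_order[0], scan


print(pick_interpretation("]E412345670"))
print(pick_interpretation("12345670", fallback_order=("upce", "ean8")))
```

A fuller version would try each parser in the given order and keep the first one that succeeds, rather than simply trusting the first entry.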

jodal commented 3 years ago

Aside, it would be awesome if you could add yourself to https://github.com/jodal/biip/wiki/Users if you're using Biip in production :-)

hurlenko commented 3 years ago

> Is there such a thing as "regular" UPC-E barcodes, or are all UPC-E barcodes "compressed"?

To be honest, I have no idea 😅. My final goal is to store products in a database and use the barcode to uniquely identify each product. For that, I need all barcodes to be consistently encoded as GTIN-14. I realized that allowing both compressed and uncompressed versions of UPC-E may result in duplicates of the same product. After googling for some time, that was all the information I found.
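The duplicate problem can be shown with a small sketch. GTIN-8, GTIN-12, and GTIN-13 values are normalized to GTIN-14 by left-padding with zeros; `to_gtin_14` is a hypothetical helper, not a Biip function. Padding a compressed UPC-E directly gives a different key than padding its expanded UPC-A form, so the same product would be stored twice unless the UPC-E is expanded first.

```python
def to_gtin_14(gtin: str) -> str:
    """Left-pad a GTIN-8/12/13/14 with zeros to 14 digits."""
    if not gtin.isdigit() or len(gtin) not in (8, 12, 13, 14):
        raise ValueError(f"Not a GTIN-8/12/13/14: {gtin!r}")
    return gtin.zfill(14)


# Expanded UPC-A form of the UPC-E 02349036:
print(to_gtin_14("023400000906"))  # 00023400000906

# The compressed UPC-E padded as if it were a GTIN-8 gives a different key:
print(to_gtin_14("02349036"))  # 00000002349036
```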

I'm based in Europe too, so I think the only option is to google for sample barcodes 🙂.

As for Symbology Identifiers and parsing priority, those seem to be the only options. According to this and this, there's really no way to reliably differentiate EAN-8 from UPC-E. When scanning the barcode, however, you do know the barcode type, so you can convert UPC-E to UPC-A before any further processing.

So having the UPC-E <-> UPC-A conversion in the library would be nice, but overall feel free to close the issue.

> Aside, it would be awesome if you could add yourself to https://github.com/jodal/biip/wiki/Users if you're using Biip in production :-)

I will let you know once I have a working solution 🙂. Thanks again!

jodal commented 3 years ago

New UPC support

I've made a PR (#78) that adds support for UPC, including:

Symbology Identifiers on UPC-E vs EAN-8 barcodes

I've tested scanning the barcodes at https://www.barcodefaq.com/1d/upc-ean/#GTIN_Compliance with Symbology Identifiers enabled giving the following results:

If your scanner behaves in the same way, EAN-8 and UPC-E mixups should not be a problem.

Parsing of UPC-E vs GTIN-8

If you have another source of UPC-Es than a physical barcode scanner, like a product catalog, that you want to have converted to GTIN-14 for storage, I think the PR in its current state should work for you...

When a UPC-E is parsed, the upc field in the parse result is set. The value is then expanded to UPC-A, which also populates the gtin field as a GTIN-12:

>>> biip.parse('02349036')
ParseResult(
  value='02349036',
  symbology_identifier=None,
  gtin=Gtin(value='023400000906', format=GtinFormat.GTIN_12, prefix=GS1Prefix(value='002', usage='GS1 US'), payload='02340000090', check_digit=6, packaging_level=None),
  gtin_error=None,
  upc=Upc(value='02349036', format=UpcFormat.UPC_E, number_system_digit=0, payload='0234903', check_digit=6), 
  upc_error=None,
  sscc=None,
  sscc_error="Failed to parse '02349036' as SSCC: Expected 18 digits, got 8.",
  gs1_message=None,
  gs1_message_error="Failed to match '02349036' with GS1 AI (02) pattern '^02(\\d{14})$'."
)

As these are two representations of the same thing, their GTIN-14 representation is identical:

>>> biip.parse('02349036').gtin.as_gtin_14()
'00023400000906'

>>> biip.parse('02349036').upc.as_gtin_14()
'00023400000906'

When a GTIN-8 is parsed, the gtin field in the parse result is set. In this case, the value is also a valid UPC-E, as can be seen from the upc field. Since the gtin field is already set by the parsing as GTIN-8, the UPC-E is not automatically expanded to UPC-A and converted to GTIN-12.

>>> biip.parse('12345670')
ParseResult(
  value='12345670',
  symbology_identifier=None,
  gtin=Gtin(value='12345670', format=GtinFormat.GTIN_8, prefix=GS1Prefix(value='00001', usage='GS1 US'), payload='1234567', check_digit=0, packaging_level=None),
  gtin_error=None,
  upc=Upc(value='12345670', format=UpcFormat.UPC_E, number_system_digit=1, payload='1234567', check_digit=0),
  upc_error=None,
  sscc=None,
  sscc_error="Failed to parse '12345670' as SSCC: Expected 18 digits, got 8.",
  gs1_message=None,
  gs1_message_error="Failed to parse GS1 AI (12) date from '345670'."
)

In other words, you can choose whether to prioritize the UPC or the GTIN interpretation, and convert the one you prefer to GTIN-14 for storage:

>>> biip.parse('12345670').gtin.as_gtin_14()
'00000012345670'

>>> biip.parse('12345670').upc.as_gtin_14()
'00123456000070'

Please give the PR a spin and let me know if this covers your use case!

hurlenko commented 3 years ago

Wow, thanks for this awesome update! I've just tried it and it seems to be working perfectly fine, definitely covers my use case. I believe the issue is now resolved. Thanks a lot!

jodal commented 3 years ago

Thanks for testing! The UPC support is now out as part of the 1.1.0 release.