zxing / zxing

ZXing ("Zebra Crossing") barcode scanning library for Java, Android
Apache License 2.0
32.76k stars 9.36k forks source link

DataMatrix C40 mode is exited unexpectedly #630

Closed slepmog closed 8 years ago

slepmog commented 8 years ago

We tried to encode the following sample data string:

JGB<Space>012000001200000042LL139TY2J09TY2J0<Space><Space><Space><Space><Space><Space><Space><Space><Space><Space><Space><Space><Space> (where <space> represents an actual single space character)

to a DataMatrix barcode with the following hints:

DATA_MATRIX_DEFAULT_ENCODATION: Encodation.C40
MIN_SIZE: [24x24]
MAX_SIZE: [24x24]

Resulting barcode: blob

It reads no problem, but our partner's system does not accept it, claiming that in the raw barcode data, right after JCB, there is a service character code 254 (FE) that ends the C40 encoding mode, and C40 mode is only reentered after LL13, whereas their system has a strict requirement for the C40 mode to be used throughout the entire barcode. The 24x24 size is also a strict requirement of theirs.

Indeed, raw barcode data, as reported by the ZXing online decoder, seems to match what they are saying:

e6 92 f0 fe 21 83 96 82   82 8e 82 82 82 ac 4d 4d
8f e6 56 8f 29 1d 13 3c   13 3c 13 3c 13 3c fe 21
81 88 1f b5 

e6 at offset 0 starts C40, then there is fe at offset 3, then there is another e6 at offset 17.

We believe this is not expected behaviour, or is it?

srowen commented 8 years ago

I'm not an expert on Data Matrix enough to say for sure, but, I don't think a particular mode is mandatory throughout the barcode. It can legally latch to whatever mode it likes. Some choices are more efficient than others, but many are legal. You could hack the code to never choose anything but C40 but I don't think that's somehow required by the format.

slepmog commented 8 years ago

Okay then, if that is the case, the issue turns out to be a feature request rather than a bug report, at which point I will have to look into hacking the code.

Regarding the requirement, it's the UK Royal Mail, they want a MailMark barcode on letters (which is a DataMatrix ECC200 barcode), and they absolutely want C40 throughout. They require applicants to submit sample barcodes and they run them through an automated testing process that results in a pretty impressively looking report that details every possible physical property of the barcode and rates each one. Not having C40 throughout comes out as a stopper issue on this report, so we can't apply with our barcodes, which means mailing will be more expensive for us.

slepmog commented 8 years ago

Apparently this is by design, all DataMatrix encoders peek into the future characters, and if they see that it would be more compact to encode them with a different encoding mode, they request to change themselves to a different encoder before proceeding.

I might have patched this, but I'm not sure if it's worth a pull request. If it is, I'm not sure what would be a better way to do it!

One way is to convert C40Encoder into an abstract class C40EncoderBase with a protected abstract property peekAllowed, and make two derived classes, C40Encoder that overrides the property to true and MailMarkEncoder that overrides it to false. C40EncoderBase would check this property before calling HighLevelEncoder.lookAheadTest. That way we have a full blown new encoder (which is 99.9% C40Encoder), and we'll need to add static final int MAILMARK_ENCODATION = 6; to HighLevelEncoder and otherwise teach it to recognize the new constant.

Another way would be adding the peekAllowed flag to the EncoderContext and teach C40Encoder to respect it. That way we don't create a separate encoder, we introduce a new encoding hint instead. It will be visible to all other encoders, which might be a good thing or not, I don't know enough about this to tell if it's safe for other types of encoders to not request encoding change like it is safe for C40.

Champomix commented 6 years ago

Apparently this is by design, all DataMatrix encoders peek into the future characters, and if they see that it would be more compact to encode them with a different encoding mode, they request to change themselves to a different encoder before proceeding.

I might have patched this, but I'm not sure if it's worth a pull request. If it is, I'm not sure what would be a better way to do it!

One way is to convert C40Encoder into an abstract class C40EncoderBase with a protected abstract property peekAllowed, and make two derived classes, C40Encoder that overrides the property to true and MailMarkEncoder that overrides it to false. C40EncoderBase would check this property before calling HighLevelEncoder.lookAheadTest. That way we have a full blown new encoder (which is 99.9% C40Encoder), and we'll need to add static final int MAILMARK_ENCODATION = 6; to HighLevelEncoder and otherwise teach it to recognize the new constant.

Another way would be adding the peekAllowed flag to the EncoderContext and teach C40Encoder to respect it. That way we don't create a separate encoder, we introduce a new encoding hint instead. It will be visible to all other encoders, which might be a good thing or not, I don't know enough about this to tell if it's safe for other types of encoders to not request encoding change like it is safe for C40.

Do you have an example to share ? I have the same issue, and i don't know how to resolve it ...