micjahn / ZXing.Net

.Net port of the original java-based barcode reader and generator library zxing
Apache License 2.0

DataMatrix inserts wrong character for FNC1 #41

Closed Adam-Olmec closed 7 years ago

Adam-Olmec commented 7 years ago

When encoding a DataMatrix, the code substitutes ASCII character 29 (GS, \u001D) with the FNC1 character, where FNC1 is defined as (char)232.

I believe this is a mistake. ISO/IEC 16022:2006, Section 5.2.4.6 "FNC1 alternate data type identifier", says (emphasis mine):

To encode data to conform to specific industry standards as authorised by AIM Inc., a FNC1 character shall appear in the first or second symbol character position (or in the fifth or sixth data positions of the first symbol of Structured Append). FNC1 encoded in any other position is used as a field separator and shall be transmitted as GS control character (ASCII value 29).

Essentially the FNC1 character used at the start of the DataMatrix and during the data are different, and the wrong one is being inserted.
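
To make the quoted rule concrete, here is a minimal sketch of how a decoder might treat codeword 232 depending on where it appears. The class and method names are hypothetical and not part of ZXing.Net. It assumes that a leading FNC1 acts as a format indicator rather than as data, and it ignores the Structured Append case.

```csharp
using System;

// Hypothetical helper, not part of ZXing.Net: illustrates the rule quoted
// from ISO/IEC 16022:2006 Section 5.2.4.6 under simplifying assumptions.
public static class Fnc1Rule
{
    public const int Fnc1Codeword = 232; // FNC1 codeword value
    public const char GS = '\u001D';     // ASCII 29, group separator

    // position: 1-based symbol character position of the FNC1 codeword.
    // Returns null when FNC1 marks the data format (assumed not to be
    // transmitted as data), otherwise the GS character sent in its place.
    public static char? TransmittedFor(int position)
    {
        if (position < 1)
            throw new ArgumentOutOfRangeException(nameof(position));

        // Simplification: the Structured Append case (fifth or sixth
        // position) is not handled here.
        return position <= 2 ? (char?)null : GS;
    }
}
```

Under these assumptions, Fnc1Rule.TransmittedFor(1) returns null and Fnc1Rule.TransmittedFor(10) returns the GS character.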

Despite this, a DataMatrix produced using an FNC1 representation of 232 instead of 29 to delimit variable data still passes verification, but the output is not the same as other DataMatrix encoders I've seen.

I believe it can be fixed by replacing this else:

else
{
   if (c == 29)
   {
      context.writeCodeword((char)HighLevelEncoder.FNC1);
   }
   else
   {
      context.writeCodeword((char)(c + 1));
   }
   context.Pos++;
}

with this:

else
{
   context.writeCodeword((char)(c + 1));
   context.Pos++;
}
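
The difference between the two branches can be shown in isolation. This is only a sketch: the class and method names are hypothetical, and it assumes c is a plain ASCII character below 128 that reaches this else branch.

```csharp
// Hypothetical stand-alone comparison, not the actual ZXing.Net encoder.
public static class AsciiCodewordSketch
{
    public const int Fnc1 = 232; // value of HighLevelEncoder.FNC1 per this issue

    // Original branch: GS (ASCII 29) is replaced by the FNC1 codeword.
    public static int Original(char c) => c == 29 ? Fnc1 : c + 1;

    // Proposed branch: every character is written as its ASCII value + 1.
    public static int Proposed(char c) => c + 1;
}
```

For the GS character, Original returns 232 while Proposed returns 30. For any other character in that range the two agree.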

I suspect that the lookahead methods may also need modifying to accept the GS character (\u001D), for example by adding it to the check in isNativeText(char ch) in HighLevelEncoder.cs, but I'm not sure about this.

internal static bool isNativeText(char ch)
{
   return (ch == ' ') || (ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'z') || ch == '\u001D';
}

micjahn commented 7 years ago

I'm not sure that this is true or, at least, that it is commonly implemented or interpreted that way. I looked into the GS1 DataMatrix Guideline (https://www.gs1.org/docs/barcodes/GS1_DataMatrix_Guideline.pdf), which says on page 15 that codeword 232 is used both as the start character and as the field separator. The GS1 General Specification (https://www.gs1.org/sites/default/files/docs/barcodes/GS1_General_Specifications.pdf) says the following on page 292:

Use of FNC1 or the control character (ASCII value 29 (decimal), 1D (hexadecimal)) as a separator character following non-predefined length element strings.

It seems to me that the two symbols are alternatives. You say that DataMatrix codes using 232 still pass verification, so I don't see a reason why it should be changed. But I don't have the DataMatrix specification document to read the full specs.

Adam-Olmec commented 7 years ago

I agree that the GS1 DataMatrix Guideline does make it sound like 232 is the only use of FNC1, but in my opinion the ISO document is more trustworthy. Here are the relevant sections: [two screenshots of the ISO/IEC 16022 PDF]

I also verified the 2D symbol shown in the GS1 standard on page 221 and can confirm that their examples use 29 and not 232. [Screenshot of the LVS-95XX ISO/IEC static verifier result]

I'm personally not comfortable producing a barcode that differs from the standard, from the examples in the GS1 standard, and from the firmware in commercial printers, so I've forked the repo to add the change. I'd much prefer to just use your repo though.

micjahn commented 7 years ago

Changed the behaviour so that codeword 232 is only added the first time the GS character is found. Any further occurrence is added as 0x001d.
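
As a rough illustration of the behaviour described in this last comment, the sketch below encodes a string one character at a time. It is not the actual ZXing.Net change: the class name is hypothetical, and it assumes that later GS characters go through the normal ASCII path (value + 1, so codeword 30), that all input is plain ASCII below 128, and that no digit-pair compaction takes place.

```csharp
using System.Collections.Generic;

// Hypothetical sketch, not the actual ZXing.Net implementation.
public static class FirstGsSketch
{
    public static List<int> Encode(string data)
    {
        var codewords = new List<int>();
        bool fnc1Written = false; // tracks whether codeword 232 was already emitted

        foreach (char c in data)
        {
            if (c == '\u001D' && !fnc1Written)
            {
                codewords.Add(232); // first GS becomes the FNC1 codeword
                fnc1Written = true;
            }
            else
            {
                // Assumption: every other character, including later GS
                // characters, is written as ASCII value + 1.
                codewords.Add(c + 1);
            }
        }
        return codewords;
    }
}
```

Under these assumptions, FirstGsSketch.Encode("\u001DA\u001DB") yields the codewords 232, 66, 30, 67.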