marklio / LinqToStdf

A library for parsing/processing Standard Test Datalog Format (STDF) files, typically used in semiconductor testing.

Stdf file writing issue when value in record is null #31

Open WeiqingChen opened 11 months ago

WeiqingChen commented 11 months ago

When I parsed my ATE test STDF file and then wrote a new STDF file from the parsed records, the content of the new file was not correct and its size did not match the original. I found that the cause is that some values in the file are "00", and they are parsed as null in the StdfRecord. When that StdfRecord is used to write the new STDF file, the writer ignores the null values and does not produce "00" in the new file, so the length of the field is also incorrect.

Take the following ATR content as an example. In the original STDF file, MOD_TIM is "00 00 00 00" and CMD_LINE is "00". In the parsed record, both are null. [Image: Atr_content]

The following is a comparison of the ATR part between the original STDF file and the new STDF file. [Image: atr_compare]
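To make the byte counts concrete, here is a small Python sketch (an illustration only, not LinqToStdf code) that builds the two ATR encodings being compared. It assumes little-endian byte order, which in a real file is determined by CPU_TYPE in the FAR; the helper name `atr_bytes` is hypothetical.

```python
import struct

# ATR is record type 0, subtype 20 in the STDF V4 spec.
REC_TYP_ATR, REC_SUB_ATR = 0, 20

def atr_bytes(mod_tim, cmd_line):
    """Serialize an ATR with both fields present (hypothetical helper).

    Assumes little-endian byte order.
    """
    text = cmd_line.encode("ascii")
    # MOD_TIM is U*4; CMD_LINE is C*n (one length byte, then the characters).
    body = struct.pack("<I", mod_tim) + bytes([len(text)]) + text
    # Header: REC_LEN (U*2, counts the body only), REC_TYP, REC_SUB.
    header = struct.pack("<HBB", len(body), REC_TYP_ATR, REC_SUB_ATR)
    return header + body

# What the original file contains: zero date plus empty command line.
original = atr_bytes(0, "")
# What a writer emits if it drops both fields: a header with REC_LEN = 0.
trimmed = struct.pack("<HBB", 0, REC_TYP_ATR, REC_SUB_ATR)

print(original.hex(" "))  # 05 00 00 14 00 00 00 00 00
print(trimmed.hex(" "))   # 00 00 00 14
```

The two encodings differ by exactly the five body bytes (four for MOD_TIM, one for the CMD_LINE length byte).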

I have stepped through the LinqToStdf project code, but I have not found which part of the code causes this. Could you help me solve the issue? Thanks!

The STDF file and my console code are attached. Program.zip

20230922FT_P2_2023SEP23092210.zip

marklio commented 11 months ago

It was never a goal for LinqToStdf to round-trip records with a byte-for-byte match. The writer aims to produce standards-compliant STDF files based on the data in the records, not the bytes that produced the records. The STDF spec allows a wide range of latitude on data representation in the area of "missing/invalid" values in order to have higher data density, and LinqToStdf aims to be "optimal" in its use of this latitude to produce the smallest files for a given set of data. Here's what the spec says on this:

Optional fields at the end of a record may be omitted in order to save space on the storage medium. To be omitted, an optional field must have missing or invalid data, and all the fields following it must be optional fields containing missing or invalid data. It is never legal to omit an optional field from the middle of the record.

The writer accomplishes this by writing the records "backwards" so it can easily discover the first valid field in a record.

In the case of this particular ATR, the MOD_TIM field has the "missing/invalid date" value of 0, and the CMD_LINE field has zero length, which is the "missing/invalid" indicator (these are listed in the "Optional Fields and Missing/Invalid Data" section of the spec). So, the writer checks the last field, CMD_LINE, sees that it has the "missing" value, and doesn't write it. That makes MOD_TIM the last field, and since it also has the "missing" value, the writer does not write it either. Since the record is made entirely of missing/invalid data, it is spec-compliant to write the record out with a length of zero instead of 5 (4 bytes for MOD_TIM plus 1 length byte for CMD_LINE).
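The trailing-field trimming described above can be sketched roughly as follows. This is a Python illustration of the idea, not the actual LinqToStdf writer; the function name and the `(name, value, missing_flag)` tuple layout are hypothetical.

```python
def trim_trailing_missing(fields):
    """Drop trailing fields whose value equals their missing/invalid flag.

    `fields` is a list of (name, value, missing_flag) tuples in record order.
    Scanning starts at the last field and stops at the first field that
    carries real data, so fields in the middle of a record are never dropped.
    """
    end = len(fields)
    while end > 0:
        _name, value, missing_flag = fields[end - 1]
        if value != missing_flag:
            break  # first valid field found; keep it and everything before it
        end -= 1
    return fields[:end]

# The ATR from this issue: both fields hold their "missing" value.
atr = [("MOD_TIM", 0, 0), ("CMD_LINE", "", "")]
print(trim_trailing_missing(atr))  # []

# A missing value in the middle is kept; only the trailing one is trimmed.
other = [("A", 1, 0), ("B", 0, 0), ("C", 5, 0), ("D", 0, 0)]
print([name for name, _, _ in trim_trailing_missing(other)])  # ['A', 'B', 'C']
```

Note that this sketch treats every field as omittable whenever it holds the missing value, which is the behavior being described here.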

marklio commented 11 months ago

I'm re-opening this based on a re-reading of the spec. The section on optional fields states [emphasis mine]:

The specification of each STDF record has a column labelled Missing/Invalid Data Flag. An entry in this column means that the field is optional, and that the value shown is the way to flag the field’s data as missing or invalid. If the column does not have an entry, the field is required

The documentation for ATR does not indicate these fields are "optional", so it isn't valid to skip them on writing. It's been years since I wrote the writing code, so I don't recall whether it doesn't implement this part of the spec, or the ATR fields aren't attributed properly WRT their optional-ness. In any case, it's likely a while before I can fix this. Someone else may be able to.

marklio commented 11 months ago

A quick reading of https://github.com/marklio/LinqToStdf/blob/master/Main/LinqToStdf/RecordConverting/UnconverterEmittingVisitor.cs indicates that the writer incorrectly assumes that any field with the "missing/invalid" value is optional and doesn't pay attention to the IsOptional property. There is a hilarious TODO on line 132:

//TODO: do the right kind of checks for the optional node properties

And at line 225 we decide what to do in case we don't have a value and you can see we don't pay attention to whether the property is ACTUALLY optional.
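For anyone picking this up, the intended behavior might look something like the Python sketch below. It is not the LinqToStdf implementation: the `Field` class, its attribute names, and `fields_to_write` are hypothetical, and writing a fallback value for a required field that is null is an assumption (rejecting the record would be another reasonable choice).

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass
class Field:
    name: str
    value: Any         # None models a null property on the record
    missing: Any       # value written when there is no real data (assumption)
    is_optional: bool  # True only if the spec lists a missing/invalid flag

def fields_to_write(fields: List[Field]) -> List[Tuple[str, Any]]:
    """Trim trailing fields only when they are both empty AND optional."""
    end = len(fields)
    while end > 0:
        f = fields[end - 1]
        has_value = f.value is not None and f.value != f.missing
        if has_value or not f.is_optional:
            break  # a required field always stops the trimming
        end -= 1
    # Everything up to `end` is written; nulls fall back to the placeholder.
    return [(f.name, f.missing if f.value is None else f.value)
            for f in fields[:end]]

# ATR with both fields treated as required: nothing is trimmed.
atr = [Field("MOD_TIM", None, 0, False), Field("CMD_LINE", None, "", False)]
print(fields_to_write(atr))  # [('MOD_TIM', 0), ('CMD_LINE', '')]

# A trailing optional field with no data is still omitted.
rec = [Field("A", 1, 0, False), Field("B", None, 0, True)]
print(fields_to_write(rec))  # [('A', 1)]
```

The only change from plain trailing-field trimming is the extra `is_optional` check in the loop condition.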

You can also make a case that the definition of Atr is incorrect. My reading is that a valid date is required, so it doesn't make sense that ModificationTime is nullable. (I'd also argue that the Atr in the file is bogus, but that's not terribly important)

I don't think the fix is very difficult, but I don't work on this codebase a ton, so it will take me a while to get to it. Feel free to submit a PR and I'm happy to review and merge it.