Fixed.
A new internal tracking collection, skippedFieldPositionSet, was added to LogParser. As field names are parsed from the #Fields directive, any field name that reports an unknown (null) field index has its position in the field list recorded in this set.
Later, inside parseLogEntry, the values from the log entry line are parsed. Any value at the same position as a skipped field name is skipped entirely.
Now the skipped headers and skipped values stay in sync.
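A minimal Java sketch of this approach follows. It assumes tab-delimited log entries and uses a hypothetical KNOWN_FIELDS map in place of the parser's real field-index lookup. Only skippedFieldPositionSet and parseLogEntry come from the description above; the class name, parseFieldsDirective, and the sample field x-unknown are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SkippedFieldSketch {
    // Hypothetical stand-in for the parser's known-field lookup:
    // returns the field's index, or null for an unknown field name.
    private static final Map<String, Integer> KNOWN_FIELDS = new HashMap<>();
    static {
        KNOWN_FIELDS.put("date", 0);
        KNOWN_FIELDS.put("time", 1);
        KNOWN_FIELDS.put("c-ip", 2);
    }

    // Positions (in the #Fields list) of field names we could not map.
    private final Set<Integer> skippedFieldPositionSet = new HashSet<>();
    // Field indices of the known fields, in header order.
    private final List<Integer> fieldIndices = new ArrayList<>();

    // Parse "#Fields: a b c", remembering the position of every unknown field.
    public void parseFieldsDirective(String line) {
        String[] names = line.substring("#Fields:".length()).trim().split("\\s+");
        for (int pos = 0; pos < names.length; pos++) {
            Integer index = KNOWN_FIELDS.get(names[pos]);
            if (index == null) {
                skippedFieldPositionSet.add(pos);
            } else {
                fieldIndices.add(index);
            }
        }
    }

    // Parse one entry, dropping values at skipped positions so the
    // remaining values line up with the known fields.
    public Map<Integer, String> parseLogEntry(String line) {
        String[] values = line.split("\t");
        Map<Integer, String> result = new HashMap<>();
        int next = 0; // next slot in fieldIndices
        for (int pos = 0; pos < values.length; pos++) {
            if (skippedFieldPositionSet.contains(pos)) {
                continue; // value belongs to a skipped field name
            }
            if (next < fieldIndices.size()) {
                result.put(fieldIndices.get(next++), values[pos]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SkippedFieldSketch parser = new SkippedFieldSketch();
        parser.parseFieldsDirective("#Fields: date x-unknown time c-ip");
        Map<Integer, String> entry =
            parser.parseLogEntry("2011-05-01\tjunk\t12:00:00\t10.0.0.1");
        System.out.println(entry.get(1)); // 12:00:00
    }
}
```

Remembering positions rather than names makes the per-value check a single set lookup, and it follows the header order exactly.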
To make CloudFront Log Parser as flexible as possible, it was engineered to skip unknown field names (from the #Fields directive).
Amazon's "access log format" documentation page alone has 4 log examples that don't match its log entry tables. Skipping unknown fields therefore seemed important both now and for safe future growth, rather than letting the parser blow up when it hits an unknown value.
Unfortunately, the skipped field names are not kept, so when it comes time to parse the individual log lines, we don't know which values belonged to the skipped fields.
This means the field names get skipped but their values do not, and as a result values are assigned to the wrong fields.
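A small Java sketch reproduces the problem. It assumes tab-delimited entries, and the KNOWN set, parseNaively, and the x-unknown field are hypothetical names chosen for illustration, not the parser's actual code. Unknown names are dropped from the header, but values are still paired with the remaining names purely by order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NaiveMisalignmentDemo {
    // Hypothetical set of field names the parser understands.
    static final Set<String> KNOWN = Set.of("date", "time", "c-ip");

    // Buggy approach: unknown names are dropped, but their positions are forgotten.
    static Map<String, String> parseNaively(String fieldsLine, String entryLine) {
        List<String> keptNames = new ArrayList<>();
        for (String name : fieldsLine.substring("#Fields:".length()).trim().split("\\s+")) {
            if (KNOWN.contains(name)) {
                keptNames.add(name);
            }
        }
        String[] values = entryLine.split("\t");
        Map<String, String> result = new LinkedHashMap<>();
        // Pairing by order alone shifts every value after the first
        // unknown column onto the wrong field.
        for (int i = 0; i < keptNames.size() && i < values.length; i++) {
            result.put(keptNames.get(i), values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> entry = parseNaively(
            "#Fields: date x-unknown time c-ip",
            "2011-05-01\tjunk\t12:00:00\t10.0.0.1");
        System.out.println(entry); // {date=2011-05-01, time=junk, c-ip=12:00:00}
    }
}
```

Here time receives the unknown column's value and c-ip receives the time, which is exactly the mis-assignment described.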