veraPDF / veraPDF-library

Industry supported, open source PDF/A validation library
http://verapdf.org/software
GNU General Public License v3.0

Arlington checker fails due to incomplete Nums array #1407

Open asciim0 opened 7 months ago

asciim0 commented 7 months ago

The file that caused the error is publicly available here.

Running the Arlington PDF Model file checker in version 1.26.0-RC1 fails. The UI shows the error "Could not complete validation due to an error", and a Java exception is thrown whose message includes: Cannot invoke "org.verapdf.pd.structure.NumberTreeIterator.hasNext()" because "this.innerCurrentIterator" is null. Looking at the file itself, the /Nums array is indeed incomplete:

obj 1130 0 R << /Limits [ 0 null ] /Nums [ 0 1171 0 R 1 1172 0 R 2 1173 0 R 3 1174 0 R 4 1175 0 R 5 1176 0 R 6 1177 0 R 7 1178 0 R 8 1179 0 R 9 1180 0 R 10 1181 0 R 11 1182 0 R 12 1183 0 R 13 1184 0 R 14 1185 0 R 15 1186 0 R 16 1187 0 R 17 1188 0 R 18 1189 0 R 19 1190 0 R 20 1191 0 R 21 1192 0 R 22 1193 0 R 23 1194 0 R 24 1195 0 R 25 1196 0 R 26 1197 0 R 27 1198 0 R 28 1199 0 R 29 1200 0 R 30 1201 0 R 31 1202 0 R 32 1203 0 R 33 1204 0 R 34 1205 0 R 35 1206 0 R 36 1207 0 R 37 ] >>

Replacing "37" with whitespace lets the Arlington checker process the file. Can the tool catch this error somehow?

MaximPlusov commented 7 months ago

@asciim0, thanks for reporting this issue. It is fixed in the latest dev version, 1.25.219.

The problem was in two Nums arrays (objects 1130 and 1131):

/Nums [ 0 1171 0 R 1 1172 0 R ... 36 1207 0 R 37 ]
/Nums [ 1132 0 R 38 1133 0 R 39 1134 0 R ... 79 1170 0 R ]

The first array ends with the key 37, but its corresponding value, the reference to object 1132, sits at the start of the second array. As a result, the second array begins with a value instead of a key, which is why the order of its elements was disrupted.

The Arlington model checker does not currently detect such deviations in name and number trees.
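
For illustration only, the key/value pairing problem described above could be checked along the following lines. This is a minimal Python sketch, not veraPDF or Arlington checker code: the helper names `parse_nums` and `check_nums` are hypothetical, the parser only understands bare integers and indirect references of the form `N G R`, and the sample inputs are shortened versions of the arrays quoted in this thread.

```python
import re

# Matches either an indirect reference "obj gen R" or a bare integer.
# Assumption: the array body contains nothing else (no strings, dicts, null).
TOKEN = re.compile(r"(\d+)\s+(\d+)\s+R\b|(-?\d+)")


def parse_nums(body):
    """Tokenize the text between '[' and ']' of a /Nums array.

    Indirect references become ('ref', obj, gen); bare integers become int.
    """
    tokens = []
    for match in TOKEN.finditer(body):
        if match.group(1) is not None:
            tokens.append(("ref", int(match.group(1)), int(match.group(2))))
        else:
            tokens.append(int(match.group(3)))
    return tokens


def check_nums(tokens):
    """Return a list of problems in a flat key/value token list."""
    problems = []
    # A /Nums array holds key/value pairs, so its length must be even.
    if len(tokens) % 2 != 0:
        problems.append("odd number of entries")
    # Every even position (0, 2, 4, ...) must hold an integer key.
    for i in range(0, len(tokens), 2):
        if not isinstance(tokens[i], int):
            problems.append(f"entry {i}: expected integer key, got {tokens[i]!r}")
    # Keys in a number tree must be sorted in ascending order.
    keys = [t for t in tokens[0::2] if isinstance(t, int)]
    if any(a >= b for a, b in zip(keys, keys[1:])):
        problems.append("keys are not in ascending order")
    return problems


if __name__ == "__main__":
    # Shortened stand-ins for the two arrays in objects 1130 and 1131.
    for body in ("0 1171 0 R 1 1172 0 R 37", "1132 0 R 38 1133 0 R"):
        print(body, "->", check_nums(parse_nums(body)))
```

Both shortened arrays are flagged: the first for its dangling key, the second because it starts with a reference where a key is expected. A real check would work on parsed PDF objects rather than on text, since number tree values are not limited to indirect references.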