pdf-association / arlington-pdf-model

A vendor- and implementation-independent specification-derived, machine-readable model of PDF.
Apache License 2.0

Fields and Widgets, Widgets and Fields. #28

Open faceless2 opened 2 years ago

faceless2 commented 2 years ago

It was inevitable this was going to come up at some point.

First, I'm assuming a processing model which means a node in the PDF can be of more than one type. Traverse to a combined field+widget from Fields? It's validated as Field. Traverse from a Page? It's also validated as a Widget. Everything below assumes that model, if that's not how you do it I guess you can ignore the whole thing.


Currently there are 3 types, Field (an untyped field with no FT), FieldNNN (a typed field with FT) and AnnotWidget. And there is a single type for a list of these items, ArrayOfFields which is used for both Fields in the Form and Kids in the Fields. It's a list of: [FieldTx,FieldBtn,FieldCh,FieldSig,Field,AnnotWidget] - I'm ignoring the predicate for FieldSig.

This means that we have the following allowed behaviour:

  1. The form can contain a Fields array that references a widget that has no field (either combined or as a parent)
  2. A widget can belong to a Field with no FT, or belong to no field at all.
  3. The form Fields array can point to elements with a Parent
  4. There is no requirement for consistency between the Parent and Kids arrays
  5. If a Field is combined with a widget, there is no check to ensure it has no Kids
  6. There is no requirement for a Field to have any Widgets.

I think all of those are disallowed (happy to justify if required), so here's a proposal to remedy this.

To fix the first two issues you could split ArrayOfFields into two types, ArrayOfFields and ArrayOfFieldsOrWidgets. Your types then look like this:

Form
  Fields [ArrayOfFields]

Field
  Parent [Field,FieldTx,FieldCh,FieldBtn,FieldSig]
  Kids [ArrayOfFields]

FieldTx, FieldCh etc
  Parent [Field,FieldTx,FieldCh,FieldBtn,FieldSig]
  Kids [ArrayOfFieldsOrWidgets]

AnnotWidget
  Parent [FieldTx,FieldCh,FieldBtn,FieldSig]
  Kids [none - it's currently defined as ArrayOfFields, but should be removed]

ArrayOfFields
  * [Field,FieldTx,FieldCh,FieldBtn,FieldSig]

ArrayOfFieldsOrWidgets
  * [FieldTx,FieldCh,FieldBtn,FieldSig,AnnotWidget]
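As a rough illustration only (Python, not part of the Arlington model or its tooling), the proposed listing can be written as plain data and queried. The names `ARRAY_TYPES`, `KIDS_ARRAY` and `child_allowed` are hypothetical helpers invented for this sketch; the type names are the ones from the listing above.

```python
# Proposed link types from the listing above, written as plain data.
# All helper names here are hypothetical; only the type names come from the proposal.
TYPED_FIELDS = ["FieldTx", "FieldCh", "FieldBtn", "FieldSig"]

ARRAY_TYPES = {
    "ArrayOfFields": ["Field"] + TYPED_FIELDS,
    "ArrayOfFieldsOrWidgets": TYPED_FIELDS + ["AnnotWidget"],
}

# Which array type each container uses for its children
# (Fields for the form, Kids for everything else).
KIDS_ARRAY = {
    "Form": "ArrayOfFields",
    "Field": "ArrayOfFields",
    "FieldTx": "ArrayOfFieldsOrWidgets",
    "FieldCh": "ArrayOfFieldsOrWidgets",
    "FieldBtn": "ArrayOfFieldsOrWidgets",
    "FieldSig": "ArrayOfFieldsOrWidgets",
    "AnnotWidget": None,  # a widget has no Kids under the proposal
}


def child_allowed(parent_type, child_type):
    """Return True if child_type may appear in parent_type's child array."""
    array_name = KIDS_ARRAY[parent_type]
    return array_name is not None and child_type in ARRAY_TYPES[array_name]
```

With this table, a bare AnnotWidget is rejected both in the form's Fields array and under an untyped Field, which corresponds to the first two issues in the list.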

The last issues can be handled with some magic in your SpecialCase field - we need to check the Parent and Kids entries

because the rules for Fields are:

Parent - (Required if this field is the child of another in the field hierarchy; absent otherwise) The field that is the immediate parent of this one (the field, if any, whose Kids array includes this field). A field can have at most one parent; that is, it can be included in the Kids array of at most one other field.

Kids - In a non-terminal field, the Kids array shall refer to field dictionaries that are immediate descendants of this field. In a terminal field, the Kids array ordinarily shall refer to one or more separate widget annotations that are associated with this field. However, if there is only one associated widget annotation, and its contents have been merged into the field dictionary, Kids shall be omitted.

and for Widgets:

Parent - (Required if this widget annotation is one of multiple children in a field; optional otherwise) An indirect reference to the widget annotation’s parent field. A widget annotation may have at most one parent; that is, it can be included in the Kids array of at most one field

I think we can represent all that with an fn:Eval that looks like this (expanded to make it a bit more legible):

(
 ((@Parent==null) && (fn:InArray(trailer::Root::AcroForm::Fields))) ||
 ((@Parent!=null) && (fn:InArray(parent::Kids)))
) && (
 ((@Subtype==Widget) && (Kids==null)) ||
 ((@Subtype==null) && (fn:ArraySize(Kids)>0))
)

It's using /Subtype/Widget as the test for "is a widget", which is not quite right, and I've also just invented fn:InArray, and presumed that ==null is the same as "field is not there" - which probably isn't the case. However I think the logic is correct.
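For readers who want to experiment with the logic outside the model, here is a minimal Python sketch of the same expression. It assumes nodes are plain dicts, that a missing key stands for "field is not there", and that object identity stands in for indirect-reference equality. `contains` and `check_node` are hypothetical names; like the expression above, the sketch uses Subtype == Widget as the "is a widget" test.

```python
def contains(array, node):
    """Identity-based membership test, standing in for the invented fn:InArray."""
    return any(item is node for item in (array or []))


def check_node(node, acroform_fields):
    """Mirror of the fn:Eval sketch: linkage check AND shape check."""
    parent = node.get("Parent")
    kids = node.get("Kids")

    # First clause: a node without Parent must be listed in AcroForm Fields,
    # a node with Parent must be listed in that parent's Kids.
    if parent is None:
        linked = contains(acroform_fields, node)
    else:
        linked = contains(parent.get("Kids"), node)

    # Second clause: a widget has no Kids, a non-widget has a non-empty Kids.
    subtype = node.get("Subtype")
    if subtype == "Widget":
        shape_ok = kids is None
    elif subtype is None:
        shape_ok = kids is not None and len(kids) > 0
    else:
        shape_ok = False

    return linked and shape_ok


# A terminal text field with one separate widget.
field = {"FT": "Tx", "T": "name"}
widget = {"Subtype": "Widget", "Parent": field}
field["Kids"] = [widget]
fields = [field]
```

Here both `check_node(field, fields)` and `check_node(widget, fields)` hold, while a widget that names `field` as its Parent without appearing in `field["Kids"]` fails the first clause.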

Finally, as an alternative if you don't want to go crazy with the special case field, I think we could capture the same logic by splitting FieldTx into lots of subtypes, e.g. FieldTxNonTerminal, FieldTxTerminal, FieldTxTerminalCombined etc, with the same for the other field types. It's more declarative but explodes the number of types.

Sorry, that's a rough one to start the day with.

faceless2 commented 2 years ago

Incidentally I tried the first suggestion, the splitting of ArrayOfFields into ArrayOfFields and ArrayOfFieldsOrWidgets, and it tested well against some valid forms that mix combined and uncombined fields, and fields with name hierarchies.

EDIT 20201001 - the one complexity is Parent in AnnotWidget - if it contains FT, we also need to allow Field as an option for Parent

faceless2 commented 2 years ago

Another followup on this:

Field places restrictions on Ff - [fn:Eval(fn:BitsClear(4,32))]. But this is an intermediate type - it must have an entry in the Kids array which is a FieldNNN, and any restrictions on the Flags are checked there. So I don't think this restriction should be here.
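To make the restriction concrete, here is a small Python sketch of what a check like fn:BitsClear(4,32) does. It assumes the predicate means "bit positions 4 through 32 are all zero", with bit 1 as the low-order bit (the PDF numbering convention). `bits_clear` is a hypothetical helper, not the model's implementation.

```python
def bits_clear(value, low, high):
    """Return True if bits low..high (1-based, inclusive) of value are all 0."""
    width = high - low + 1
    mask = ((1 << width) - 1) << (low - 1)
    return (value & mask) == 0
```

Under that reading, an Ff value using only bits 1 to 3 (ReadOnly, Required, NoExport) passes, while any type-specific flag from bit 4 upwards fails.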

faceless2 commented 2 years ago

At the very least on this one, even if none of the above changes are applied:

AnnotWidget Parent should go from Field to [FieldTx,FieldBtnPush,FieldBtnCheckbox,FieldBtnRadio,FieldChoice,fn:SinceVersion(1.3,FieldSig),Field] - otherwise Parent cannot be a terminal field, which is clearly not right.

On reflection this is an aspect of our implementation not the model.

petervwyatt commented 2 years ago

I'm starting to work on this. Just did some tidy up and referencing of all Annot*.tsv.

bdoubrov commented 10 months ago

This issue still pops up regularly in the tests, as Field and AnnotWidget have different permitted entries. One of the solutions would be to have a separate type for the merged Field+Widget dictionary. This would lead in fact to the following new types:

AnnotWidgetField, AnnotWidgetFieldTx, AnnotWidgetFieldCh, AnnotWidgetFieldBtn, AnnotWidgetFieldSig

This looks a bit weird, but I don't immediately see any other nicer solutions.
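A minimal Python sketch of how a validator might pick between these types for a given dictionary. The detection rule is an assumption made for illustration (a widget counts as merged when it also carries FT or T); `FIELD_TYPE` and `pick_type` are hypothetical names, and this is not how veraPDF or the Arlington tooling necessarily does it.

```python
# Maps an FT value to the field type name used in this thread.
FIELD_TYPE = {"Tx": "FieldTx", "Ch": "FieldCh", "Btn": "FieldBtn", "Sig": "FieldSig"}


def pick_type(d):
    """Choose a type name for a dict that may be a field, a widget, or both."""
    is_widget = d.get("Subtype") == "Widget"
    ft = d.get("FT")
    # Assumption: the presence of FT or T marks the dictionary as also being a field.
    has_field_keys = ft is not None or "T" in d
    field_name = FIELD_TYPE.get(ft, "Field")  # untyped or unknown FT falls back to Field

    if is_widget and has_field_keys:
        return "AnnotWidget" + field_name
    if is_widget:
        return "AnnotWidget"
    return field_name
```

For example, a dictionary with Subtype Widget and FT Tx maps to AnnotWidgetFieldTx, while a widget with neither FT nor T stays a plain AnnotWidget.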

bdoubrov commented 8 months ago

Just as an update, the latest version of the veraPDF-based Arlington app implements the above suggestion when converting the .tsv files to the veraPDF Arlington profile. It seems to work as expected.

https://software.verapdf.org/develop/arlington/1.25/verapdf-arlington-1.25.236-installer.zip