totten / afform

Affable Administrative Angular Form Framework for CiviCRM

Afform Auditor: Defining schemas #20

Open totten opened 5 years ago

totten commented 5 years ago

The auditor is a component which identifies issues in stored forms - which, in turn, involves two general areas:

  1. Schema/ruleset design - How do you define "valid" and "invalid"? (Ex: Create a whitelist of supported tags.)
  2. User/developer experience - When should the forms be checked for validity, and how should this be communicated? (Ex: Check the rulesets after any extensions are upgraded.)

This issue is specifically focusing on the schema/ruleset side.

Example Rules and Mental Model

Before we dig into specific tools or algorithms, let's consider some rules we'd like to articulate in English:

In considering these rules, it stands out to me that we have distinct sets of rules which can be combined. Thus, the "set of rules for user-editable forms" is the result of combining the "set of rules for basic HTML" plus "set of rules for basic AngularJS" plus "set of rules for BootstrapCSS" plus "set of rules for Afform data-handling".

Existing Standards

Tools for validating SGML/HTML/XML have been around as long as SGML/HTML/XML have been around. There are three widely used standards for defining XML rules: Document Type Definition (DTD), XML Schema Definition (XSD), and RelaxNG (RNG).

These three systems have an obvious strength: there are many tools, libraries, tutorials, books, Stack Exchange questions, etc. which deal with them. You will find many examples of how to create the schema for a document like this:

<library>
   <book>
     <author><name>Lewis Carroll</name><dob>1832-01-27</dob></author>
     <title>Alice in Wonderland</title>
   </book>
</library>
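As a rough illustration of what "delegating to a standard library" could look like, here is a minimal sketch that validates the document above with PHP's built-in DOM extension and a RELAX NG schema. The schema itself is an assumption written for this example (it treats every leaf as plain text), and the variable names are hypothetical.

```php
<?php
// Assumed schema for the <library> example: zero or more books, each with
// an author (name, dob) followed by a title. Leaves are plain text here.
$rng = <<<'RNG'
<element name="library" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="book">
      <element name="author">
        <element name="name"><text/></element>
        <element name="dob"><text/></element>
      </element>
      <element name="title"><text/></element>
    </element>
  </zeroOrMore>
</element>
RNG;

$goodXml = <<<'XML'
<library>
  <book>
    <author><name>Lewis Carroll</name><dob>1832-01-27</dob></author>
    <title>Alice in Wonderland</title>
  </book>
</library>
XML;

// A book without an author should be rejected by the schema above.
$badXml = '<library><book><title>Untitled</title></book></library>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$good = new DOMDocument();
$good->loadXML($goodXml);
$goodIsValid = $good->relaxNGValidateSource($rng);

$bad = new DOMDocument();
$bad->loadXML($badXml);
$badIsValid = $bad->relaxNGValidateSource($rng);
libxml_clear_errors();
```

The schema is a single fixed grammar, which is fine for a closed vocabulary like this one.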

I started refreshing myself a bit on these - and, in particular, I liked this RelaxNG book. Two interesting things:

So... can we address the bulk of Afform validation by delegating out to a standard library and making a few XML config files? I initially hoped so, but I'm starting to think that something more is needed:

I think it's worth verbalizing a bit about other ways to organize rules.

Concept: CSS-like Validation DSL

In this pseudocode sketch, the basic concept is to list selectors and apply some policy to the matching elements. It resembles CSS. To wit: Given a selector (h1,h2,h3 or a.btn), mark the identified elements as valid/OK. Or, given a selector, mark the matching elements with a warning. Or... call a PHP function to evaluate each matching element.

@define html-content {
  div, span, p, h1, h2, h3, h4, h5, h6, blockquote, pre {ok}
}
@define html-style {
  strong, em, tt, code, del, sub, sup, cite {ok}
  b, i, strike, center {
    /* "ok" above was a special short-hand for "$this->ok();", but generally... these are PHP blocks */ 
    $this->warn('The old school layout tags are deprecated. Use a semantic tag like <strong> or <em>.');
  }
}
@define afform-data {
  af-model-list {ok}
  af-model-prop, af-model-prop[name], af-model-prop[type] {ok}
  af-model, af-model[name] {ok}
  af-field, af-field[field-name] {ok}
  af-model-prop { 
    /* Run the PHP code on each matching element */
    static $entityTypes = Civi\Api4\Entity::get()->addSelect('name')->execute()->indexBy('name');
    if (!isset($entityTypes[$this['type']])) {
      $this->warn(ts('Unknown entity type!'));
    }
  }
  af-model { checkAfformModelName($this); }
}
@define bootstrap-style {
  .btn {
    if ($this->getName() == 'a' || $this->getName()  == 'button') { $this->ok(); }
    else { $this->warn(ts('Bootstrap buttons must use <A> or <BUTTON>.')); }
  }
  .btn-default, .btn-primary, ... {
    if ($this->hasClass('btn')) { $this->ok(); }
    else { $this->warn(ts('Bootstrap button decorators must be used with "btn".')); }
  }
}

/** A user-editable form may contain a mix of some HTML, some Angular, and some BootstrapCSS */
@define afform-gui-editable {
  @include html-content, html-style, afform-data, bootstrap-style
}

What I like about this: The DSL is concise. If you know CSS and a little PHP, then it should be fairly easy to read. (The Github CSS syntax highlighter even does a nice job despite some non-CSS bits.) CSS provides notations for tag-elements, attributes, and HTML classes.

What I don't like about this: Using a DSL requires more parsing work. If one needs to generate rules programmatically (e.g. via hook), then you have to go understand another notation.

Concept: Array of Selector/Action Rules

In this sketch, we avoid the need to implement a DSL. Just expose an array data-structure.

/**
 * @var array $rulesets
 *  Ex: $rulesets['myrule'][] = ['match' => string $cssSelector, 'call' => mixed $callable];
 */
$rulesets = [
  'html-content' => [
    ['match' => 'div, span, p, h1, h2, h3, h4, h5, h6', 'call' => Auditor::OK],
  ],
  'html-style' => [
    ['match' => 'strong, em, tt, code, del, sub, sup, cite', 'call' => Auditor::OK],
    ['match' => 'b, i, strike, center', 'call' => function($ctx) {
      $ctx->warn(ts('The old school layout tags are deprecated. Use a semantic tag like <strong> or <em>.'));
    }],
  ],
  // et al...
];

What I like: It's amenable to hooking and merging; it can be amenable to serialization (depending on what callback notations are allowed). It's easy to imagine adding more metadata to each rule (like a weight or a symbolic name).

What I don't like: The array-structure gets fairly deep and doesn't document itself.
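To make the "hooking and merging" point concrete, here is a minimal sketch of how array-based rule sets might be extended and then flattened into one ordered list. Everything beyond the `match`/`call` keys is an assumption for illustration: the `weight` key, the separate `$includes` map, the `flattenRuleset()` helper, and the plain strings standing in for `Auditor::OK` and real callbacks.

```php
<?php
// Named rule sets; 'ok' and 'warn' are placeholders for real callables.
$rulesets = [
  'html-content' => [
    ['match' => 'div, span, p', 'call' => 'ok', 'weight' => 0],
  ],
  'html-style' => [
    ['match' => 'strong, em', 'call' => 'ok', 'weight' => 0],
    ['match' => 'b, i', 'call' => 'warn', 'weight' => 10],
  ],
];

// Hypothetical notation: a composite set lists the sets it includes.
$includes = ['afform-gui-editable' => ['html-content', 'html-style']];

// Stand-in for a hook: another extension appends a rule to an existing set.
$rulesets['html-style'][] = ['match' => 'center', 'call' => 'warn', 'weight' => 5];

/**
 * Collect the rules of a set plus everything it includes, ordered by weight.
 */
function flattenRuleset(array $rulesets, array $includes, string $name): array {
  $rules = $rulesets[$name] ?? [];
  foreach ($includes[$name] ?? [] as $child) {
    $rules = array_merge($rules, flattenRuleset($rulesets, $includes, $child));
  }
  // Lower weights run first. Sorting is stable as of PHP 8.0.
  usort($rules, function ($a, $b) {
    return $a['weight'] <=> $b['weight'];
  });
  return $rules;
}

$flat = flattenRuleset($rulesets, $includes, 'afform-gui-editable');
```

Because the rules are plain data, the hook step is an ordinary array append, and the same structure could be cached or serialized as long as the `call` values stay serializable.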

Concept: Fluent Rule Builder

This sketch uses the same mental model as the other two (match a CSS selector; specify a callback function). However, it uses a fluent OOP style to build the rules. Some of the fluent functions (ok($cssSelector) or warn($cssSelector, $message)) are short-cuts for registering callback functions.

$rulesets = Civi::service('afform_rule_sets');

$rulesets->define('html-content')
  ->ok('div, span, p, h1, h2, h3, h4, h5, h6');

$rulesets->define('html-style')
  ->ok('strong, em, tt, code, del, sub, sup, cite')
  ->warn('b, i, strike', ts('The old school layout tags are deprecated. Use a semantic tag like <strong> or <em>.'))
  // Or... an equivalent but more general-purpose notation...
  ->call('b, i, strike', function($ctx){
      $ctx->warn(ts('The old school layout tags are deprecated. Use a semantic tag like <strong> or <em>.'));
  });

What I like: You get better IDE support (autocomplete/drilldown).

What I don't like: The canonical form is PHP code that's hard to serialize/transmit. Adding in weights and symbolic-names may not be as pretty.
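For comparison, a fluent builder could be a thin layer that records the same selector/callback pairs as the array concept. The sketch below is only one way to do it: the class names `RuleSets` and `RuleSet` are hypothetical, the builder is instantiated directly rather than through `Civi::service()`, and plain strings replace `ts()` so the block runs outside CiviCRM.

```php
<?php
// Hypothetical builder for one named rule set.
class RuleSet {
  /** @var array List of ['match' => string, 'call' => callable]. */
  public $rules = [];

  // General-purpose form: register a callback for a selector.
  public function call(string $selector, callable $callback): RuleSet {
    $this->rules[] = ['match' => $selector, 'call' => $callback];
    return $this;
  }

  // Short-cut: mark matching nodes as OK.
  public function ok(string $selector): RuleSet {
    return $this->call($selector, function ($ctx) {
      $ctx->ok();
    });
  }

  // Short-cut: attach a warning to matching nodes.
  public function warn(string $selector, string $message): RuleSet {
    return $this->call($selector, function ($ctx) use ($message) {
      $ctx->warn($message);
    });
  }
}

// Hypothetical registry of rule sets, keyed by name.
class RuleSets {
  private $sets = [];

  public function define(string $name): RuleSet {
    return $this->sets[$name] = $this->sets[$name] ?? new RuleSet();
  }
}

$rulesets = new RuleSets();
$style = $rulesets->define('html-style')
  ->ok('strong, em, tt, code, del, sub, sup, cite')
  ->warn('b, i, strike', 'The old school layout tags are deprecated.');

// A stand-in context object that just records what the callbacks do.
$ctx = new class {
  public $log = [];
  public function ok() { $this->log[] = 'ok'; }
  public function warn($message) { $this->log[] = 'warn: ' . $message; }
};
($style->rules[1]['call'])($ctx);
```

Every stored rule holds a closure, which illustrates the serialization concern: the list of selectors is easy to export, but the callbacks are not.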

totten commented 5 years ago

Note to self regarding CSS/XPath/selector mechanics...

In the last three concepts, one can imagine using the same basic algorithm for each: you load the HTML partial into a tree (i.e. every DOM element, DOM attribute, and HTML class shows up as a node in the tree). Every node in the tree has a status initialized to unknown. To apply the rules, you run each selector and flip the status on matching nodes (...to ok or warn or deprecated or experimental or somesuch). At the end, you report any nodes with a scary status attached.
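A minimal sketch of that algorithm, under several assumptions: rules are written directly as XPath expressions mapped to a status string (no CSS translation), statuses are keyed by each node's path, and only elements and attributes are tracked (individual HTML classes are left out). The function name `auditPartial()` is hypothetical.

```php
<?php
/**
 * Audit an HTML partial against a list of [xpath => status] rules.
 * Returns the nodes whose final status is anything other than "ok".
 */
function auditPartial(string $html, array $rules): array {
  // Collect parser errors internally; custom tags would otherwise warn.
  libxml_use_internal_errors(true);
  $doc = new DOMDocument();
  $doc->loadHTML('<!DOCTYPE html><html><body>' . $html . '</body></html>');
  libxml_clear_errors();
  $xpath = new DOMXPath($doc);

  // 1. Every element and attribute inside <body> starts as "unknown".
  $status = [];
  foreach ($xpath->query('/html/body//* | /html/body//*/@*') as $node) {
    $status[$node->getNodePath()] = 'unknown';
  }

  // 2. Each rule flips the status of the nodes it matches.
  foreach ($rules as $expr => $newStatus) {
    foreach ($xpath->query($expr) as $node) {
      $status[$node->getNodePath()] = $newStatus;
    }
  }

  // 3. Report anything that did not end up as "ok".
  return array_filter($status, function ($s) {
    return $s !== 'ok';
  });
}

$report = auditPartial(
  '<div><input type="text"/><b>Hi</b><marquee>Yo</marquee></div>',
  [
    '//div | //input' => 'ok',
    '//input/@type[. = "text"]' => 'ok',
    '//b' => 'warn',
  ]
);
```

In this run the `<b>` element is reported with a warning and `<marquee>` stays unknown because no rule mentions it; the `div`, the `input`, and its `type` attribute are all cleared.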

This needs a mechanism for evaluating selectors. There's a Symfony package which translates CSS to XPath. PHP stdlib supports XPath. That all looks promising. (phpQuery is probably similar under-the-hood.) That CSS=>XPath mapping almost fits... but not really.

It would work for flagging DOMElements, but not so much for selecting DOMAttrs and HTML classes. Consider the CSS selector input[type=text] or a[href]; one normally wants to get the DOMElement (input or a). But since we track DOM attributes and HTML classes as distinct things, it's more interesting to get the DOMAttr (type or href). The nuance isn't expressible in CSS but it is in XPath. Consider an example HTML-partial and example XPath query:

<!-- Example document -->
<div>
  <input type="text"/>
  <input type="number"/>
</div>
<!-- Example queries -->

XPath: "descendant-or-self::input[@type = 'text']"
Result: DOMElement (nodeName=="input", nodePath== "/html/body/div/input[1]")

XPath: "descendant-or-self::input[@type = 'text']/@type"
Result: DOMAttr (nodeName=="type", nodePath== "/html/body/div/input[1]/@type")

XPath: "descendant-or-self::input/@type[string() = 'text']"
Result: DOMAttr (nodeName=="type", nodePath=="/html/body/div/input[1]/@type")
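The three queries above can be reproduced with PHP's stock DOM classes. This is a small sketch, assuming the partial is loaded with `DOMDocument::loadHTML()` (which wraps it in `html` and `body`); the `$results` structure is just for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<div><input type="text"/><input type="number"/></div>');
$xpath = new DOMXPath($doc);

$queries = [
  // Selects the element, as a CSS selector like input[type=text] would.
  "descendant-or-self::input[@type = 'text']",
  // Same match, then step down to the attribute node.
  "descendant-or-self::input[@type = 'text']/@type",
  // Equivalent: filter on the attribute node itself.
  "descendant-or-self::input/@type[string() = 'text']",
];

// Record the class and path of every node each query returns.
$results = [];
foreach ($queries as $query) {
  foreach ($xpath->query($query) as $node) {
    $results[$query][] = get_class($node) . ' ' . $node->getNodePath();
  }
}
```

The first query yields a `DOMElement`, while the other two yield the `DOMAttr`, so an auditor that tracks attributes separately could append an attribute step to whatever XPath a CSS translator produces.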