Closed mikesamuel closed 4 years ago
You can assign this to me since I was crawling WebIDL the other day, but I don't have perms to assign myself.
(Linking to #54)
NOT IDENTIFIERS
class in TestObject @ blink//Source/bindings/tests/idls/core/TestObject.idl:0
import in HTMLLinkElement @ blink//Source/core/html/HTMLLinkElement.idl:0
default in HTMLMenuItemElement @ blink//Source/core/html/HTMLMenuItemElement.idl:0
default in HTMLTrackElement @ blink//Source/core/html/HTMLTrackElement.idl:0
default in SpeechSynthesisVoice @ blink//Source/modules/speech/SpeechSynthesisVoice.idl:0
$ mkdir blinky
$ cd blinky
$ sudo $(which port) install py27-lex
$ sudo $(which port) install py27-regex
$ git clone https://chromium.googlesource.com/chromium/blink
$ git clone https://chromium.googlesource.com/chromium/src/tools/idl_parser
and then I adapted an existing script, shown below, that crawls the WebIDL files. I haven't stripped out the extraneous cruft.
#!python
# To get this to run I did, in the containing directory
# $ git clone https://chromium.googlesource.com/chromium/blink
# $ git clone https://chromium.googlesource.com/chromium/src/tools/idl_parser
# and I installed py27-ply via port which provides lex.
import json
import os
import re
import regex
import sys
# Hack a py path so that I don't have to assume there's a parent directory with
# an __init__.py.
module_path, module_name = os.path.split(__file__)
# blink_idl_parser depends on idl_parser
idl_parser_dir = os.path.join(module_path, 'idl_parser')
sys.path.append(idl_parser_dir)
# Parses WebIDL with syntax extensions.
blink_idl_parser_dir = os.path.join(
    module_path, 'blink', 'Source', 'bindings', 'scripts')
assert os.path.exists(blink_idl_parser_dir), blink_idl_parser_dir
sys.path.append(blink_idl_parser_dir)
# These have to follow the sys.path muckery
from blink_idl_lexer import BlinkIDLLexer # pylint: disable=g-import-not-at-top
from blink_idl_parser import BlinkIDLParser
from idl_node import IDLSearch
import idl_parser
from idl_parser.idl_parser import ParseFile
_SPACE_AT_EOL = re.compile(r' +$', re.M)
# http://www.ecma-international.org/ecma-262/6.0/#sec-identifier-names
_KEYWORDS = set(
    [
        'break', 'do', 'in', 'typeof',
        'case', 'else', 'instanceof', 'var',
        'catch', 'export', 'new', 'void',
        'class', 'extends', 'return', 'while',
        'const', 'finally', 'super', 'with',
        'continue', 'for', 'switch', 'yield',
        'debugger', 'function', 'this',
        'default', 'if', 'throw',
        'delete', 'import', 'try',
    ] + [
        'yield', 'let', 'static',
    ] + [
        'false', 'true', 'null'
    ] + [
        'enum', 'await',
    ] + [
        'implements', 'package', 'protected',
        'interface', 'private', 'public',
    ])
_IDENTIFIER_START = '[$_[:IdStart:]]'
_IDENTIFIER_PART = '[$_[:IdContinue:]\u200C\u200D]'
_IDENTIFIER_NAME = regex.compile(
    ur'^(?:%s%s*)\Z' % (_IDENTIFIER_START, _IDENTIFIER_PART),
    regex.U)


def isValidJsIdentifier(s):
  return s not in _KEYWORDS and _IDENTIFIER_NAME.match(s) is not None
def _strip_space_at_eol(s):
  """Strip spaces from end of line.

  Useful with json.JSONEncoder which leaves
  spaces after commas even when pretty printing.

  Args:
    s: The string to strip from.

  Returns:
    The string without runs of spaces before line ends.
  """
  return _SPACE_AT_EOL.sub('', s)


def _multimap_add(m, k, v):
  """Adds a value to a collection of values in a map.

  Args:
    m: the multimap
    k: the key
    v: a single value
  """
  if k not in m:
    m[k] = []
  m[k].append(v)
def process_files(file_nodes):
  """Processes a batch of IDL files.

  Args:
    file_nodes: IDL File AST nodes

  Raises:
    Exception: on an unrecoverable failure to extract the result.

  Returns:
    A map from lower-case HTML attribute names to the JavaScript object
    property names that reflect them, collected over all IDL interfaces
    that extend Element.
  """
  NAMES_THAT_ARE_NOT_JS_IDENTS = []
  class Processor(IDLSearch):
    """Collects info about interface inheritance and reflected attributes."""
    # We need a minimal set of names of HTML/SVG/Math-ML element attributes
    # that differ from a reflecting DOM object property.
    #
    # To do this, we need to determine
    # 1. the set of IDL interfaces that transitively inherit from interface
    #    Element.
    #    There are two AST patterns that we look for:
    #    a. The pattern
    #         (Interface NAME=MyInterface
    #           (Inherit NAME=SuperType)
    #           ...
    #         )
    #       represents IDL of the form `interface MyInterface : SuperType`
    #       and `partial interface MyInterface : SuperType`
    #    b. Top level IDL declarations like
    #         `HTMLElement implements GlobalEventHandlers;`
    #       which is represented in the AST thus
    #         (Implements NAME=HTMLElement REFERENCE=GlobalEventHandlers)
    #
    #    DERIVATION:
    #    https://www.chromium.org/blink/webidl#TOC-dependencies lists the
    #    ways in which one IDL file can depend on any other IDL files.
    #    It includes
    #    1. partial interfaces which are irrelevant to inheritance as long
    #       as a partial interface declaration that specifies inheritance
    #       is not clobbered by one that does not.
    #    2. `implements` which is handled above
    #    3. Ancestors which talks about transitive inheritance which is
    #       handled in a post processing pass. See compute_supertypes(...)
    #    4. Used interfaces which are irrelevant to inheritance.
    #
    # 2. The IDL attributes on those Element subtypes that reflect HTML
    #    attributes or which are settable and specially handled by the browser.
    #    To identify these we look for AST nodes like
    #      (Attribute NAME=myAttribute
    #        (ExtAttribute NAME=Reflect VALUE=...)
    #        (ExtAttribute NAME=CustomElementCallbacks)
    #        (ExtAttribute NAME=CEReactions)
    #      )
    #    and associate the attribute NAME and any Reflect VALUE with the
    #    containing interface.
    #
    #    DERIVATION:
    #    [Reflect] is defined at
    #    https://html.spec.whatwg.org/multipage/infrastructure.html#reflect
    #      Some IDL attributes are defined to reflect a particular content
    #      attribute. This means that on getting, the IDL attribute returns
    #      the current value of the content attribute, and on setting, the IDL
    #      attribute changes the value of the content attribute to the given
    #      value.
    #
    #    The optional value of the [Reflect] attribute is the name of the
    #    attribute reflected, e.g.
    #      [Reflect=class] attribute DOMString className;
    #
    #    [CustomElementCallbacks] is defined at
    #    https://chromium.googlesource.com/chromium/src/+/master
    #    /third_party/WebKit/Source/bindings/IDLExtendedAttributes.md
    #    #CustomElementCallbacks_m_a
    #      This attribute is only for Custom Elements V0, and is superseded by
    #      [CEReactions] for V1.
    #
    #    [CEReactions] is defined by https://html.spec.whatwg.org/ thus:
    #      To ensure custom element reactions are triggered appropriately,
    #      we introduce the [CEReactions] IDL extended attribute. It indicates
    #      that the relevant algorithm is to be supplemented with additional
    #      steps in order to appropriately track and invoke custom element
    #      reactions.
    #    so we recognize these three annotations to identify those IDL
    #    attributes which are of particular interest to custom elements and
    #    which associate with HTML attributes.
    #
    #    An inspection of attributes listed in
    #    https://chromium.googlesource.com/chromium/src/+/master
    #    /third_party/WebKit/Source/bindings/IDLExtendedAttributes.md
    #    indicates no others that should be relevant.
    #    Of note are:
    #    1. [ReflectEmpty], ..., [ReflectOnly] which are only used in
    #       conjunction with [Reflect].
    #    2. [PutForwards] which relates to HTML attributes which should
    #       themselves be caught by the rules above and is only used
    #       on readonly IDL attributes.
    def __init__(self):
      IDLSearch.__init__(self)
      # Maps sub-types to super-types non-transitively.
      # There will be an entry 'Sub': 'Super'
      # wherever we see
      #   interface Sub : Super { ... }
      # If there is no explicit super type then there is
      # no entry, so the keyset is not the set of all
      # interfaces.
      self.inherits = {}
      # Maps interface names to (js_property_name, html_attr_name) pairs.
      self.reflected_attrs = {}
      # Current IDL attribute name.
      self.current_attribute = None
      # Current interface name. Not set for dictionary declarations
      # since we don't care.
      self.current_interface = None
    def Enter(self, node):
      try:
        cls = node.GetClass()
        if cls == 'Interface':
          self.current_interface = node.GetProperty('NAME')
          assert self.current_interface is not None
          if self.current_interface not in self.reflected_attrs:
            # don't clobber other parts of partial interfaces.
            # https://heycam.github.io/webidl/#dfn-partial-interface
            self.reflected_attrs[self.current_interface] = []
        elif cls == 'Inherit':
          # Dictionaries can inherit so the interface can be None.
          # We don't care.
          if self.current_interface is not None:
            _multimap_add(self.inherits, self.current_interface,
                          node.GetProperty('NAME'))
        elif cls == 'Implements':  # Top level super-type.
          _multimap_add(self.inherits, node.GetProperty('NAME'),
                        node.GetProperty('REFERENCE'))
        elif cls == 'Attribute':
          self.current_attribute = node.GetProperty('NAME')
          if not isValidJsIdentifier(self.current_attribute):
            NAMES_THAT_ARE_NOT_JS_IDENTS.append((
                self.current_interface,
                self.current_attribute,
                (node.GetProperty('FILENAME'),
                 node.GetProperty('LINENO'))))
          assert self.current_attribute is not None
        elif cls == 'ExtAttribute':
          # The name of an IDL attribute corresponds to a
          # property on a JavaScript object, while the
          # value corresponds to an HTML element attribute.
          # Above, "attribute" means "IDL attribute", but
          # hereafter, "attribute" means "HTML attribute".
          prop = self.current_attribute
          if prop is not None:  # None for interface annotations.
            extattr = node.GetProperty('NAME')
            html_attr = None
            is_reflected = False
            if extattr == 'Reflect':
              html_attr = node.GetProperty('VALUE') or None
              is_reflected = True
            elif extattr in ('CustomElementCallbacks', 'CEReactions'):
              html_attr = None
              is_reflected = True
            if is_reflected and (
                # We need a table entry when our simple heuristic fails.
                # Our heuristic says that the property name is the same
                # as the canonical (lower-case) HTML attribute name.
                (html_attr or prop) != prop.lower()):
              self.reflected_attrs[self.current_interface].append(
                  (prop, html_attr))
      except:
        print >>sys.stderr, '%s:%s' % (node.GetProperty('FILENAME'),
                                       node.GetProperty('LINENO'))
        raise
    def Exit(self, node):
      cls = node.GetClass()
      if cls == 'Interface':
        interface_name = self.current_interface
        assert interface_name is not None
        self.current_interface = None
        # Some IDL attributes have both [Reflect "...",
        # CustomElementCallbacks] in which case we will have two
        # entries on the reflected list for the interface.
        # Collapse those.
        prop_to_attr = {}
        for prop, attr in self.reflected_attrs[interface_name]:
          if attr is not None or prop not in prop_to_attr:
            prop_to_attr[prop] = attr
        self.reflected_attrs[interface_name] = [
            (prop, attr or prop.lower())
            for prop, attr in prop_to_attr.iteritems()
        ]
      elif cls == 'Attribute':
        assert self.current_attribute is not None
        self.current_attribute = None
  p = Processor()
  for f in file_nodes:
    f.Traverse(p, ())

  def dump():
    print '\n\nNOT IDENTIFIERS'
    for (class_name, attribute, (fname, lineno)) in \
        NAMES_THAT_ARE_NOT_JS_IDENTS:
      print '%s in %s @ %s:%s' % (attribute, class_name, fname, lineno)
  dump()

  inherits = p.inherits
  reflected_attrs = p.reflected_attrs
  # Now that we've collected interface info,
  # we need to identify all Element sub-interfaces.
  super_type_sets = {}

  def compute_supertypes(interface_name):
    # Due to top-level `X implements Y;` declarations
    # dictionary names might reach interface_name, but
    # that shouldn't matter since we only care about
    # sub-types of Element and no dictionary is going
    # to be either a super-type or a sub-type of Element.
    if interface_name not in super_type_sets:
      stypes = set()
      stypes.add(interface_name)
      super_type_sets[interface_name] = stypes
      for stype in inherits.get(interface_name, ()):
        compute_supertypes(stype)
        stypes.update(super_type_sets[stype])
  for interface_name in inherits.iterkeys():
    compute_supertypes(interface_name)
  # Now, find all the element sub-types, and union their
  # mixed-case properties into a mapping from lower-case
  # property names to additional property names to check.
  lcase_to_mixed = {}
  for interface_name, stypes in super_type_sets.iteritems():
    if 'Element' in stypes:
      for stype in stypes:
        if stype in reflected_attrs:
          for prop_name, attr_name in reflected_attrs[stype]:
            old_prop_name = lcase_to_mixed.get(attr_name)
            if old_prop_name is None:
              lcase_to_mixed[attr_name] = prop_name
            elif old_prop_name != prop_name:
              raise Exception('Ambiguous %s maps to %s and %s' % (
                  attr_name, prop_name, old_prop_name))
  return lcase_to_mixed
def write_js(lcase_to_mixed, out):
  """Write JavaScript to out.

  Args:
    lcase_to_mixed: maps lowercase HTML attribute names to JS property names.
    out: output stream for JS source.
  """
  prop_to_attr = {}
  noncanon_props = []
  oddities = {}
  for a, p in lcase_to_mixed.iteritems():
    # We generate this map lazily in the JS, but
    # goog.object.transpose would have undefined behavior
    # if this assert failed, so we check it here.
    assert p not in prop_to_attr
    prop_to_attr[p] = a
    if p.lower() == a:
      assert p != a
      noncanon_props.append(p)
    else:
      oddities[a] = p
  noncanon_props.sort()
  encoder = json.JSONEncoder(sort_keys=True, ensure_ascii=True, indent=2)
  print >>out, """
/**
 * @license
 * Copyright (c) 2017 The Polymer Project Authors. All rights reserved.
 * This code may only be used under the BSD style license found at
 * http://polymer.github.io/LICENSE.txt
 * The complete set of authors may be found at
 * http://polymer.github.io/AUTHORS.txt
 * The complete set of contributors may be found at
 * http://polymer.github.io/CONTRIBUTORS.txt
 * Code distributed by Google as part of the polymer project is also
 * subject to an additional IP rights grant found at
 * http://polymer.github.io/PATENTS.txt
 */

goog.provide('security.html.namealiases');

goog.require('goog.object');
goog.require('goog.string');

/**
 * @fileoverview
 * Provides a mapping from HTML attribute to JS object property names.
 */

/**
 * Maps JavaScript object property names to HTML attribute names.
 *
 * @param {string} propName a JavaScript object property name.
 * @return {string} an HTML element attribute name.
 */
security.html.namealiases.propertyToAttr = function (propName) {
  var propToAttr = security.html.namealiases.propToAttr_;
  if (!propToAttr) {
    var attrToProp = security.html.namealiases.getAttrToProp_();
    propToAttr = security.html.namealiases.propToAttr_ =
        goog.object.transpose(attrToProp);
  }
  var attr = propToAttr[propName];
  if (goog.isString(attr)) {
    return attr;
  }
  // Arguably we could do propName.toLowerCase, but these
  // two functions should be inverses.
  return goog.string.toSelectorCase(propName);
};
/**
 * Maps HTML attribute names to JavaScript object property names.
 *
 * @param {string} attrName an HTML element attribute name.
 * @return {string} a JavaScript object property name.
 */
security.html.namealiases.attrToProperty = function (attrName) {
  var canonAttrName = String(attrName).toLowerCase();
  var attrToProp = security.html.namealiases.getAttrToProp_();
  var prop = attrToProp[canonAttrName];
  if (goog.isString(prop)) {
    return prop;
  }
  return goog.string.toCamelCase(canonAttrName);
};

/**
 * Instead of trusting a property name, we assume the worst and
 * try to map it to a property name with known special semantics.
 *
 * @param {string} name a JavaScript object property or HTML attribute name.
 * @return {?string} a JavaScript object property name if there is a special
 *     mapping that is different from that given.
 */
security.html.namealiases.specialPropertyNameWorstCase = function (name) {
  var lcname = name.toLowerCase();
  var attrToProp = security.html.namealiases.getAttrToProp_();
  var prop = attrToProp[lcname];
  if (goog.isString(prop)) {
    return prop;
  }
  return null;
};

/**
 * Returns a mapping from lower-case HTML attribute names to
 * property names that reflect those attributes.
 *
 * @return {!Object.<string, string>}
 * @private
 */
security.html.namealiases.getAttrToProp_ = function () {
  if (!security.html.namealiases.attrToProp_) {
    security.html.namealiases.attrToProp_ = goog.object.clone(
        security.html.namealiases.ODD_ATTR_TO_PROP_);
    var noncanon = security.html.namealiases.NONCANON_PROPS_;
    for (var i = 0, n = noncanon.length; i < n; ++i) {
      var name = noncanon[i];
      security.html.namealiases.attrToProp_[name.toLowerCase()] = name;
    }
  }
  return security.html.namealiases.attrToProp_;
};
/**
 * Mixed-case property names that correspond directly to an attribute
 * name ignoring case.
 *
 * @type {!Array.<string>}
 * @const
 * @private
 */
security.html.namealiases.NONCANON_PROPS_ = %(noncanon_props)s;

/**
 * Attribute name to property name mappings that are neither identity
 * nor simple lowercasing, like {@code "for"} -> {@code "htmlFor"}.
 *
 * @type {!Object.<string, string>}
 * @private
 */
security.html.namealiases.ODD_ATTR_TO_PROP_ = %(oddities)s;

/**
 * Maps lower-case HTML attribute names to property names that reflect
 * those attributes.
 *
 * <p>
 * This is initialized to a partial value that is then lazily fleshed out
 * based on ODD_ATTR_TO_PROP_ and NONCANON_PROPS_.
 * </p>
 *
 * @type {?Object.<string, string>}
 * @private
 */
security.html.namealiases.attrToProp_ = null;

/**
 * Maps property names to lower-case HTML attribute names
 * that are reflected by those properties.
 *
 * Lazily generated from attrToProp_.
 *
 * @type {?Object.<string, string>}
 * @private
 */
security.html.namealiases.propToAttr_ = null;
""".strip() % {
      'module_name': module_name,
      'noncanon_props': _strip_space_at_eol(encoder.encode(noncanon_props)),
      'oddities': _strip_space_at_eol(encoder.encode(oddities)),
  }


if __name__ == '__main__':
  parser = BlinkIDLParser(BlinkIDLLexer())
  element_to_reflected_attribute = process_files([
      ParseFile(parser, idl_file)
      for idl_file in sys.argv[1:]
  ])
  # write_js(element_to_reflected_attribute, sys.stdout)
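The inheritance step in the script (compute_supertypes inside process_files) can be looked at in isolation. The following is a minimal Python 3 sketch, not the script's own code: the function name `transitive_supertypes` and the sample `inherits` map are made up for illustration, and it assumes `inherits` has the same shape as `Processor.inherits` (a name mapped to a list of direct super-types).

```python
def transitive_supertypes(inherits):
    """Maps each reachable name to the set of itself plus all its ancestors."""
    closure = {}

    def visit(name):
        if name not in closure:
            # Register the entry before recursing, as the script does,
            # so that a name is expanded at most once.
            closure[name] = {name}
            for parent in inherits.get(name, ()):
                visit(parent)
                closure[name] |= closure[parent]

    for name in inherits:
        visit(name)
    return closure


# Hypothetical sample data in the shape of Processor.inherits.
inherits = {
    'HTMLDivElement': ['HTMLElement'],
    'HTMLElement': ['Element', 'GlobalEventHandlers'],
    'Element': ['Node'],
}
closure = transitive_supertypes(inherits)
# Everything whose ancestor set contains 'Element' counts as an Element sub-type.
element_subtypes = sorted(n for n, s in closure.items() if 'Element' in s)
print(element_subtypes)
```

With this sample data the `implements` edge to GlobalEventHandlers is treated like any other super-type edge, which is how the script folds both AST patterns into one map.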
and finally I ran the script thus
$ find blink/ -name \*.idl | grep -v InspectorInstrumentation | xargs python2.7 crawl_idl.py
The InspectorInstrumentation.idl file seems to have some cpp directives which break the IDL parser, so I skipped it. I think it's an input to a code generator for properly-formed IDL files.
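Instead of excluding that one file by name, files could be screened by content. This is only a sketch under an assumption: that a line starting with `#` plus a common preprocessor keyword is a reliable sign of cpp input (WebIDL itself has no `#` syntax). The helper name `looks_preprocessed` and the sample texts are hypothetical, and what InspectorInstrumentation.idl actually contains has not been checked here.

```python
import re

# Assumption: these keywords cover the directives likely to appear.
_CPP_DIRECTIVE = re.compile(
    r'^[ \t]*#[ \t]*'
    r'(?:include|define|undef|if|ifdef|ifndef|else|elif|endif|pragma)\b',
    re.M)


def looks_preprocessed(idl_text):
    """True if the text contains something that looks like a cpp directive."""
    return _CPP_DIRECTIVE.search(idl_text) is not None


# Hypothetical file contents keyed by path.
samples = {
    'Plain.idl': 'interface Foo {\n  attribute DOMString bar;\n};\n',
    'Generated.idl': '#include "Foo.h"\ninterface Foo {};\n',
}
parseable = sorted(p for p, text in samples.items()
                   if not looks_preprocessed(text))
print(parseable)
```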
I'm ignoring class in TestObject since that's probably not reachable from Window.
import is readonly in HTMLLinkElement.idl:L50
// HTML Imports
// https://w3c.github.io/webcomponents/spec/imports/#interface-import
readonly attribute Document? import;
default is writable per HTMLMenuItemElement
[CEReactions, Reflect] attribute boolean default;
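The readonly/writable distinction for the two declarations quoted above can be read off mechanically. Here is a rough Python 3 sketch that handles only simple one-line attribute declarations; the regex and the helper `describe_attribute` are illustrative assumptions, not a substitute for the Blink IDL parser the script uses.

```python
import re

# Hypothetical matcher for simple one-line declarations only.
_ATTRIBUTE_DECL = re.compile(
    r'^\s*(?:\[(?P<ext>[^\]]*)\]\s*)?'
    r'(?P<readonly>readonly\s+)?'
    r'attribute\s+(?P<type>.+?)\s+(?P<name>\w+)\s*;\s*$')


def describe_attribute(decl):
    """Returns (name, writable, extended_attribute_names), or None."""
    m = _ATTRIBUTE_DECL.match(decl)
    if m is None:
        return None
    ext = [e.strip() for e in (m.group('ext') or '').split(',') if e.strip()]
    return m.group('name'), m.group('readonly') is None, ext


print(describe_attribute('readonly attribute Document? import;'))
print(describe_attribute('[CEReactions, Reflect] attribute boolean default;'))
```

The first call reports `import` as not writable and the second reports `default` as writable with both extended attributes, matching the reading above.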
related to the shim, closing in favor of the other repo.
Issue 54 would benefit from knowing whether the default environment binds any global properties whose names are not identifiers.
Crawl WebIDL and compare against the ReservedWord production.
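The comparison itself needs no IDL tooling once attribute names are in hand. Below is a Python 3, standard-library-only sketch. The word list is copied from the `_KEYWORDS` set in the script above; the Unicode category sets are an assumption that only approximates ID_Start and ID_Continue (the script uses the third-party `regex` module for that), and the function names are illustrative.

```python
import unicodedata

# Copied from _KEYWORDS in the crawl script.
RESERVED_WORDS = frozenset("""
    break do in typeof case else instanceof var catch export new void
    class extends return while const finally super with continue for
    switch yield debugger function this default if throw delete import try
    let static false true null enum await
    implements package protected interface private public
""".split())

# Assumption: general categories as a stand-in for ID_Start / ID_Continue.
_START = frozenset(['Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl'])
_CONTINUE = _START | frozenset(['Mn', 'Mc', 'Nd', 'Pc'])


def is_identifier_name(s):
    """Approximates the IdentifierName production."""
    if not s:
        return False
    if s[0] not in '$_' and unicodedata.category(s[0]) not in _START:
        return False
    return all(c in '$_\u200c\u200d' or unicodedata.category(c) in _CONTINUE
               for c in s[1:])


def is_valid_js_identifier(s):
    """IdentifierName that is not a reserved word."""
    return is_identifier_name(s) and s not in RESERVED_WORDS


names = ['class', 'import', 'default', 'className', 'htmlFor']
not_identifiers = [n for n in names if not is_valid_js_identifier(n)]
print(not_identifiers)
```

For the sample names this flags `class`, `import`, and `default`, the same three attribute names that appear in the NOT IDENTIFIERS output of the crawl.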