richardmyu / blog

Personal blog (written as GitHub issues)
https://github.com/richardmyu/blog/issues
MIT License

getElementsByClassName vs querySelectorAll #1

richardmyu opened this issue 5 years ago


Chapter 2 of "JavaScript Design Patterns", "What Is a Pattern?", raises the following question:

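Suppose we have a script that needs to add a counter to every DOM element on the page that has the class "foo". What is the simplest, most efficient way to query for that collection?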

  1. Select every element on the page, store them, and then filter the collection;

  2. Use a native browser feature such as querySelectorAll() to select them;

  3. Use a native feature such as getElementsByClassName() to retrieve them.

The natural follow-up question is: which approach is fastest? The book says it is option 3, which is 8 to 10 times faster than the alternatives.

So I ran a test on jsperf and found that this is indeed the case. We can also write a local test file ourselves:


  <body>
    <div class="test"></div>
    <script>
      console.time("querySelectorAll");
      for (let i = 0; i < 100000; i++) {
        document.querySelectorAll(".test");
      }
      console.timeEnd("querySelectorAll");
    </script>
    <script>
      console.time("getElementsByClassName");
      for (let i = 0; i < 100000; i++) {
        document.getElementsByClassName("test"); // a bare class name, no leading "."
      }
      console.timeEnd("getElementsByClassName");
    </script>
  </body>

The results of three runs:

// Run 1
querySelectorAll: 56.283935546875ms
getElementsByClassName: 5.2900390625ms

// Run 2
querySelectorAll: 60.02099609375ms
getElementsByClassName: 5.06298828125ms

// Run 3
querySelectorAll: 53.64404296875ms
getElementsByClassName: 5.153076171875ms
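To put a number on the gap, the ratio can be computed directly from the three runs above. This is plain arithmetic on the timings reported in this post; nothing new is measured:

```javascript
// Timings (ms) copied from the three runs above.
const runs = [
  { qsa: 56.283935546875, byClass: 5.2900390625 },
  { qsa: 60.02099609375, byClass: 5.06298828125 },
  { qsa: 53.64404296875, byClass: 5.153076171875 },
];

// Per-run ratio: querySelectorAll time divided by getElementsByClassName time.
const ratios = runs.map((r) => r.qsa / r.byClass);

// Ratio of the mean timings across all three runs.
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
const overall = mean(runs.map((r) => r.qsa)) / mean(runs.map((r) => r.byClass));

console.log(ratios.map((x) => x.toFixed(1)).join(", "), overall.toFixed(1));
```

Every run lands between about 10x and 12x, slightly above the book's "8 to 10 times".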

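As you can see, getElementsByClassName is roughly 10 times faster than querySelectorAll. While I was at it, I also tested getElementById, getElementsByTagName and querySelector. Even querySelector is about 2 times slower than the getElement* methods. See the demo for details.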

So the next questions are: how do the two differ, and why is there such a large gap?

Differences

There is an answer to this question on Zhihu; I'll summarize it below [1].

1. W3C specification: querySelectorAll belongs to the W3C Selectors API specification, while the getElementsBy* methods belong to the W3C DOM specification.

Selectors API Level 2

2. Browser compatibility

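querySelectorAll is well supported in IE 8+, Firefox 3.5+, Safari 3.1+, Chrome and Opera 10+. For the getElementsBy* methods, take getElementsByClassName, the most recent addition to the spec, as an example: it is supported in IE 9+, Firefox 3+, Safari 3.1+, Chrome and Opera 9+.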

querySelectorAll browser compatibility

getElementsByClassName browser compatibility

3. Parameters

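querySelectorAll takes a CSS selector as its argument (in a CSS selector, unescaped element names, classes and IDs cannot start with a digit). The getElementsBy* methods, by contrast, accept only plain names, such as a tag name, a name value, or one or more space-separated class names, not selectors. For example: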

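var c1 = document.querySelectorAll('.b1 .c');  // every .c inside a .b1, in one selector
var c2 = document.getElementsByClassName('c');  // every .c in the document
var c3 = document.getElementsByClassName('b2')[0].getElementsByClassName('c');  // every .c inside the first .b2, by chaining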

The argument to querySelectorAll contains one or more selectors to match (multiple selectors are separated by commas). The string must be a valid CSS selector; if it is not, a SyntaxError is thrown. See Document.querySelectorAll and demo2.

Note that the argument to querySelectorAll must strictly follow CSS selector syntax, so the following code throws an exception:

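var e1, e2;
try {
  // Fine: getElementsByClassName takes a raw class name, no CSS parsing.
  e1 = document.getElementsByClassName('1a2b3c');
  // Throws: an unescaped class selector may not start with a digit.
  e2 = document.querySelectorAll('.1a2b3c');
} catch (e) {
  console.error(e.message);
  // Failed to execute 'querySelectorAll' on 'Document': '.1a2b3c' is not a valid selector.
}
console.log(e1 && e1[0] && e1[0].className); // "content 1a2b3c", assuming <div class="content 1a2b3c"> is on the page
console.log(e2 && e2[0] && e2[0].className); // undefined: e2 was never assigned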
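Such a class can still be matched with querySelectorAll if the leading digit is escaped: in CSS, a digit at the start of an identifier is written as a backslash, its hex code point, and a space. Browsers expose CSS.escape() for this. The helper below, classSelector, is a hypothetical name and a deliberately minimal sketch that handles only the leading-digit case, not general escaping:

```javascript
// Hypothetical helper: builds a class selector for a class name that may
// start with a digit. It only handles the leading-digit case; for general
// input, browsers provide CSS.escape(), which covers this and much more.
function classSelector(className) {
  const first = className.charAt(0);
  if (first >= "0" && first <= "9") {
    // A leading digit is escaped as its hex code point plus a space:
    // "1" is U+0031, so it becomes "\31 ".
    const hex = first.charCodeAt(0).toString(16);
    return "." + "\\" + hex + " " + className.slice(1);
  }
  return "." + className;
}

const selector = classSelector("1a2b3c");
console.log(selector); // .\31 a2b3c
// In a browser: document.querySelectorAll(selector), or equivalently
// document.querySelectorAll("." + CSS.escape("1a2b3c")).
```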

4. Return value

querySelectorAll returns a static (not live) NodeList, while the getElementsBy* methods return a live NodeList (a live HTMLCollection). Let's look more closely at what that means.

This is one of the major gotchas of the Document Object Model. The NodeList object (also, the HTMLCollection object in the HTML DOM) is a special type of object. The DOM Level 3 spec says about HTMLCollection objects:

NodeList and NamedNodeMap objects in the DOM are live; that is, changes to the underlying document structure are reflected in all relevant NodeList and NamedNodeMap objects. For example, if a DOM user gets a NodeList object containing the children of an Element, then subsequently adds more children to that element (or removes children, or modifies them), those changes are automatically reflected in the NodeList, without further action on the user’s part. Likewise, changes to a Node in the tree are reflected in all references to that Node in NodeList and NamedNodeMap objects.

The querySelectorAll() method is different because it is a static NodeList instead of a live one. This is indicated in the Selectors API spec:

The NodeList object returned by the querySelectorAll() method must be static, not live ([DOM-LEVEL-3-CORE], section 1.1.1). Subsequent changes to the structure of the underlying document must not be reflected in the NodeList object. This means that the object will instead contain a list of matching Element nodes that were in the document at the time the list was created.

Now consider this classic example (demo3):

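// Demo 1: lis is a static NodeList, so the loop runs the original
// number of times and stops
var ul = document.querySelectorAll('ul')[0],
    lis = ul.querySelectorAll("li");
for (var i = 0; i < lis.length; i++) {
    ul.appendChild(document.createElement("li"));
}

// Demo 2: lis is a live HTMLCollection, so lis.length grows with every
// append and this loop never ends (it will hang the page)
var ul = document.getElementsByTagName('ul')[0],
    lis = ul.getElementsByTagName("li");
for (var i = 0; i < lis.length; i++) {
    ul.appendChild(document.createElement("li"));
}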

If a collection is live, then the attributes and methods on that object must operate on the actual underlying data, not a snapshot of the data. (WHATWG DOM Standard)

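Demo 2 loops forever because lis is a live collection: every time the loop condition reads lis.length, the result already includes the li that was just appended, so i never catches up. In Demo 1, lis is a static NodeList, a snapshot of the li elements taken when querySelectorAll ran; later changes to the document do not affect it, so the loop stops after the original count.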
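The same behavior can be reproduced outside the browser with a toy model. Plain objects and arrays stand in for the DOM here; this illustrates live versus static semantics only and is not how engines implement them. The live view re-reads the underlying array on every access to length, the static view is a one-time copy, and a guard caps the live loop so the sketch terminates:

```javascript
// Toy model, not the real DOM: the "ul" is just an object with a children array.
const ul = { children: ["li", "li", "li"] };

// Live view: length is re-read from the underlying data on every access,
// the way a live HTMLCollection behaves.
const live = { get length() { return ul.children.length; } };

// Static view: a snapshot copied once, like querySelectorAll's NodeList.
const snapshot = ul.children.slice();

// Demo 1 equivalent: the bound comes from the snapshot, so the loop ends.
let staticIterations = 0;
for (let i = 0; i < snapshot.length; i++) {
  ul.children.push("li");
  staticIterations++;
}

// Demo 2 equivalent: every append also grows live.length, so i never
// catches up. GUARD stops what would otherwise be an infinite loop.
const GUARD = 1000;
let liveIterations = 0;
for (let i = 0; i < live.length && i < GUARD; i++) {
  ul.children.push("li");
  liveIterations++;
}

console.log(staticIterations, liveIterations); // 3 1000
```

The static loop stops after 3 iterations; the live loop only stops because of the guard.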

But why is it designed this way? The W3C specification explicitly requires this behavior of querySelectorAll:

The NodeList object returned by the querySelectorAll() method must be static ([DOM], section 8).

The NodeList object returned by the querySelectorAll() method must be static, not live ([DOM-LEVEL-3-CORE], section 1.1.1). Subsequent changes to the structure of the underlying document must not be reflected in the NodeList object. This means that the object will instead contain a list of matching Element nodes that were in the document at the time the list was created. (Selectors API)

The querySelectorAll(selectors) method, when invoked, must return the static result of running scope-match a selectors string selectors against context object. (WHATWG DOM Standard)

So what is a NodeList? The W3C describes it as follows:

The NodeList interface provides the abstraction of an ordered collection of nodes, without defining or constraining how this collection is implemented. NodeList objects in the DOM are live.

The WHATWG describes it like this:

A collection is an object that represents a list of nodes. A collection can be either live or static. Unless otherwise stated, a collection must be live.

So a NodeList is by nature a live collection of nodes; the spec simply requires querySelectorAll in particular to return a static NodeList. Let's see what Chrome reports:

document.querySelectorAll('a').toString();    // "[object NodeList]"
document.getElementsByTagName('a').toString();    // "[object HTMLCollection]"

Now an HTMLCollection object shows up as well. So what is an HTMLCollection?

The W3C defines HTMLCollection as follows:

An HTMLCollection is a list of nodes. An individual node may be accessed by either ordinal index or the node's name or id attributes. Note: Collections in the HTML DOM are assumed to be live meaning that they are automatically updated when the underlying document is changed.

A NodeList object is a collection of nodes. An HTMLCollection object is a collection of elements. (WHATWG DOM Standard)

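In fact, HTMLCollection and NodeList are very similar: by default both are live collections, so every access reflects the current state of the document rather than a saved result. The essential difference is that HTMLCollection comes from the Document Object Model HTML specification, while NodeList comes from the Document Object Model Core specification. Put simply, a NodeList is a collection of nodes and an HTMLCollection is a collection of elements; since every element is a node, a NodeList can hold everything an HTMLCollection can, and more.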

The following example makes this easier to see (demo4):

<ul>
    <li>1</li>
    <li>2</li>
    <li>3</li>
    <li>4</li>
</ul>
var ul = document.getElementsByTagName('ul')[0],
    lis1 = ul.childNodes,   // all child nodes, including whitespace text nodes
    lis2 = ul.children;     // element children only
console.log(lis1.toString(), lis1.length);    // "[object NodeList]" 9
console.log(lis2.toString(), lis2.length);    // "[object HTMLCollection]" 4

A NodeList can contain nodes of any type, such as Element, Text and Comment nodes. An HTMLCollection contains only Element nodes. In addition, HTMLCollection provides one method that NodeList lacks: namedItem.
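Why childNodes and children differ in length for the demo4 markup can be checked with a toy model. Plain objects stand in for nodes; only the nodeType values 1 and 3 match the real Node.ELEMENT_NODE and Node.TEXT_NODE constants, and everything else is illustrative. The whitespace between tags becomes text nodes, and children is in effect childNodes filtered down to elements:

```javascript
// Toy model of the demo4 markup. Whitespace between tags becomes text
// nodes, so each <li> is preceded by a "\n    " text node.
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in the DOM
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE in the DOM

const childNodes = [];
for (let n = 1; n <= 4; n++) {
  childNodes.push({ nodeType: TEXT_NODE, data: "\n    " });
  childNodes.push({ nodeType: ELEMENT_NODE, tagName: "LI", text: String(n) });
}
childNodes.push({ nodeType: TEXT_NODE, data: "\n" }); // whitespace before </ul>

// ul.children is, in effect, childNodes filtered to element nodes.
const children = childNodes.filter((node) => node.nodeType === ELEMENT_NODE);

console.log(childNodes.length, children.length); // 9 4
```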

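In short, in modern browsers querySelectorAll returns a static NodeList, while the getElementsBy* methods actually return a live HTMLCollection (getElementsByName is the exception and returns a live NodeList).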

My understanding is that querySelectorAll returns a snapshot of the DOM, while getElementsBy* returns the live DOM. (My guess: querySelectorAll is slower than getElementsBy* because it has to traverse more nodes???)

The same question comes up on Stack Overflow [2]. One answer there argues: querySelector is more useful when you want to use more complex selectors. O(∩_∩)O Ha, that's one angle too!

Another answerer, Alvaro Montoro, puts it this way:

About the differences, there is an important one in the results between querySelectorAll and getElementsByClassName: the return value is different. querySelectorAll will return a static collection, while getElementsByClassName returns a live collection

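In other words, a variable obtained from querySelectorAll is fixed and can only change if querySelectorAll is called again, whereas a variable obtained from getElementsByClassName is not fixed: its contents may differ each time it is referenced, because it is a live collection whose contents track the current DOM.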
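A common place this difference bites is removing the class you selected by while looping over the result. The sketch below is a toy model: plain objects stand in for elements, liveFoo is a hypothetical stand-in for getElementsByClassName("foo"), and fooSnapshot plays the role of a static list such as a querySelectorAll result. With the live view, each cleared element drops out of the collection, the indices shift, and every other element is skipped:

```javascript
// Toy model: four elements that all start with class "foo".
const elements = [0, 1, 2, 3].map((id) => ({ id, className: "foo" }));

// Hypothetical live view, like getElementsByClassName("foo"):
// every access re-filters the underlying elements.
const liveFoo = {
  get length() { return elements.filter((e) => e.className === "foo").length; },
  item(i) { return elements.filter((e) => e.className === "foo")[i]; },
};

// Clearing the class removes each element from the live view mid-loop,
// so indices shift and every other element is skipped.
for (let i = 0; i < liveFoo.length; i++) {
  liveFoo.item(i).className = "";
}
const leftAfterLive = elements.filter((e) => e.className === "foo").length;

// Reset, then repeat over a static snapshot taken once up front.
elements.forEach((e) => { e.className = "foo"; });
const fooSnapshot = elements.filter((e) => e.className === "foo");
for (let i = 0; i < fooSnapshot.length; i++) {
  fooSnapshot[i].className = "";
}
const leftAfterStatic = elements.filter((e) => e.className === "foo").length;

console.log(leftAfterLive, leftAfterStatic); // 2 0
```

In a browser, taking a snapshot first (for example with Array.from on the live collection) or iterating a querySelectorAll result avoids the skipped elements.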

Timofey's view:

Changes to live elements apply immediately - changing a live element changes it directly in the DOM, and therefore the very next line of JS can see that change, and it propagates to any other live elements referencing that element immediately. Changes to static elements are only written back to the DOM after the current script is done executing. These extra copy and write steps have some small, and generally negligible, effect on performance.
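One caution on the quote above: "static" describes which nodes are in the list, not the nodes themselves. A querySelectorAll result holds references to the same Element objects that are in the document, so changing one of them (for instance its className) takes effect in the DOM immediately; only the list's membership is frozen. A toy sketch, with plain objects standing in for elements (not the real DOM):

```javascript
// Toy model: a static list copies the *list*, but its entries are
// references to the same objects that are in the tree.
const tree = [{ tag: "div", className: "a" }, { tag: "div", className: "a" }];
const staticList = tree.slice(); // like querySelectorAll: membership is frozen

// Changing an element through the static list changes the tree at once.
staticList[0].className = "b";
console.log(tree[0].className); // b

// What the static list does not track is membership: a node added to the
// tree afterwards does not appear in it.
tree.push({ tag: "div", className: "a" });
console.log(tree.length, staticList.length); // 3 2
```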

Note: according to the W3C spec, getElementsByName returns a live NodeList, and testing confirms this, unlike getElementsByClassName and getElementsByTagName, which return an HTMLCollection. Every element, and the global document, have access to all of these functions except for getElementsByName, which is only implemented on document.

Why getElementsByTagName is faster than querySelectorAll

Live NodeList objects can be created and returned faster by the browser because they don’t have to have all of the information up front while static NodeLists need to have all of their data from the start. To hammer home the point, the WebKit source code has a separate source file for each type of NodeList: DynamicNodeList.cpp and StaticNodeList.cpp. The two object types are created in very different ways.[3]

The DynamicNodeList object is created by registering its existence in a cache. Essentially, the overhead to creating a new DynamicNodeList is incredibly small because it doesn’t have to do any work upfront. Whenever the DynamicNodeList is accessed, it must query the document for changes, as evidenced by the length property and the item() method (which is the same as using bracket notation).

Compare this to the StaticNodeList object, instances of which are created in another file and then populated with all of the data inside of a loop. The upfront cost to running a query on the document is much more significant than when using a DynamicNodeList instance.

If you take a look at the WebKit source code that actually creates the return value for querySelectorAll(), you’ll see that a loop is used to get every result and build up a NodeList that is eventually returned.

The real reason why getElementsByTagName() is faster than querySelectorAll() is because of the difference between live and static NodeList objects. Although I’m sure there are ways to optimize this, doing no upfront work for a live NodeList will generally always be faster than doing all of the work to create a static NodeList. Determining which method to use is highly dependent on what you’re trying to do. If you’re just searching for elements by tag name and you don’t need a snapshot, then getElementsByTagName() should be used; if you do need a snapshot of results or you’re doing a more complex CSS query, then querySelectorAll() should be used.
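The lazy-versus-eager argument in the quoted article can be illustrated with a toy cost model that simply counts node visits. Everything here is hypothetical: staticQuery, liveQuery and the node array are made up for illustration, and WebKit's real code is far more involved:

```javascript
// Toy cost model, not WebKit's actual code: count how many nodes each
// strategy visits.
let visits = 0;
const nodes = Array.from({ length: 1000 }, (_, i) => ({ tag: i % 2 ? "li" : "p" }));

// Walks every node and returns the matches, counting each visit.
function matches(tag) {
  return nodes.filter((n) => { visits++; return n.tag === tag; });
}

// Static (querySelectorAll-like): all matching happens up front.
function staticQuery(tag) {
  return Object.freeze(matches(tag));
}

// Live (getElementsByTagName-like): creation only records the query;
// matching happens when length or an item is actually read.
function liveQuery(tag) {
  return {
    get length() { return matches(tag).length; },
    item(i) { return matches(tag)[i]; },
  };
}

visits = 0;
for (let k = 0; k < 100; k++) staticQuery("li");
const staticVisits = visits;

visits = 0;
for (let k = 0; k < 100; k++) liveQuery("li");
const liveVisits = visits;

console.log(staticVisits, liveVisits); // 100000 0
```

Creating 100 static results visits every node each time, while creating 100 live ones visits nothing until something reads length or an item. This is likely also why the benchmark at the top of this post shows such a large gap: its loops never read the returned collections. Real engines additionally cache live-collection results and invalidate them when the DOM changes (the quoted text mentions a cache), which this sketch leaves out.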


References:

1. What is the difference between querySelectorAll and the getElementsBy* methods? (Zhihu)

2. querySelector and querySelectorAll vs getElementsByClassName and getElementById in JavaScript

3. Why is getElementsByTagName() faster than querySelectorAll()?

4. Accessing the DOM is not equal accessing the DOM – live vs. static element collections

5. HTMLCollection, NodeList and array of objects