squizlabs / PHP_CodeSniffer

PHP_CodeSniffer tokenizes PHP files and detects violations of a defined set of coding standards.

Tokenizer doesn't include new line chars in "length" #3601

Open jrfnl opened 2 years ago

jrfnl commented 2 years ago

The following code sample:

<?php

    // comment.
    function foo() {}

... will tokenize as follows:

Ptr | Ln | Col  | Cond | ( #) | Token Type                 | [len]: Content
-------------------------------------------------------------------------
  0 | L1 | C  1 | CC 0 | ( 0) | T_OPEN_TAG                 | [  5]: <?php

  1 | L2 | C  1 | CC 0 | ( 0) | T_WHITESPACE               | [  0]:

  2 | L3 | C  1 | CC 0 | ( 0) | T_WHITESPACE               | [  4]: ⸱⸱⸱⸱
  3 | L3 | C  5 | CC 0 | ( 0) | T_COMMENT                  | [ 11]: // comment.

  4 | L4 | C  1 | CC 0 | ( 0) | T_WHITESPACE               | [  4]: ⸱⸱⸱⸱
  5 | L4 | C  5 | CC 0 | ( 0) | T_FUNCTION                 | [  8]: function
  6 | L4 | C 13 | CC 0 | ( 0) | T_WHITESPACE               | [  1]: ⸱
  7 | L4 | C 14 | CC 0 | ( 0) | T_STRING                   | [  3]: foo
  8 | L4 | C 17 | CC 0 | ( 0) | T_OPEN_PARENTHESIS         | [  1]: (
  9 | L4 | C 18 | CC 0 | ( 0) | T_CLOSE_PARENTHESIS        | [  1]: )
 10 | L4 | C 19 | CC 0 | ( 0) | T_WHITESPACE               | [  1]: ⸱
 11 | L4 | C 20 | CC 0 | ( 0) | T_OPEN_CURLY_BRACKET       | [  1]: {
 12 | L4 | C 21 | CC 0 | ( 0) | T_CLOSE_CURLY_BRACKET      | [  1]: }
 13 | L4 | C 22 | CC 0 | ( 0) | T_WHITESPACE               | [  0]:

Looking at the above raised some questions for me regarding the length provided in the token array, as it does not seem to include new line characters. Is this intentional?
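To make the discrepancy concrete, here is a minimal standalone sketch. It is not taken from the PHP_CodeSniffer source: the helper `lengthWithoutEol()` is hypothetical, and the token contents are assumptions reconstructed from the table above, on the premise that each line-ending token carries its trailing "\n" in its content.

```php
<?php
// Hypothetical helper, NOT part of PHP_CodeSniffer: returns the byte length
// of a token's content, ignoring one trailing end-of-line sequence.
function lengthWithoutEol(string $content, string $eol = "\n"): int
{
    $length = strlen($content);

    // Subtract the EOL sequence only when the content actually ends with it.
    if ($eol !== '' && substr($content, -strlen($eol)) === $eol) {
        $length -= strlen($eol);
    }

    return $length;
}

// Assumed raw contents of tokens 0 to 3 from the table above,
// with the newline attached where the token ends a line.
$contents = ["<?php\n", "\n", "    ", "// comment.\n"];

foreach ($contents as $content) {
    printf(
        "raw: %2d, without EOL: %2d\n",
        strlen($content),
        lengthWithoutEol($content)
    );
}
```

Under these assumptions the "without EOL" values come out as 5, 0, 4 and 11, matching the `[len]` column above, while the raw lengths of the three line-ending tokens are each one higher.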