squizlabs / PHP_CodeSniffer

PHP_CodeSniffer tokenizes PHP files and detects violations of a defined set of coding standards.
BSD 3-Clause "New" or "Revised" License

PSR1 doesn't check for file encoding #3841

Open lucraraujo opened 1 year ago

lucraraujo commented 1 year ago

The PSR1 standard states that "Files MUST use only UTF-8 without BOM for PHP code". However, there is no check for files that use an encoding other than UTF-8; the existing sniff only checks for a BOM. If a file is encoded with, for example, windows-1252 and has no BOM, the check passes.

Steps to reproduce the behavior:

  1. Create a file called test.php containing any code, saved in an encoding other than UTF-8 and without a BOM
  2. Run phpcs --standard=PSR1 test.php
  3. No errors are reported regarding the file encoding
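
Step 1 above can be sketched as a short script. This is only an illustrative sketch, written in Python rather than PHP; the helper name `write_cp1252_php` and the sample file contents are assumptions, not part of the original report.

```python
import tempfile
from pathlib import Path


def write_cp1252_php(path: Path) -> bytes:
    """Write a small PHP file encoded as windows-1252 (no BOM) and return its bytes."""
    # Hypothetical sample content; any PHP code with non-ASCII characters would do.
    source = "<?php\n// café\necho 'olá';\n"
    data = source.encode("cp1252")  # 'é' becomes the single byte 0xE9, 'á' becomes 0xE1
    path.write_bytes(data)
    return data


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "test.php"
    write_cp1252_php(target)
    print(f"Wrote {target}; now run: phpcs --standard=PSR1 {target}")
```

The resulting file has no BOM and is not well-formed UTF-8, which matches the situation described in the report.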

Expected behavior

There should be an error regarding the file encoding.

Operating System: Debian 11.7 Bullseye
PHP version: 8.2.6
PHP_CodeSniffer version: 3.7.2
Standard: PSR1, PSR2, PSR12
Install type: Composer local

jrfnl commented 1 year ago

The Generic.Files.ByteOrderMark sniff is only intended to check for the byte order mark; it does not check the file encoding, so that sniff is working correctly.
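
To illustrate the distinction, a BOM check only needs to look at the first few bytes of a file. The following is a minimal sketch in Python, not the sniff's actual implementation; the function name `has_utf8_bom` is hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + b"<?php echo 1;"
    cp1252_no_bom = "<?php // caf\u00e9".encode("cp1252")
    print(has_utf8_bom(with_bom))
    print(has_utf8_bom(cp1252_no_bom))
```

A windows-1252 file without a BOM passes this kind of check, because the check says nothing about the bytes that follow.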

What I believe you are trying to report is that there is no sniff checking if files are encoded as UTF-8.

While I believe it is possible to check what encoding a file claims to use, I do not believe it is possible to reliably verify that the claim is actually correct. I may well be wrong, though, and/or reality may have superseded the research I did in the distant past when I last looked into something like this.
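
One way to read this concern can be sketched as follows. Assuming a check based on strict UTF-8 decoding (the function name `is_well_formed_utf8` is hypothetical, and the sketch is in Python rather than PHP), such a check can reject many non-UTF-8 files, but it cannot prove that a file which passes was actually written as UTF-8.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Pure ASCII is byte-identical in UTF-8 and windows-1252, so it always passes.
    print(is_well_formed_utf8(b"<?php echo 1;"))
    # A lone 0xE9 byte ('é' in windows-1252) is not well-formed UTF-8.
    print(is_well_formed_utf8("\u00e9".encode("cp1252")))
    # 'Ã©' saved as windows-1252 yields the bytes C3 A9, which are also
    # the UTF-8 encoding of 'é', so the check cannot tell which was meant.
    print(is_well_formed_utf8("\u00c3\u00a9".encode("cp1252")))
```

So a decoding check of this kind answers "is this well-formed UTF-8?" rather than "was this file authored as UTF-8?".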

I'll mark this as a feature request for now and would be interested to hear if someone has found a way to do this.

lucraraujo commented 1 year ago

You're right. It's more of a feature request than a bug.