wordfence / wordfence-cli

Wordfence malware and vulnerability scanner command line utility.
https://www.wordfence.com/products/wordfence-cli/
GNU General Public License v3.0
100 stars 20 forks

Error: 'utf-8' codec can't decode byte 0xfc in position 62: invalid start byte #266

Closed: AleksCee closed this issue 1 month ago

AleksCee commented 1 month ago

Hi,

On 13 of 15 sites, vuln-scan fails with this error; only 2 sites on the same server run without it. Any ideas?

Sorry, I forgot the version: v4.0.2, installed as a binary on Ubuntu 22.04 with LANG=de_DE.UTF-8. I also tried an English UTF-8 locale.

Thanks, Alex

akenion commented 1 month ago

@AleksCee Could you try running with --debug and capturing the stack trace where this error occurs? It's likely related to a file name or content on the sites where it's not working, but after reviewing I'm not seeing a clear place where such an error would occur, so the stack trace will help significantly.

AleksCee commented 1 month ago

@akenion here is the debug output:

WordPress Core Version: 6.6
Traceback (most recent call last):
  File "main.py", line 4, in <module>
  File "wordfence/cli/cli.py", line 193, in main
  File "wordfence/cli/cli.py", line 187, in invoke_cli
  File "wordfence/cli/cli.py", line 43, in process_exception
  File "wordfence/cli/cli.py", line 185, in invoke_cli
  File "wordfence/cli/cli.py", line 178, in invoke
  File "wordfence/cli/vulnscan/vulnscan.py", line 277, in invoke
  File "wordfence/cli/vulnscan/vulnscan.py", line 209, in _scan_sites
  File "wordfence/cli/vulnscan/vulnscan.py", line 120, in _scan
  File "wordfence/wordpress/site.py", line 420, in get_all_plugins
  File "wordfence/wordpress/site.py", line 390, in get_plugins
  File "wordfence/wordpress/site.py", line 369, in _generate_possible_plugins_paths
  File "wordfence/wordpress/site.py", line 360, in get_configured_plugins_directory
  File "wordfence/wordpress/site.py", line 314, in _extract_string_from_config
  File "wordfence/wordpress/site.py", line 305, in _get_parsed_config_state
  File "wordfence/wordpress/site.py", line 290, in _parse_config_file
  File "wordfence/php/parsing.py", line 1639, in parse_php_file
  File "wordfence/php/parsing.py", line 1623, in parse
  File "wordfence/php/parsing.py", line 1614, in parse_any
  File "wordfence/php/parsing.py", line 1567, in parse_output
  File "wordfence/php/parsing.py", line 936, in accept_base_token
  File "wordfence/php/lexing.py", line 544, in get_next_token
  File "wordfence/php/lexing.py", line 520, in extract_inline_html_or_open_tag
  File "wordfence/php/lexing.py", line 449, in step
  File "wordfence/php/lexing.py", line 438, in _read_chunk
  File "codecs.py", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 311: invalid continuation byte
[3822364] Failed to execute script 'main' due to unhandled exception!

If I read this correctly, it happens while parsing the config file. OK, there are comments with broken umlauts; I will try to fix them. For example: "Wenn du verschiedene Pr�fixe benutzt kannst du innerhalb einer Datenbank" (German for "If you use different prefixes you can, within one database").

AleksCee commented 1 month ago

OK, this did the trick, thanks for the quick help! The wp-config.php had broken encodings in the comments containing German umlauts.
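
For anyone hitting the same error, the offending bytes can be located before editing the file. The following is a minimal sketch, not part of Wordfence CLI; the helper name `find_invalid_utf8` is hypothetical, and the sample bytes assume the comment was saved as Latin-1, which matches the 0xe4 byte in the traceback.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for each position where UTF-8 decoding fails.

    Hypothetical helper for illustration only.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means no further invalid bytes.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            bad = pos + exc.start  # exc.start is relative to the slice
            problems.append((bad, data[bad]))
            pos = bad + 1  # skip the bad byte and keep scanning
    return problems


# Assumed sample: the German word "Präfixe" encoded as Latin-1 instead of UTF-8.
sample = "Präfixe".encode("latin-1")
print([(offset, hex(value)) for offset, value in find_invalid_utf8(sample)])
```

To check a real file, pass the result of `pathlib.Path("wp-config.php").read_bytes()` to the helper; each reported offset points at a byte that needs to be re-encoded or removed.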

akenion commented 1 month ago

Thanks @AleksCee. You're correct that this is due to those characters not being valid UTF-8, but PHP isn't actually required to be UTF-8 so the parser in CLI should be able to handle this gracefully without error. I'm going to re-open this issue and include a fix in the upcoming CLI release as it should be fairly straightforward to correct.

Thanks for reporting this and your help troubleshooting; I'm glad to hear you were able to find a workaround for the time being.
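
The point above, that PHP source is not required to be valid UTF-8, can be illustrated with Python's standard `surrogateescape` error handler. This is only a sketch of one tolerant decoding strategy; it is an assumption, not the fix that shipped in Wordfence CLI, and the function names are hypothetical.

```python
def decode_tolerant(chunk: bytes) -> str:
    """Decode bytes as UTF-8, mapping undecodable bytes to lone surrogates
    instead of raising UnicodeDecodeError. Hypothetical helper."""
    return chunk.decode("utf-8", errors="surrogateescape")


def encode_back(text: str) -> bytes:
    """Reverse decode_tolerant, restoring the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


# Assumed sample: a Latin-1 encoded comment followed by a normal define().
raw = b"// Pr\xe4fixe\ndefine('PATH_CURRENT_SITE', '/');\n"

text = decode_tolerant(raw)      # does not raise
round_trip = encode_back(text)   # byte-for-byte identical to raw
print(round_trip == raw)
```

The same handler can be passed when opening a file in text mode, for example `open(path, encoding="utf-8", errors="surrogateescape")`, so code outside comments and strings still parses normally while the invalid bytes are preserved.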

AleksCee commented 1 month ago

@akenion Thanks as well. Sorry, but there is one more problem. After fixing the configs, I get a new error with only one site (a multisite config), and I can't figure out what is happening:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
  File "wordfence/cli/cli.py", line 193, in main
  File "wordfence/cli/cli.py", line 187, in invoke_cli
  File "wordfence/cli/cli.py", line 43, in process_exception
  File "wordfence/cli/cli.py", line 185, in invoke_cli
  File "wordfence/cli/cli.py", line 178, in invoke
  File "wordfence/cli/vulnscan/vulnscan.py", line 277, in invoke
  File "wordfence/cli/vulnscan/vulnscan.py", line 209, in _scan_sites
  File "wordfence/cli/vulnscan/vulnscan.py", line 120, in _scan
  File "wordfence/wordpress/site.py", line 420, in get_all_plugins
  File "wordfence/wordpress/site.py", line 390, in get_plugins
  File "wordfence/wordpress/site.py", line 369, in _generate_possible_plugins_paths
  File "wordfence/wordpress/site.py", line 360, in get_configured_plugins_directory
  File "wordfence/wordpress/site.py", line 314, in _extract_string_from_config
  File "wordfence/wordpress/site.py", line 305, in _get_parsed_config_state
  File "wordfence/wordpress/site.py", line 290, in _parse_config_file
  File "wordfence/php/parsing.py", line 1639, in parse_php_file
  File "wordfence/php/parsing.py", line 1623, in parse
  File "wordfence/php/parsing.py", line 1612, in parse_any
  File "wordfence/php/parsing.py", line 1599, in parse_statement
  File "wordfence/php/parsing.py", line 1519, in parse_conditional
  File "wordfence/php/parsing.py", line 1507, in parse_condition
  File "wordfence/php/parsing.py", line 1229, in parse_expression
  File "wordfence/php/parsing.py", line 1171, in parse_expression_component
  File "wordfence/php/parsing.py", line 1274, in parse_invocation
  File "wordfence/php/parsing.py", line 1260, in parse_argument_list
AttributeError: 'NoneType' object has no attribute 'is_character'
[3885201] Failed to execute script 'main' due to unhandled exception!

Do you have a tip on where I can find the issue? The config file is the same as all the others. The only differences are the hashes and the DB config, plus this additional block:

/* Multisite */
define('WP_ALLOW_MULTISITE', true);
define('MULTISITE', true);
define('SUBDOMAIN_INSTALL', true);
define('DOMAIN_CURRENT_SITE', '*****delete**.net');
define('PATH_CURRENT_SITE', '/');
define('SITE_ID_CURRENT_SITE', 1);
define('BLOG_ID_CURRENT_SITE', 1);

AleksCee commented 1 month ago

OK, after some trial and error I found that the parser has trouble with this line:

define('PATH_CURRENT_SITE', '/');

When I temporarily remove this line for testing, it works. But this line is needed for multisite.

OK, that seems to have been a coincidence; it did not work. :-(

davidnuzik commented 1 month ago

v4.0.3rc4 8/1/24

SUMMARY: QA validation PASSED. I was able to reproduce the vuln-scan PHP parsing issue and validate the fix.

REPRODUCTION STEPS:

  1. Write a Python script that appends to a PHP file that is read and parsed during a vuln-scan (wp-config.php, for example). The script should write invalid UTF-8 in binary mode. NOTE: This will obviously break the wp-config.php file and cause WordPress to stop functioning; it was done only in a test environment to reproduce the UnicodeDecodeError and validate the fix.
  2. Using the current official release, v4.0.2, attempt to vuln-scan this WordPress install with the altered wp-config.php on disk. I successfully reproduced the issue. Note that the byte shown in my error is slightly different, but this is not relevant: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3649: invalid start byte
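
Step 1 above could look roughly like the following. This is a hedged sketch, not the script actually used in QA: the helper name `append_invalid_utf8`, the payload bytes, and the temporary stand-in file are all assumptions, and it should only ever be pointed at a disposable test install.

```python
import tempfile
from pathlib import Path


def append_invalid_utf8(path: Path, payload: bytes = b"\n// \x80\xfc\n") -> None:
    """Append bytes that are not valid UTF-8 to a file, in binary mode.

    Hypothetical helper; 0x80 and 0xfc can never start a UTF-8 sequence.
    """
    with open(path, "ab") as handle:
        handle.write(payload)


# Demonstrate on a throwaway stand-in for wp-config.php, never a live site.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "wp-config.php"
    config.write_bytes(b"<?php\ndefine('DB_NAME', 'test');\n")
    append_invalid_utf8(config)

    try:
        config.read_text(encoding="utf-8")
        decoded_cleanly = True
    except UnicodeDecodeError as exc:
        # Strict UTF-8 decoding now fails, as in the reported traceback.
        decoded_cleanly = False
        print(exc)
```

Running a vuln-scan against an install whose wp-config.php was modified this way should then exercise the same decoding path described in step 2.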

VALIDATION STEPS: Execute step 2 above again, but this time using v4.0.3rc4. The issue no longer occurs and the output from the vuln-scan is nominal.

NOTES: Internal test automation was updated to include tests for this issue going forward. I also executed additional tests (including subcommands other than vuln-scan) to ensure the Wordfence CLI still behaves normally in all areas. No significant or related issues were observed.

Other PRs were also validated. For example, CLI output for vuln-scans was including a 'b' character when outputting versions; this is fixed as well. All internal test automation passes across the board.