panther-labs / panther-analysis

Built-in Panther detection rules and policies
https://panther.com/
Apache License 2.0

Incorrect YAML block style causes corruption during load/dump #1367

Closed corrylc closed 1 month ago

corrylc commented 1 month ago

YAML has two forms of block styles:

- |
 literal
- >
 folded

The literal block style effectively says "this is code, make no changes whatsoever."

The folded block style is looser, and tells the parser that it may insert line breaks as it pleases, typically in long lines, anywhere a single space character separates two non-space characters.

Because of this, it is critically important that code never be stored in a folded block style, as the parser can insert line breaks in places that corrupt the code. Examples would be a "-- SQL comment" or a "# Python comment": if a line break is inserted after the comment indicator, the rest of the comment lands on a new line as bare code, and compilation fails.
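To make the failure mode concrete, here is a minimal sketch that simulates folding with Python's textwrap module. It is only a stand-in: a real YAML emitter chooses its own width and break points, and the SQL comment and the width of 40 are made up for illustration.

```python
import textwrap

# Hypothetical one-line SQL comment, long enough that an emitter might fold it.
sql_comment = "-- Find logins from the client IP addresses listed in the vendor advisory"

# Simulate folding: break the line at single spaces once it exceeds the width.
# textwrap is only a stand-in here; a real YAML emitter picks its own width.
folded = textwrap.wrap(sql_comment, width=40, break_on_hyphens=False)

for line in folded:
    # Only lines that still start with "--" are comments as far as SQL is concerned.
    status = "comment" if line.startswith("--") else "NOT a comment"
    print(f"{status}: {line}")
```

Only the first piece still begins with the comment indicator. Every continuation line would be handed to the SQL engine as code.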

Because the issue is triggered only when the YAML parser decides, at its own discretion, that a line is "long", it is somewhat unpredictable where it will happen. That said, the issue was specifically observed in the following files:

Path: snowflake_0108977_suspected_user_access_threat_hunting.yml
Path: snowflake_0108977_configuration_drift_query.yml
Path: snowflake_0108977_suspected_user_access_query.yml
Path: snowflake_0108977_ip_query.yml
Path: snowflake_0108977_ip_threat_hunting.yml
Path: snowflake_0108977_configuration_drift_threat_hunting.yml
Path: snowflake_0108977_suspected_user_activity_threat_hunting.yml

These files are affected most immediately because they contain long code comments, but every use of the > block style for a code/SQL block in the panther-analysis codebase must be eliminated. Any future change, or the use of a different YAML parser by any component of the Panther ecosystem, could otherwise cause corruption or failure.
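One possible way to apply the change in bulk is to rewrite the block indicator on the raw text. This is a sketch, not the fix that was actually applied. It assumes the only code-bearing key is SnowflakeQuery (the key named in this issue), and use_literal_style is a hypothetical helper name. Switching > to | also changes how the value loads, since existing line breaks are then preserved. That is the intent for code, but the resulting diff should still be reviewed.

```python
import re

# Matches a "SnowflakeQuery: >" header line, including optional chomping or
# indentation indicators (e.g. ">-", ">2") and a trailing comment.
# The key name comes from the issue; extend the pattern for other code keys.
CODE_KEY_HEADER = re.compile(
    r"^(?P<prefix>[ \t]*SnowflakeQuery[ \t]*:[ \t]*)>"
    r"(?P<rest>[+-]?[1-9]?[+-]?[ \t]*(?:#.*)?)$",
    re.MULTILINE,
)

def use_literal_style(yaml_text: str) -> str:
    """Return yaml_text with folded code blocks switched to literal style."""
    return CODE_KEY_HEADER.sub(r"\g<prefix>|\g<rest>", yaml_text)

if __name__ == "__main__":
    sample = (
        "Description: >\n"
        "  Free text may stay folded.\n"
        "SnowflakeQuery: >\n"
        "  -- a comment\n"
        "  SELECT 1\n"
    )
    print(use_literal_style(sample))
```

The Description block is left alone, in line with the point that > remains appropriate for general text.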

The > block style is appropriate for any general text use, such as the various descriptions/runbooks.

Linting this would be challenging in the general case, since it must be done on the raw YAML text, before parsing. A regex looking for something like SnowflakeQuery: > would work, though.
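A minimal sketch of such a regex-based check, run on the raw text before any YAML parsing, might look like the following. SnowflakeQuery is the key named in this issue. The other entries in CODE_KEYS and the function names are assumptions for illustration and would need to match the keys the schemas actually use.

```python
import re

# Keys whose values hold code. "SnowflakeQuery" comes from the issue; the
# other names are assumed examples and should be adjusted to the real schema.
CODE_KEYS = ("SnowflakeQuery", "AthenaQuery", "Query")

def folded_code_pattern(keys):
    """Build a regex matching '<key>: >' headers, with optional indicators."""
    names = "|".join(re.escape(k) for k in keys)
    return re.compile(
        rf"^\s*(?P<key>{names})\s*:\s*>[+-]?[1-9]?[+-]?\s*(?:#.*)?$"
    )

def find_folded_code_blocks(text, keys=CODE_KEYS):
    """Return (line_number, key) for every code key using the folded style."""
    pattern = folded_code_pattern(keys)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = pattern.match(line)
        if match:
            hits.append((lineno, match.group("key")))
    return hits

if __name__ == "__main__":
    sample = (
        "Description: >\n"
        "  Folded text is fine here.\n"
        "SnowflakeQuery: >\n"
        "  SELECT 1\n"
    )
    for lineno, key in find_folded_code_blocks(sample):
        print(f"line {lineno}: {key} uses the folded (>) block style")
```

In CI, this could be applied to each .yml file, failing the build if any hits are returned. Descriptions and runbooks are not in the key list, so they pass.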