vipyrsec / dragonfly-client-rs

Dragonfly client written in Rust
https://docs.vipyrsec.com/dragonfly-client-rs/dragonfly_client_rs/
MIT License

Parse AST for specific function checking #21

Open: import-pandas-as-numpy opened this issue 1 year ago

import-pandas-as-numpy commented 1 year ago

Specification

Implement a feature that lets us match our current YARA rules against Python's abstract syntax tree (AST).

Motivation

The abstract syntax tree offers much more context about what Python understands a function to be doing. A string that appears in one context is not necessarily an indicator everywhere else in the program. Passing something like rm -rf / to a subprocess or system command is far riskier than finding that string in a docstring. Current YARA conventions leave us two options. We can check that the string isn't inside a docstring, which is currently impossible because YARA regexes have no lookaheads or lookbehinds. Otherwise, we must spell out in the regex itself every context in which the command should be flagged. (In this case, we would have to match subprocess calls that pass those arguments as a list.)
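The following minimal Python sketch makes the docstring-versus-call distinction concrete. It uses the standard-library `ast` module and is not the project's implementation. The plain substring `SUSPICIOUS` stands in for a YARA rule, and `EXEC_CALLS`, `find_hits`, and the other helpers are hypothetical names introduced for illustration. The sketch flags the pattern only when it reaches a command-execution call, whether as a single string or as a list of string arguments. It also reports the line number of each hit.

```python
import ast
from typing import List, Optional

# Hypothetical stand-in for a YARA string rule.
SUSPICIOUS = "rm -rf /"

# Illustrative (assumed) set of command-execution call targets.
EXEC_CALLS = {"os.system", "subprocess.run", "subprocess.call", "subprocess.Popen"}


def dotted_name(node: ast.expr) -> Optional[str]:
    """Turn a Name/Attribute chain such as subprocess.run into 'subprocess.run'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def literal_command(arg: ast.expr) -> Optional[str]:
    """Return a command string from a str literal or a list/tuple of str literals."""
    if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
        return arg.value
    if isinstance(arg, (ast.List, ast.Tuple)) and all(
        isinstance(e, ast.Constant) and isinstance(e.value, str) for e in arg.elts
    ):
        return " ".join(e.value for e in arg.elts)
    return None


def find_hits(source: str) -> List[int]:
    """Line numbers of exec calls whose literal arguments contain SUSPICIOUS."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and dotted_name(node.func) in EXEC_CALLS:
            for arg in node.args:
                cmd = literal_command(arg)
                if cmd is not None and SUSPICIOUS in cmd:
                    hits.append(node.lineno)
                    break
    return sorted(hits)


def naive_hits(source: str) -> List[int]:
    """What a plain line-based match sees: every line containing the string."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if SUSPICIOUS in line]


SAMPLE = '''\
import os
import subprocess

def cleanup():
    """Never run rm -rf / on a production box."""
    subprocess.run(["rm", "-rf", "/"])
    os.system("rm -rf /")
    print("rm -rf /")
'''

print(naive_hits(SAMPLE))  # [5, 7, 8]
print(find_hits(SAMPLE))   # [6, 7]
```

The line-based match flags lines 5, 7, and 8, which include the docstring and a harmless `print`. It also misses the list-form call on line 6. The AST walk reports only lines 6 and 7. A fuller version would also need to resolve aliases such as `from subprocess import run`, which this sketch ignores.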

Additionally, this would be a significant quality-of-life enhancement for PyPI staff, who would be pointed to the specific line containing the malicious behavior.

Precedent for this exists in two forms: Semgrep and YARA itself. Semgrep understands far more of the language's semantics, so it can derive the context in which something is used. YARA can reference specific sections of a PE file, so behavior can be judged in the context where it appears. (For instance, YARA already supports detecting a .rsrc section containing a malware.dll.)

Open Questions

Requirements

import-pandas-as-numpy commented 1 year ago

@Robin5605 @AbooMinister25 @jonathan-d-zhang @Recursive-Error Review/eyes requested.

AbooMinister25 commented 1 year ago

Regarding this question:

Will this be something we can easily extend to other languages? If we ever elect to scan another ecosystem such as NPM, using an AST might be useful there too. Ideally, we can avoid footgunning ourselves by abstracting this in a way that makes drop-in functionality useful.

Since the semantics of the languages differ, I imagine the specific nodes we apply specific rules to would differ as well. Depending on how we structure the API, something like a per-language mapping of node types to sets of rules might be feasible.
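One way to read the node-to-rules mapping idea is as a per-language table keyed by node type. The sketch below is purely illustrative:

* The rule names and the helper functions `rules_for` and `python_rule_targets` are hypothetical.
* The Python keys are class names from the standard-library `ast` module.
* The JavaScript keys assume tree-sitter-style node kinds such as `call_expression`.

Only the Python side is wired to a parser here.

```python
import ast
from typing import Dict, List, Set, Tuple

# Hypothetical rule names, grouped per language and per AST node type.
RULES_BY_LANGUAGE: Dict[str, Dict[str, Set[str]]] = {
    "python": {
        "Call": {"exec_shell_command", "network_download"},
        "Import": {"suspicious_import"},
    },
    "javascript": {
        # Assumed tree-sitter node kind names; not wired to a parser here.
        "call_expression": {"exec_shell_command"},
        "import_statement": {"suspicious_import"},
    },
}


def rules_for(language: str, node_type: str) -> Set[str]:
    """Rules that apply to a node type in a given language (empty if none)."""
    return RULES_BY_LANGUAGE.get(language, {}).get(node_type, set())


def python_rule_targets(source: str) -> List[Tuple[int, str, List[str]]]:
    """Pair each Python AST node that has rules with its line and rule names."""
    targets = []
    for node in ast.walk(ast.parse(source)):
        kind = type(node).__name__
        rules = rules_for("python", kind)
        if rules:
            targets.append((node.lineno, kind, sorted(rules)))
    return sorted(targets)


print(python_rule_targets("import socket\nos.system('id')\n"))
# [(1, 'Import', ['suspicious_import']), (2, 'Call', ['exec_shell_command', 'network_download'])]
```

With this shape, supporting another ecosystem would mean adding a parser and a table entry. The matching logic that consumes `rules_for` would stay language-agnostic.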

Robin5605 commented 1 year ago

Will this be something we can easily extend to other languages? If we ever elect to scan another ecosystem such as NPM, using an AST might be useful there too. Ideally, we can avoid footgunning ourselves by abstracting this in a way that makes drop-in functionality useful.

Superficially, technically yes. If we go with something like tree-sitter, for instance, it supports parsing a wide range of languages.