mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.
https://mandiant.github.io/capa/
Apache License 2.0

how to bundle TreeSitter bindings #1092

Open · williballenthin opened this issue 2 years ago

williballenthin commented 2 years ago

_Originally posted by @williballenthin in https://github.com/mandiant/capa/pull/1080#discussion_r912047439_

ideally, we want to be able to install capa simply by running pip install flare-capa and/or by fetching the standalone executable (built with PyInstaller) from GitHub. this means all of our dependencies should live within the Python ecosystem.

there is a supported TreeSitter library for Python; however, it doesn't include the language bindings (the compiled grammars) for the languages we parse with TreeSitter. these bindings must be compiled into shared objects and distributed alongside the TreeSitter library.

we need to figure out how to distribute the shared object code with capa so that it "just works".
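for illustration, here is a minimal sketch of the lookup capa would need if it shipped one prebuilt shared object: find the file both after pip install flare-capa (next to the package) and inside a PyInstaller build (under sys._MEIPASS, which PyInstaller sets at runtime). the library name capa-tree-sitter-languages and the helper functions are hypothetical, and the sketch assumes the file is bundled at the top level of the PyInstaller output.

```python
import sys
from pathlib import Path

# hypothetical base name for one shared object bundling all compiled grammars
LIBRARY_BASENAME = "capa-tree-sitter-languages"


def shared_object_suffix(platform: str = sys.platform) -> str:
    """conventional shared-library extension for the given sys.platform value."""
    if platform.startswith("win"):
        return ".dll"
    if platform == "darwin":
        return ".dylib"
    return ".so"


def resource_dir(package_dir: Path) -> Path:
    """directory holding bundled binaries.

    a PyInstaller-frozen executable sets sys.frozen and unpacks bundled files
    under sys._MEIPASS; a regular pip install keeps them next to the package.
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return Path(sys._MEIPASS)
    return package_dir


def bundled_library_path(package_dir: Path, platform: str = sys.platform) -> Path:
    """where the (hypothetical) compiled grammar library is expected to live."""
    return resource_dir(package_dir) / (LIBRARY_BASENAME + shared_object_suffix(platform))
```

the catch is that this only works if a correctly built shared object exists for every platform/architecture we ship, and the file must be declared as package data for the wheel and as a binary in the PyInstaller spec. that per-platform build burden is what makes the distribution question hard.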

williballenthin commented 2 years ago

one strategy:

Rust has good TreeSitter library support and can statically link the language bindings into a single binary. Rust also has great Python binding support via PyO3, which is how we already distribute our implementation of FLIRT to all supported platforms (Windows/macOS/Linux × 32/64-bit).

we could build a Python package, implemented as a native library via Rust+PyO3 and distributed on PyPI, that embeds the TreeSitter library and all of the language bindings.

pro:

con: