semgrep / pfff

pfff is mainly an OCaml API to write static analysis, dynamic analysis, code visualizations, code navigations, or style-preserving source-to-source transformations such as refactorings on source code.
https://semgrep.dev
Other
186 stars 29 forks source link

Added new `Path.t` type #557

Closed brandonspark closed 2 years ago

brandonspark commented 2 years ago

What:

Currently, our type filename is just string. This allows an infinite number of invalid representations for values of type filename, as we have no static guarantees that any such value is actually a path. This can lead to bugs, such as https://github.com/returntocorp/deep-semgrep/pull/253.

Why:

We should introduce a new type which does not allow such representation errors, so that we can avoid potential bugs in the future, and have the type system ensure that these bugs cannot exist.

How:

This PR introduces a Path module to Common. I thought about putting it into its own file, which I am still open to, but since this is where we store most of our utility library stuff, I figured it could go here.

The Path library has a t type, which is also defined as path_special in Common. This is so the constructor is available at the top-level. This type, in particular, has only a single constructor, which is a private variant. This means that it can be matched upon to extract the constituent value (just a string), but it cannot be freely used to construct values of the type. Instead, they must use the Path.of_string function, which normalizes the path. This makes sure that all values of type Path.t must be completely normalized paths.

I considered changing the filename type to just be the Path.t type, but this actually causes like a billion changes in the pfff submodule alone. I figure the best way is to offer the choice to migrate to Path, and slowly adopt that change in constituent code.

Test plan:

make test

Security

aryx commented 2 years ago

Regarding the test, a subtle thing is that tests are run from a special directory (_build/default/tests/) so if you write unit tests for Path.ml, you might need to use relative paths like ../../../tests/. Look at what we do in semgrep-core/src/engine/Unit_engine.ml and its tests_path toplevel constant.

aryx commented 2 years ago

Did you check there was nothing related in the jane street Core or Base library? Or other OCaml libs in opam?

aryx commented 2 years ago

Even if there was related libraries, I would still be ok to merge your PR because it's small anyway and well integrated with the API we currently use in Common.ml (read_file, write_file, etc.) but it's still nice to have in the toplevel comment of Path.ml (or Path.mli) related libraries that are similar and why we still decided to implement our own.