This repository provides an alternative implementation to Polystat. This tool's objective is to extend the functionality of the original implementation. These extensions include:
- Support for the j2eo translator: it can be downloaded from Maven Central, or, if you have j2eo installed locally, you can provide a path to it via a configuration option.
- SARIF output written as .sarif files, where each SARIF file corresponds to a file in the input directory.

...and many minor quality-of-life improvements.
⚠ WARNING ⚠: The tool is still in the early stages of development, so feature suggestions and bug reports are more than welcome!
This section describes the defects that the Polystat CLI can detect by analyzing the EO intermediate representation produced by the translators, such as j2eo and py2eo.
Comes from: polystat/odin
Unanticipated mutual recursion happens when a subclass redefines some of the methods of the superclass in such a way that one of the methods of the superclass becomes mutually-recursive with one of the redefined methods.
Sample input (Java):
class Base {
    private int x = 0;

    public int getX() { return x; }

    public void n(int v) {
        x = v;
    }

    public void o(int v) {
        this.n(v);
    }

    public void m(int v) {
        this.o(v);
    }
}

class Derived extends Base {
    public void n(int v) {
        this.m(v);
    }

    public void l(int v) {
        this.n(v);
    }
}

public class Test {
    public static void main(String[] args) {
        Derived derivedInstance = new Derived();
        derivedInstance.l(10);
    }
}
Analyzer output:
class__Derived.new:
class__Derived.new.m (was last redefined in "class__Base.new.this") ->
class__Derived.new.o (was last redefined in "class__Base.new.this") ->
class__Derived.new.n (was last redefined in "class__Derived.new.this") ->
class__Derived.new.m (was last redefined in "class__Base.new.this")
class__Derived.new.this:
class__Derived.new.this.m (was last redefined in "class__Base.new.this") ->
class__Derived.new.this.o (was last redefined in "class__Base.new.this") ->
class__Derived.new.this.n ->
class__Derived.new.this.m (was last redefined in "class__Base.new.this")
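To see why this call cycle matters at run time, the following is a minimal sketch that reuses the classes from the sample input and adds a hypothetical `MutualRecursionDemo` helper (not part of Polystat). It assumes a standard JVM, where unbounded recursion ends in a `StackOverflowError`.

```java
// Same call structure as the sample input: Base.m -> Base.o -> n, and Derived.n -> m.
class Base {
    private int x = 0;

    public int getX() { return x; }

    public void n(int v) { x = v; }

    public void o(int v) { this.n(v); } // n is dispatched dynamically

    public void m(int v) { this.o(v); }
}

class Derived extends Base {
    @Override
    public void n(int v) { this.m(v); } // closes the cycle m -> o -> n -> m

    public void l(int v) { this.n(v); }
}

// Hypothetical helper, added only for this illustration.
class MutualRecursionDemo {
    // Calls m on a plain Base instance; this terminates and stores the value.
    static int baseResult() {
        Base b = new Base();
        b.m(10);
        return b.getX();
    }

    // Returns true if calling l on a Derived instance exhausts the call stack.
    static boolean overflows() {
        try {
            new Derived().l(10);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Base terminates, x = " + baseResult());
        System.out.println("Derived overflows: " + overflows());
    }
}
```

Neither class looks recursive when read in isolation; the cycle only appears once `Derived.n` is combined with the inherited `m` and `o`, which is the chain the analyzer output lists.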
If a superclass contains this defect, inlining one of its methods is not safe, because doing so may lead to breaking changes in its subclasses.
Comes from: polystat/odin
Sample input (Java):
class Parent {
    public int f(int x) {
        int t = x - 5;
        assert t > 0;
        return x;
    }

    public int g(int y) {
        return this.f(y);
    }

    public int gg(int y2) {
        return this.g(y2);
    }

    public int ggg(int y3) {
        return this.gg(y3);
    }

    public int h(int z) {
        return z;
    }
}

class Child extends Parent {
    @Override
    public int f(int y) {
        return y;
    }

    @Override
    public int h(int z) {
        return this.ggg(z);
    }
}

public class Test {
    public static void main(String[] args) {
        int x = 10;
        Parent p = new Parent();
        p.g(x);
        x -= 5;
        p.h(x);
        p = new Child();
        p.g(x);
        p.h(x);
    }
}
Analyzer output:
Inlining calls in method g is not safe: doing so may break the behaviour of subclasses!
Inlining calls in method ggg is not safe: doing so may break the behaviour of subclasses!
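The effect of such inlining can be sketched with a reduced version of the sample. The classes `InlinedParent`, `InlinedChild` and `InliningDemo` are hypothetical and exist only for this illustration, and the `assert` from the sample is replaced by an explicit exception so that the check fires without the `-ea` JVM flag.

```java
// Original superclass: g reaches f through dynamic dispatch.
class Parent {
    public int f(int x) {
        // Stand-in for the sample's assert(t > 0), made explicit for this demo.
        if (x - 5 <= 0) {
            throw new IllegalArgumentException("x must be greater than 5");
        }
        return x;
    }

    public int g(int y) {
        return this.f(y);
    }
}

// Subclass that relaxes f so that it accepts any argument.
class Child extends Parent {
    @Override
    public int f(int y) {
        return y;
    }
}

// Hypothetical refactoring of Parent in which the body of f was inlined into g.
class InlinedParent {
    public int f(int x) {
        if (x - 5 <= 0) {
            throw new IllegalArgumentException("x must be greater than 5");
        }
        return x;
    }

    public int g(int y) {
        if (y - 5 <= 0) {
            throw new IllegalArgumentException("x must be greater than 5");
        }
        return y;
    }
}

// The same subclass, now extending the refactored superclass.
class InlinedChild extends InlinedParent {
    @Override
    public int f(int y) {
        return y;
    }
}

class InliningDemo {
    // Before inlining, Child.g goes through the overridden f and succeeds.
    static int beforeInlining(int v) {
        return new Child().g(v);
    }

    // After inlining, g no longer calls the override, so the check from f fires.
    static boolean failsAfterInlining(int v) {
        try {
            new InlinedChild().g(v);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("before inlining: g(5) = " + beforeInlining(5));
        System.out.println("after inlining:  g(5) fails = " + failsAfterInlining(5));
    }
}
```

The subclass code is identical in both cases; only the superclass changed, which is why the analyzer flags the inlining as unsafe.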
This defect means that the analyzed program contains parts where the fields of an object are accessed directly. This probably means that the object with such fields breaks encapsulation by exposing some of its private fields.
Comes from: polystat/odin
WARNING: With the current latest version of j2eo (v0.5.3), the direct state access defect is not detected. It should work when j2eo#114 is fixed.
UPDATE: Odin v0.4.5 introduced a workaround that made the Direct State Access defect detectable in some cases.
Sample input (Java):
class A {
    protected int state = 0;
}

class B extends A {
    public int n(int x) {
        return this.state + x;
    }
}
Analyzer output:
Method 'n' of object 'class__B.new.this' directly accesses state 'state' of base class 'class__A.new.this'
This defect means that some parts of the code violate the Liskov substitution principle.
Comes from: polystat/odin
Sample input (Java):
class Parent {
    public int f(int x) {
        return x;
    }

    public int g(int x) {
        return this.f(x);
    }
}

class Child extends Parent {
    @Override
    public int f(int y) {
        return 10 / y;
    }
}

public class Test {
    public static void main(String[] args) {
        Parent childInstance = new Child();
        childInstance.f(10);
    }
}
Analyzer output:
Method f of object this violates the Liskov substitution principle as compared to version in parent object this
Method g of object this violates the Liskov substitution principle as compared to version in parent object new
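The violation in this sample can be observed directly: code written against `Parent` may pass `0`, which the overriding `f` cannot handle. The sketch below reuses the sample classes and adds a hypothetical `LiskovDemo` helper (not part of Polystat).

```java
class Parent {
    public int f(int x) {
        return x;
    }

    public int g(int x) {
        return this.f(x);
    }
}

class Child extends Parent {
    @Override
    public int f(int y) {
        return 10 / y; // narrower precondition than Parent.f: y must not be 0
    }
}

// Hypothetical helper, added only for this illustration.
class LiskovDemo {
    // Returns true if calling g(0) on the given instance throws ArithmeticException.
    static boolean throwsOnZero(Parent p) {
        try {
            p.g(0);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Parent fails on 0: " + throwsOnZero(new Parent()));
        System.out.println("Child fails on 0:  " + throwsOnZero(new Child()));
    }
}
```

`g` is not overridden at all, yet its behaviour changes in `Child` because it dispatches to the overridden `f`; this matches the analyzer reporting both `f` and `g`.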
The presence of this defect in the program means that some inputs may cause the program to fail with an ArithmeticException.
Comes from: polystat/far
WARNING: The FaR analyzer is not fully integrated with the J2EO translator, so the defect detection may not work correctly.
Sample input (simplified EO translation):
+package org.polystat.far

[a b] > fartest
  add. > @
    a.div b
    div.
      b.div a
      a
Analyzer output:
\perp at {a=\any, b=0}
\perp at {a=0, b=\any}
\perp at {a=0, b=0}
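For readers more familiar with Java than EO, here is a rough sketch of the same check. It assumes that the EO object computes a/b + (b/a)/a and that integer arithmetic is used; both are assumptions of this illustration, and `FarTestDemo` is a hypothetical name.

```java
class FarTestDemo {
    // Hypothetical Java rendering of the EO object, assuming it reads as a/b + (b/a)/a.
    static int fartest(int a, int b) {
        return a / b + (b / a) / a;
    }

    // Returns true if evaluating fartest(a, b) throws ArithmeticException.
    static boolean fails(int a, int b) {
        try {
            fartest(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("a=1, b=0 fails: " + fails(1, 0)); // division by b
        System.out.println("a=0, b=1 fails: " + fails(0, 1)); // division by a
        System.out.println("a=0, b=0 fails: " + fails(0, 0));
        System.out.println("a=2, b=1 fails: " + fails(2, 1)); // no zero divisor
    }
}
```

The three failing argument combinations correspond to the three cases in the analyzer output: any `a` with `b = 0`, `a = 0` with any `b`, and both equal to zero.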
If you have coursier installed, then you can install the latest version of polystat-cli by running:
cs install --channel https://raw.githubusercontent.com/polystat/polystat-cli/master/coursier/polystat.json polystat
After that, you can simply run:
$ polystat --help
The CLI is distributed as a "fat" jar (it can be downloaded from GitHub Releases), so you can run it without any prerequisites other than the JRE. If you have the JRE installed, you can run polystat-cli by just executing:
$ java -jar polystat.jar <args>
It may be helpful to define an alias (the following works in most Linux and macOS shells):
$ alias polystat="java -jar /path/to/polystat.jar"
And then simply run it like:
$ polystat <args>
More about the arguments you can pass can be found here and here.
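If no arguments are passed to polystat, it will read the configuration from the HOCON config file in the current working directory. The default name for this file is .polystat.conf.
$ polystat
Read the configuration from a HOCON config file located elsewhere:
$ polystat --config path/to/hocon/config.conf
Print the descriptions of all the configuration keys that can be used in the config file:
$ polystat list -c
Print the list of available rule specifiers:
$ polystat list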
Don't execute some rules during the analysis. This option is repeatable, so you can add any number of --exclude rule arguments to exclude all the specified rules. In the example below all the rules but mutualrec and long will be executed.
$ polystat eo --in tmp --exclude mutualrec --exclude long --sarif
Execute only the given rules during the analysis. This option is also repeatable. In the example below only the mutualrec and liskov rules will be executed.
$ polystat eo --in tmp --include mutualrec --include liskov --sarif
Analyze the Java files in src/main/java.
$ polystat java --in src/main/java --console
Write the SARIF files to polystat_out/sarif from analysing the tmp directory with .eo files.
$ polystat eo --in tmp --sarif --to dir=polystat_out
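The effect of the --include and --exclude options described above can be sketched as a simple filter over the available rule IDs. This is only an illustration of the documented behaviour, not the actual polystat-cli code: `RuleFilter` and `select` are hypothetical names, and the list of available rules is an illustrative subset made of the IDs mentioned in this document.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of polystat-cli.
class RuleFilter {
    // If include is non-empty, keep only the included rules;
    // otherwise keep every rule that is not excluded.
    // With both sets empty, all available rules are kept.
    static List<String> select(List<String> available, Set<String> include, Set<String> exclude) {
        if (!include.isEmpty()) {
            return available.stream()
                    .filter(include::contains)
                    .collect(Collectors.toList());
        }
        return available.stream()
                .filter(rule -> !exclude.contains(rule))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative subset of rule IDs; run `polystat list` for the real list.
        List<String> available = List.of("mutualrec", "liskov", "long");

        // --exclude mutualrec --exclude long
        System.out.println(select(available, Set.of(), Set.of("mutualrec", "long"))); // [liskov]

        // --include mutualrec --include liskov
        System.out.println(select(available, Set.of("mutualrec", "liskov"), Set.of())); // [mutualrec, liskov]
    }
}
```

Checking the include set first is a design choice of this sketch; on the command line the two options are mutually exclusive, so at most one of the sets is non-empty.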
Clone the Hadoop git repository:
git clone https://github.com/apache/hadoop
Create .polystat.conf with the following contents:
polystat {
    lang = java
    input = hadoop
    tempDir = hadoop_tmp
    outputFormats = [sarif]
    outputs = {
        dirs = [hadoop_out],
        files = [hadoop.json]
    }
}
Run polystat-cli without arguments:
$ polystat
or
$ java -jar polystat.jar
depending on which installation method you chose.
Executing these commands should create the following files:
- hadoop_tmp should store all the temporary files produced by translators and analyzers.
- hadoop_out should contain the produced .sarif files. Each .sarif file corresponds to a single .java file in the repository.
- hadoop.json should contain the aggregated SARIF output for all the files in the repository. This .json file contains a single sarifLog object. This object has a property called runs, which is an array of run objects. Each run object contains the name of the analyzed file and the results property, which holds the results of all the analyzers that completed successfully.

To battle-test our prototype analyzer, we have paired it with the J2EO transpiler (from Java to EO) and created polystat-cli, a command-line tool for running Polystat with different settings and different transpilers. We have used J2EO v0.5.3, odin v0.4.5, and FaR v0.2.0. Using J2EO and Polystat with all five analyzers turned on takes about one hour of processing on a computer with a 2.4 GHz 8-core Intel Core i9 processor and 32 GB of 2667 MHz DDR4 memory running macOS 12.4. The result is as follows:
1) 10378 Java files have been translated successfully and no defects have been detected by Polystat;
2) 2054 Java files have been translated incorrectly by J2EO (invalid syntax);
3) 433 Java files have been translated by J2EO without appropriate import information, so Polystat could not resolve some of the identifiers used to properly analyze the code.
This section covers all the options available in the CLI interface and their meanings.
The description follows this guide.
Note: {a | b | c} means a set of mutually-exclusive items.
polystat {eo | python} [--tmp <path>] [--in <path>] [{--include <rule...> | --exclude <rule...>}] [--sarif] [--to { console | dir=<path> | file=<path> }]...
polystat java [--j2eo-version <string>] [--j2eo <path>] [--tmp <path>] [--in <path>] [{--include <rule...> | --exclude <rule...>}] [--sarif] [--to { console | dir=<path> | file=<path> }]...
polystat [--version] [--help] [--config <path>]
polystat list [--config | -c]
Input configuration
- The subcommand (eo, java or python) specifies which files should be analyzed (.eo, .java or .py). More languages can be added in the future.
- --in <path> specifies the location of the source code to be analyzed. It can be either a directory with the files in the input language or a single file in the input language. If --in is not specified, the input language code is read from stdin.
- --tmp <path> specifies the path to the directory where the temporary files produced by analyzers are to be stored. If --tmp is not specified, temporary files will be stored in the OS-created tempdir. It is assumed that the path supplied by --tmp points to an empty directory. If not, the contents of the path will be purged. If the --tmp option is specified but the directory it points to does not exist, it will be created.
- The structure of the temporary directory is roughly as follows:
  - <path>/eo contains the generated .eo files.
  - <path>/xmir contains the generated .xml XMIR files (if any).
  - <path>/stdin contains the files with the code read from stdin (if any).
- --include and --exclude respectively define which rules should be included in or excluded from the analysis run. These options are mutually exclusive, so specifying both is not valid. If neither option is specified, all the available analyzers will be run. The list of available rule specifiers can be found via the polystat list command.
- --j2eo (available only when running polystat java) allows users to specify the path to the j2eo executable jar. If it is not specified, polystat looks for one in the current working directory. If none is present there, it downloads one from Maven Central (for now, the version is hardcoded to be 0.4.0).
- --j2eo-version (available only when running polystat java) allows users to specify which version of j2eo should be downloaded.
- --sarif means that the command will produce output in the SARIF format in addition to output in other formats (if any).
- --to { console | dir=<path> | file=<path> } is a repeatable option that specifies where the output should be written. If this option is not specified, no output is produced.
- --to dir=<path> means that the files will be written to the given path. The path is assumed to be an empty directory. If it is not, its contents will be purged. If the path is specified but the directory it points to does not exist, it will be created.
  - If an output format option is specified (e.g. --sarif), then the files created by the analyzer will be written in the respective subdirectory. For example, in case of --sarif, the SARIF files will be located in path/sarif/. The console output is not written anywhere. Therefore, if none of the output format options (e.g. --sarif) are specified, no files are produced.
  - The output format options (e.g. --sarif) also determine the extension of the output files. In case of --sarif the extension would be .sarif.
  - If the --in option specifies a directory, the structure of the output directory will be similar to the structure of the input directory.
  - If --in specifies a single file, the file with the analysis output for this file will be written to the output directory.
  - If --in is not specified, the generated file will be called stdin + the relevant extension.
- --to file=<path> means that the results of analysis for all the files will be written to the file at the given path. For example, for the --sarif output format this will be a JSON array of sarif-log objects.
- --to console specifies that the output should be written to the console. The option may be specified multiple times; in this case, the output will be written to the console as if just one instance of --to console was present. If it is not present, the output is not written to the console.
- polystat list prints the list of available rule specifiers. If --config or -c is specified, it prints to console the descriptions of all the possible configuration keys for the HOCON config file.
- --version prints the version of polystat-cli, maybe with some additional information.
- --help displays some informative help messages for commands.
- --config <path> configures Polystat from the specified HOCON config file. If not specified, the configuration is read from the file .polystat.conf in the current working directory.

This section covers all the keys that can be used in the HOCON configuration files. The most up-to-date version of the information presented in this section can be printed to console by running:
$ polystat list --config
An example of a working config file can be found here.
- polystat.lang - the type of input files which will be analyzed. This key must be present, for example lang = java.
- polystat.j2eoVersion - specifies the version of J2EO to download.
- polystat.j2eo - specifies the path to the J2EO executable. If not specified, defaults to looking for j2eo.jar in the current working directory. If it's not found, downloads it from Maven Central. The download only happens when this key is NOT provided.
- polystat.input - specifies how the files are supplied to the analyzer. Can be either a path to a directory, a path to a file, or absent. If absent, the code is read from standard input.
- polystat.tempDir - the path to a directory where temporary analysis files will be stored. If not specified, defaults to an OS-generated temporary directory.
- polystat.outputTo - the path to a directory where the results of the analysis are stored. If not specified, the results will be printed to console.
- polystat.outputFormats - the formats for which output is generated. If it's an empty list or not specified, no output files are produced.
- polystat.includeRules | polystat.excludeRules - specify which rules should be included in / excluded from the analysis. If both are specified, polystat.includeRules takes precedence. The list of available rule specifiers can be found by running:
$ polystat list
- polystat.outputs.console - specifies if the analysis results should be output to console. false by default.
- polystat.outputs.dirs - a list of directories to write files to.
- polystat.outputs.files - a list of files to write aggregated output to.

Polystat CLI is an sbt Scala project. In order to build the project you need the following:
Both can be easily fetched via coursier.
Running the CLI:
$ sbt run
It's best to run this command in the interactive mode, because you can specify the cmdline args there.
However, for better turnaround time, it's better to tailor the .polystat.conf in the repository root for your needs and just run run.
If you want to change the command-line arguments, edit the .polystat.conf in the repository root.
The following command can be used to generate the "fat" JAR file.
$ sbt assembly
The generated .jar file can then be found at target/scala-3.1.2/polystat.jar.
To run the tests, use the relevant sbt task:
$ sbt test