The Java VTL project is an open-source Java implementation of the VTL 1.1 draft specification. It follows the JSR-223 Java Scripting API and exposes a simple connector interface that you can implement to integrate with any data store.
Visit the interactive reference manual for more information.
The project is divided into modules.
Add the dependency to your Maven project:
    <dependency>
        <groupId>no.ssb.vtl</groupId>
        <artifactId>java-vtl-script</artifactId>
        <version>0.1.13-SNAPSHOT</version>
    </dependency>
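    // Requires javax.script.Bindings, javax.script.ScriptContext and javax.script.ScriptEngine.
    ScriptEngine engine = new VTLScriptEngine(connector);
    Bindings bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
    // End each line with "\n"; otherwise the concatenated statements run together.
    engine.eval("ds1 := get(\"foo\")\n" +
            "ds2 := get(\"bar\")\n" +
            "ds3 := [ds1, ds2] {\n" +
            "    filter ds1.id = \"string\",\n" +
            "    total := ds1.measure + ds2.measure\n" +
            "}");
    System.out.println(bindings.get("ds3"));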
Java VTL uses the `no.ssb.vtl.connector.Connector` interface to read data from and export data to external systems.
The Connector interface defines three methods:
    public interface Connector {
        boolean canHandle(String identifier);
        Dataset getDataset(String identifier) throws ConnectorException;
        Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;
    }
The method `canHandle(String identifier)` is used by the engine to find which connector is able to provide a Dataset for a given identifier.
The method `getDataset(String identifier)` is then called to get the dataset.
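The lookup described above can be sketched as follows. This is only an illustration of the idea, not the engine's actual code: `Connector`, `Dataset` and `ConnectorException` are reduced stand-ins so that the example compiles on its own, and `PrefixConnector` and `resolve` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class ConnectorLookupSketch {

    // Stand-in types (assumption): the real ones are provided by the library.
    interface Dataset {
        String name();
    }

    static class ConnectorException extends Exception {
        ConnectorException(String message) {
            super(message);
        }
    }

    interface Connector {
        boolean canHandle(String identifier);
        Dataset getDataset(String identifier) throws ConnectorException;
    }

    // Hypothetical connector that serves every identifier starting with a given prefix.
    static class PrefixConnector implements Connector {
        private final String prefix;

        PrefixConnector(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public boolean canHandle(String identifier) {
            return identifier.startsWith(prefix);
        }

        @Override
        public Dataset getDataset(String identifier) {
            return () -> prefix + " dataset for " + identifier;
        }
    }

    // Ask each connector in turn; the first one that can handle the identifier provides the dataset.
    static Dataset resolve(List<Connector> connectors, String identifier) throws ConnectorException {
        for (Connector connector : connectors) {
            if (connector.canHandle(identifier)) {
                return connector.getDataset(identifier);
            }
        }
        throw new ConnectorException("no connector can handle " + identifier);
    }

    public static void main(String[] args) throws ConnectorException {
        List<Connector> connectors = Arrays.asList(
                new PrefixConnector("file:"),
                new PrefixConnector("http:"));
        System.out.println(resolve(connectors, "http://example.org/ds").name());
    }
}
```

With this first-match approach, the order in which connectors are registered decides which one wins when several can handle the same identifier.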
Example implementations can be found in the `java-vtl-ssb-api-connector` module, but a very crude dataset implementation could look like this:
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Optional;
    import java.util.stream.Stream;
    // Dataset, DataStructure, DataPoint and Role are provided by the Java VTL library.

    class StaticDataset implements Dataset {

        // Two identifiers, one measure and one attribute.
        private final DataStructure structure = DataStructure.builder()
                .put("id", Role.IDENTIFIER, String.class)
                .put("period", Role.IDENTIFIER, Instant.class)
                .put("measure", Role.MEASURE, Long.class)
                .put("attribute", Role.ATTRIBUTE, String.class)
                .build();

        @Override
        public Stream<DataPoint> getData() {
            List<Map<String, Object>> data = new ArrayList<>();
            Instant period = Instant.now();
            for (int i = 0; i < 100; i++) {
                // Create a fresh map per row; reusing a single map would make every row identical.
                Map<String, Object> row = new HashMap<>();
                row.put("id", "id #" + i);
                row.put("period", period);
                row.put("measure", Long.valueOf(i));
                row.put("attribute", "attribute #" + i);
                data.add(row);
            }
            return data.stream().map(structure::wrap);
        }

        @Override
        public Optional<Map<String, Integer>> getDistinctValuesCount() {
            return Optional.empty();
        }

        @Override
        public Optional<Long> getSize() {
            return Optional.of(100L);
        }

        @Override
        public DataStructure getDataStructure() {
            return structure;
        }
    }
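A dataset like `StaticDataset` still has to be served by a connector. The following is a minimal sketch of one that keeps datasets in memory, under stated assumptions: `Connector`, `Dataset` and `ConnectorException` are simplified stand-ins so that the example runs without the library, and `InMemoryConnector` and the `mem:` identifier prefix are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryConnectorSketch {

    // Stand-in types (assumption), reduced to what this sketch needs.
    interface Dataset {
        long size();
    }

    static class ConnectorException extends Exception {
        ConnectorException(String message) {
            super(message);
        }
    }

    interface Connector {
        boolean canHandle(String identifier);
        Dataset getDataset(String identifier) throws ConnectorException;
        Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;
    }

    // Hypothetical connector that keeps datasets in a map, keyed by identifier.
    static class InMemoryConnector implements Connector {
        private static final String PREFIX = "mem:"; // assumed identifier scheme
        private final Map<String, Dataset> store = new ConcurrentHashMap<>();

        @Override
        public boolean canHandle(String identifier) {
            return identifier != null && identifier.startsWith(PREFIX);
        }

        @Override
        public Dataset getDataset(String identifier) throws ConnectorException {
            if (!canHandle(identifier)) {
                throw new ConnectorException("unsupported identifier: " + identifier);
            }
            Dataset dataset = store.get(identifier);
            if (dataset == null) {
                throw new ConnectorException("unknown dataset: " + identifier);
            }
            return dataset;
        }

        @Override
        public Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException {
            if (!canHandle(identifier)) {
                throw new ConnectorException("unsupported identifier: " + identifier);
            }
            store.put(identifier, dataset);
            return dataset;
        }
    }

    public static void main(String[] args) throws ConnectorException {
        InMemoryConnector connector = new InMemoryConnector();
        connector.putDataset("mem:foo", () -> 100L);
        System.out.println(connector.canHandle("mem:foo"));         // prints true
        System.out.println(connector.getDataset("mem:foo").size()); // prints 100
    }
}
```

Rejecting identifiers in `getDataset` and `putDataset` that `canHandle` would refuse keeps the three methods consistent with each other.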
This is an overview of the implementation progress.
Group | Operators | Progress | Comment |
---|---|---|---|
General purpose | round parentheses | | |
General purpose | := (assignment) | | |
General purpose | membership | | |
General purpose | get | | The keep, filter and aggregate options are not implemented. |
General purpose | put | | Defined in the grammar but not implemented. |
Join expression | []{} | | |
Join clause | filter | | |
Join clause | keep | | |
Join clause | drop | | |
Join clause | fold | | |
Join clause | unfold | | |
Join clause | rename | | |
Join clause | := (assignment) | | |
Join clause | . (membership) | | |
Clauses | rename | | |
Clauses | filter | | |
Clauses | keep | | |
Clauses | calc | | |
Clauses | attrcalc | | |
Clauses | aggregate | | |
Conditional | if-then-else | | |
Conditional | nvl | | |
Validation | Comparisons (>,<,>=,<=,=,<>) | | |
Validation | in, not in, between | | |
Validation | isnull | | Supported syntaxes are isnull(value), value is null and value is not null. |
Validation | exist_in, not_exist_in | | |
Validation | exist_in_all, not_exist_in_all | | |
Validation | check | | The boolean dataset must be built manually (no lifting). |
Validation | match_characters | | |
Validation | match_values | | |
Statistical | min, max | | |
Statistical | hierarchy | | The inline definition is not supported. A dataset that has a correct structure can be used instead. |
Statistical | aggregate | | |
Relational | union | | |
Relational | intersect | | |
Relational | symdiff | | |
Relational | setdiff | | |
Relational | merge | | |
Boolean | and | | Only inside join expression (no lifting). |
Boolean | or | | Only inside join expression (no lifting). |
Boolean | xor | | Only inside join expression (no lifting). |
Boolean | not | | Only inside join expression (no lifting). |
Mathematical | unary plus and minus | | |
Mathematical | addition, subtraction | | |
Mathematical | multiplication, division | | |
Mathematical | round, ceil, floor | | |
Mathematical | abs | | |
Mathematical | trunc | | |
Mathematical | power, exp, nroot | | |
Mathematical | ln, log | | |
Mathematical | mod | | |
String | length | | |
String | concatenation | | |
String | trim | | |
String | upper/lower case | | |
String | substr | | No lifting. |
String | indexof | | |
String | date_from_string | | Dataset as input not implemented. Only YYYY date format accepted. |
Outside specification | integer_from_string | | |
Outside specification | float_from_string | | |
Outside specification | string_from_number | | |