souffle-lang / souffle

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.
http://souffle-lang.github.io/
Universal Permissive License v1.0
916 stars 207 forks source link

ADT definitions accessible from compiled souffle? #2181

Open aeflores opened 2 years ago

aeflores commented 2 years ago

Is there a way to access the ADT definitions from a compiled instance of a souffle program. I see that a representation of these definitions is included in the directiveMap here https://github.com/souffle-lang/souffle/blob/master/src/synthesiser/Synthesiser.cpp#L353 for printing and reading files, but I don't see a way to access this programmatically. Am I missing something?

It also looks like the directive maps of all relations include all the ADT definitions? That seems like a lot of unnecessary duplication.

Should there be an ADTTable similar to the recordTable?

quentin commented 2 years ago

I am currently experimenting some kind of programmatic type descriptors in this branch: https://github.com/quentin/souffle/tree/libsouffle

The types of relations and values are made available through a TypeDesc object. A TypeRegistry allow to retrieve the type descriptor from a specified type name, or to retrieve the vector of all known types.

https://github.com/quentin/souffle/blob/1057b07f246698e064df1ef09fe0949377695d85/src/include/souffle/SouffleInterface.h#L420-L426

https://github.com/quentin/souffle/blob/1057b07f246698e064df1ef09fe0949377695d85/src/include/souffle/SouffleInterface.h#L835-L838

https://github.com/quentin/souffle/blob/324dc25b4391eaf0b7700b4c4e25999f5a67fcf6/src/include/souffle/SouffleTypes.h

aeflores commented 2 years ago

Nice! I also noticed that right now it does not seem possible to create tuples programmatically that contain ADTs. The definitions of << operators https://github.com/quentin/souffle/blob/1057b07f246698e064df1ef09fe0949377695d85/src/include/souffle/SouffleInterface.h#L586 do not really support ADT types, they check for types being s, u, f or i, so there is no way to populate a typle with ADT fields as far as I can tell.

quentin commented 2 years ago

You can construct ADT values using the records interface. ADT values are stored like regular records, but with some degree of nesting.

ADT values, like Record values, are numeric index references to the record-table. Those references are stored in relation tuples as a RamDomain column: https://github.com/souffle-lang/souffle/blob/e4184ab103bf4878d8fd85cb5373d690691bfe32/src/include/souffle/io/ReadStream.h#L112-L119

Take a look at readADT: https://github.com/souffle-lang/souffle/blob/e4184ab103bf4878d8fd85cb5373d690691bfe32/src/include/souffle/io/ReadStream.h#L133-L140

Enum ADT

The simplest class of ADTs are the enumeration: ADT where every branches are empty:

.type Animal = Dog {} | Bird {} | Cat {}

An Animal value is a simple RamDomain value representing the selected branch index. Branches are indexed from 0 in alphabetic order: https://github.com/souffle-lang/souffle/blob/15221792dfa841438f9aaddc39939c778ae98d58/src/ast/analysis/typesystem/TypeSystem.h#L282-L283

In this example $Dog() is equivalent to value 2, $Bird() is 0, $Cat() is 1.

This encoding is visible here in readADT: https://github.com/souffle-lang/souffle/blob/e4184ab103bf4878d8fd85cb5373d690691bfe32/src/include/souffle/io/ReadStream.h#L175-L177

Non-Enum ADT

The other class of ADTs are the non-enumeration: ADTs where at least one branch is not empty:

.type BinaryTree = 
    EmptyLeaf {} 
  | Node { left: BinaryTree, right: BinaryTree}
  | ValueLeaf { n : number } 

For non-enumerations, an ADT value is a RamDomain reference into the record table. The referenced record has exactly two fields that encode the ADT value, that I will call F1 and F2 in [F1, F2]. The first field F1 stores the selected branch index as described previously. The usage of the second field F2 depends on the form of selected branch.

If the selected branch has a single field, then F2 is set with the immediate value of that field. For instance $ValueLeaf(23) is encoded by record [F1, F2] = [2, 23]:

If the selected branch is empty, or has more than one field, a record is created in the record-table to store those fields value. And F2 is set with the reference to that record. For instance, $Node($ValueLeaf(23), $EmptyLeaf()) is encoded by record [1, F2] where F2 is reference obtained by packing the 2-fields record value [$ValueLeaf(23), $EmptyLeaf()].

The empty branch case is not different from the multiple-fields branch case. F2 set with the reference to the empty record. For instance $EmptyLeaf() is encoded by record [0,F2] where F2 is the reference to the zero-length empty record value. F2 is the value returned by packing the zero-length record: recordTable.pack(std::vector<RamDomain>(), 0).

This encoding is visible here in readADT: https://github.com/souffle-lang/souffle/blob/e4184ab103bf4878d8fd85cb5373d690691bfe32/src/include/souffle/io/ReadStream.h#L236-L245

aeflores commented 2 years ago

@quentin I still think you need this https://github.com/souffle-lang/souffle/pull/2183, or you won't be able to populate at tuple once you have the encoded value.

quentin commented 2 years ago

@aeflores Good point, actually I have been "cheating" by taking an iterator on the tuple with it = tuple::begin() and then storing arbitrary RamDomain values into it *it = adt_ref;. This bypass the type checking unlike <<.

b-scholz commented 2 years ago

Do you need this information in the Datalog program or outside of the Datalog program?

Indeed, the whole type description is dumped for the I/O system in JSON format.