souffle-lang / souffle

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.
http://souffle-lang.github.io/
Universal Permissive License v1.0
916 stars 207 forks source link

Finalise the new typesystem branch. #1039

Closed azreika closed 4 years ago

azreika commented 5 years ago

The typesystem is currently well into being completely revamped. The current status of these changes can be seen in the new new-typesystem branch. I'm creating this issue as a one-stop location for all TODOs planned for that new branch at the moment. Feel free to comment any suggestions or enquiries below. Order is roughly planned timeline.

-- Stage 1 -- Implementing the new type-system

-- Stage 2 -- Changing syntax and semantics to work well with new type-system

-- Stage 3 -- Changing type declaration syntax

-- Stage 4 -- Implementing record printing [azreika/record-maps branch]

-- Stage 5 -- Implementing record polymorphism

-- Stage 6 -- Writing up the documentation

-- Extensions -- Other considerations for typesystem improvement in the future, not covered by this issue

azreika commented 5 years ago

The plan is also to eventually allow other primitive types - potentially int, uint, float, symbol, and string?

b-scholz commented 5 years ago

Yes - that would be wonderful to have int/uint/float/symbol.

Herbert would like to differentiate between symbols and strings.

A symbol cannot be altered - it is immutable whereas operations such as strcat are permitted for strings.

For me, uint/int/float/symbol is more important as a first step. In a second step, we can differentiate symbol and string.

azreika commented 5 years ago

For simplicity, current focus will be on two primitive types,int (to replace number) and symbol, with the intention of easy extension in the future to allow float, uint, and string. I think some extra refactoring will be necessary, because the assumption that only two types exist (number and symbol) is heavily built-into the code.

azreika commented 5 years ago

Want to potentially implement proper functor polymorphism in the future, e.g. to allow ord, +, to work correctly based on the argument types provided.

azreika commented 5 years ago

Printing out records will need the record map to be passed to the output engine (WriteStreamCSV, etc.). This will probably require RecordMap to be unified across the synthesiser and interpreters, similar to the SymbolTable.

Also, record polymorphism means that records in the same tree of unions require unique identifiers, which is currently not handled (at present we only guarantee unique record identifiers for records of a fixed arity). Current thought process is to give all records within a union tree a 'virtual' arity which equals the maximum real arity of any record in the union tree, and store them in the record map with that padded arity. Catch is we would need to store the real arity as an extra field in each stored record tuple.

However, we also need to store the type id with each record tuple to allow record printing later on. Current idea is to add type-id and arity as one extra field in each record tuple - 6 bits for arity (since we only allow a maximum arity of 64 fields per record type at the moment), and the remaining 26 bits for the type-id.

azreika commented 5 years ago

Record map unification between the synthesiser and interpreters does not seem feasible without compromising the fine-tuned performance of the record maps in the compiled version, even when records are not printed.

New idea is to have an intermediate record table being passed through to the stream-writers if records are chosen to be printed in the datalog file, where all relevant tuples in the table are copied. This means there is overhead for record map copying, but the vision is that this will only be triggered when records are printed (and only potentially-printed records are copied over).

So far, single-level record printing seems to be working using this method for both the interpreted and compiled version.

Here's a sneak peak:

.type Car = [license:symbol, make:symbol, model:symbol]
.type Person = [name:symbol, id:number, car:Car]

.decl customer(person:Person, username:symbol, pin:number)
customer(Person["AZ", 67, Car["XYZ123", "Toyota", "Corolla"]], "azreika_souffle", 1234).

.decl result(person:Person, str:symbol)
result(Person[name, id, car], str) :-
    customer(Person[name, id, car], username, pin),
    str = cat("<< ", name, "(username: ", username, ", PIN: ", to_string(pin), ") >>").

.output result()

Currently have it printing:

result
person  str
===============
RECORD<2>: [0, 67, 1]   << AZ(username: azreika_souffle, PIN: 1234) >>
===============

Goal is to get it printing:

result
person  str
===============
Person[AZ, 67, Car["XYZ123", "Toyota", "Corolla"]]  << AZ(username: azreika_souffle, PIN: 1234) >>
===============
josephcappadona commented 5 years ago

Hi, given that it seems float support is still a ways away, what are my options (if any) for using floats and floating point arithmetic? The lack of such support makes working with Souffle a bit awkward for my use case.

b-scholz commented 5 years ago

Can you open a new issue asking how to have floats in the current version? I have a solution for you. Let's keep this thread related to the new type system.