querycert / qcert

Compilation and Verification of Data-Centric Languages
https://querycert.github.io/
Apache License 2.0
55 stars 9 forks source link

SQL++ to NRAEnv Translation #56

Open joshuaauerbachwatson opened 6 years ago

joshuaauerbachwatson commented 6 years ago

This is a sub-issue of issue #10.

I am using it to record semantic issues that come up in writing the sqlpp_to_nraenv_top function and its subfunctions.

joshuaauerbachwatson commented 6 years ago

We current define sqlpp as list sqlpp_expr but this is actually counterproductive. It is modeled on what we did in SQL, where there are additional statement types besides queries ("create view" and "drop view"). Note that in SQLtoNRAenv there is the expectation that there can be multiple statements, but (generally) only one query statement. If there is more than one, only the last affects the result. The sexp translation for SQL also converts 'with' clauses into separate 'create view' statements.

Note that there is no create view statement in SQL++ but there is a much richer repertoire of 'with' and 'let' clauses that can bind variables to collections (or to anything, actually) with varied scopes.

joshuaauerbachwatson commented 6 years ago

I've simplified by redefining sqlpp as sqlpp_expr rather than a list. This has been drilled up through the front-end logic. Still no actual contents in the NRAenv translator.

joshuaauerbachwatson commented 6 years ago

Some of the more obvious clauses in the SQLPPtoNRAEnv are now populated. The biggest issue (select) is still pending. Also, even in the case of simpler expressions there are many open issues, indicated as TODOs in the code.

The following is not prioritized, just listing the issues.

  1. SQL++ has two distinct values for unknown data: null (generally occurring when a JSON object has an explicit null value) and unknown (generally occurring when a field of a JSON object is not present but is referenced). In our data model, we can represent null but we don't have a distinct value for unknown. Also, we don't contemplate that dunit can take on any type, the way null and unknown can in SQL++. For object references in the past we have used optional types to model things that can be null. Nothing precludes using sum types more generally to model things that can have two different kinds of absence as well as being present with a value. However, this will complicate translation in a fairly pervasive way.

  2. SQL++ has both bags and (ordered) lists but our data model has only bags. We actually use Coq lists to implement bags and I think in some places we cheat and take advantage of order. But, to be correct, we really need two distinct first-class collection kinds.

  3. SQL++ has a first-class exponentiation operator. We have no equivalent base operator but we do have exponentiation in our floating point model part. This is probably not a serious issue but we do have to assume that floating point is present in order to translate SQL++ and we have to do whatever is needed to coerce to and from floating point in order to implement this operator.

  4. The like operator in SQL++ is a binary operator between expressions; that is, the pattern does not have to be a literal but can be computed. Our internal like operator is unary and takes the pattern as a (string) parameter. We probably need a binary like operator either in the base or as a foreign operator.

  5. SQL++ has a (JSON) object constructor in which both field names and field values are expressions. Our internal record type (the obvious translation target) requires static field names.

More generally, we probably won't fix all of these mismatches right away or even ever and so we need a clear statement of what features of SQL++ we don't support. Ideally, these usages will be filtered out with diagnostics in the front-end so the user doesn't get too confused.

joshuaauerbachwatson commented 6 years ago

We now have translations for many expressions. Still not implemented but planned:

I distinguish normal indexing from 'index any' because normal indexing will require solving the lists vs bags issue but 'index any' could be implemented without doing that. The semantics are a random selection from a collection (so order doesn't actually matter). On the other hand, 'index any' requires getting some form of randomness into Coq. I think it would be adequate to have a (not very good) pseudo-random function, which could be implemented in Coq, with the seed being a single global input to the computation that is then passed around and updated. A pain threading it through but otherwise straightforward.

Also, among things that are already translated, the following restrictions:

I would say the two largest issues requiring contributions from others are

If the bag vs list distinction remains unaddressed for a while, I might consider ruling out normal indexing in the front end; I don't think we can rule out the 'array' constructor because it is probably used a lot rather than the bag constructor and often user's don't really care about the order. I think we just need to document what semantics we support.

If we are unable to address null vs missing and use of sum types for a while, we might also impose a restriction that rules out all forms of missing data (including null) but that will be perceived as a huge restriction eliminating some of the advantages of SQL++ for semi-structured JSON data.

joshuaauerbachwatson commented 6 years ago

Previous enumerations of progress and issues have left out the interesting case of function calls. The design I'm assuming is the following:

I am partly along implementing this design. Functions that initially populate this set are

jeromesimeon commented 6 years ago

It would be nice to have a well-written mechanized semantics for SQL++. This would go hand-in-hand with a proof of semantic preservation for this translation (See #74).

jeromesimeon commented 6 years ago

I've just noticed the following Coq warning:

File "./coq/Translation/SQLPPtoNRAEnv.v", line 40, characters 0-6785:
Warning: Not a fully mutually defined
fixpoint
(sqlpp_to_nraenv depends on sqlpp_select_statement_to_nraenv but not
conversely).
Well-foundedness check may fail unexpectedly.
 [non-full-mutual,fixpoints]

It indicates that sqlpp_select_statement_to_nraenv could be a separate function. This may be a temporary thing, due to the current state of the translation, but I thought I would record it.

joshuaauerbachwatson commented 6 years ago

Commenting on the previous comment from @jeromesimeon: the sqlpp_select_statement_to_nraenv function is currently a highly incomplete stub but it is intended ultimately to be mutually recursive with sqlpp_to_nraenv (it will need to be). So, yes, it is a temporary thing. If the warning message is judged to be annoying, I could comment out the function entirely and stub out the call to it.