Open joshuaauerbachwatson opened 6 years ago
We current define sqlpp
as list sqlpp_expr
but this is actually counterproductive. It is modeled on what we did in SQL, where there are additional statement types besides queries ("create view" and "drop view"). Note that in SQLtoNRAenv there is the expectation that there can be multiple statements, but (generally) only one query statement. If there is more than one, only the last affects the result. The sexp translation for SQL also converts 'with' clauses into separate 'create view' statements.
Note that there is no create view statement in SQL++ but there is a much richer repertoire of 'with' and 'let' clauses that can bind variables to collections (or to anything, actually) with varied scopes.
I've simplified by redefining sqlpp
as sqlpp_expr
rather than a list. This has been drilled up through the front-end logic. Still no actual contents in the NRAenv translator.
Some of the more obvious clauses in the SQLPPtoNRAEnv are now populated. The biggest issue (select) is still pending. Also, even in the case of simpler expressions there are many open issues, indicated as TODOs in the code.
The following is not prioritized, just listing the issues.
SQL++ has two distinct values for unknown data: null
(generally occurring when a JSON object has an explicit null value) and unknown
(generally occurring when a field of a JSON object is not present but is referenced). In our data model, we can represent null
but we don't have a distinct value for unknown
. Also, we don't contemplate that dunit
can take on any type, the way null
and unknown
can in SQL++. For object references in the past we have used optional types to model things that can be null. Nothing precludes using sum types more generally to model things that can have two different kinds of absence as well as being present with a value. However, this will complicate translation in a fairly pervasive way.
SQL++ has both bags and (ordered) lists but our data model has only bags. We actually use Coq lists to implement bags and I think in some places we cheat and take advantage of order. But, to be correct, we really need two distinct first-class collection kinds.
SQL++ has a first-class exponentiation operator. We have no equivalent base operator but we do have exponentiation in our floating point model part. This is probably not a serious issue but we do have to assume that floating point is present in order to translate SQL++ and we have to do whatever is needed to coerce to and from floating point in order to implement this operator.
The like
operator in SQL++ is a binary operator between expressions; that is, the pattern does not have to be a literal but can be computed. Our internal like
operator is unary and takes the pattern as a (string) parameter. We probably need a binary like
operator either in the base or as a foreign operator.
SQL++ has a (JSON) object constructor in which both field names and field values are expressions. Our internal record type (the obvious translation target) requires static field names.
More generally, we probably won't fix all of these mismatches right away or even ever and so we need a clear statement of what features of SQL++ we don't support. Ideally, these usages will be filtered out with diagnostics in the front-end so the user doesn't get too confused.
We now have translations for many expressions. Still not implemented but planned:
I distinguish normal indexing from 'index any' because normal indexing will require solving the lists vs bags issue but 'index any' could be implemented without doing that. The semantics are a random selection from a collection (so order doesn't actually matter). On the other hand, 'index any' requires getting some form of randomness into Coq. I think it would be adequate to have a (not very good) pseudo-random function, which could be implemented in Coq, with the seed being a single global input to the computation that is then passed around and updated. A pain threading it through but otherwise straightforward.
Also, among things that are already translated, the following restrictions:
I would say the two largest issues requiring contributions from others are
If the bag vs list distinction remains unaddressed for a while, I might consider ruling out normal indexing in the front end; I don't think we can rule out the 'array' constructor because it is probably used a lot rather than the bag constructor and often user's don't really care about the order. I think we just need to document what semantics we support.
If we are unable to address null vs missing and use of sum types for a while, we might also impose a restriction that rules out all forms of missing data (including null) but that will be perceived as a huge restriction eliminating some of the advantages of SQL++ for semi-structured JSON data.
Previous enumerations of progress and issues have left out the interesting case of function calls. The design I'm assuming is the following:
I am partly along implementing this design. Functions that initially populate this set are
coll_avg
and array_avg
and the SQL-92 compatible restricted aggregation functions such as avg
which are provided as sugaring macros in SQL++). get_year
substring
It would be nice to have a well-written mechanized semantics for SQL++. This would go hand-in-hand with a proof of semantic preservation for this translation (See #74).
I've just noticed the following Coq warning:
File "./coq/Translation/SQLPPtoNRAEnv.v", line 40, characters 0-6785:
Warning: Not a fully mutually defined
fixpoint
(sqlpp_to_nraenv depends on sqlpp_select_statement_to_nraenv but not
conversely).
Well-foundedness check may fail unexpectedly.
[non-full-mutual,fixpoints]
It indicates that sqlpp_select_statement_to_nraenv
could be a separate function. This may be a temporary thing, due to the current state of the translation, but I thought I would record it.
Commenting on the previous comment from @jeromesimeon: the sqlpp_select_statement_to_nraenv
function is currently a highly incomplete stub but it is intended ultimately to be mutually recursive with sqlpp_to_nraenv
(it will need to be). So, yes, it is a temporary thing. If the warning message is judged to be annoying, I could comment out the function entirely and stub out the call to it.
This is a sub-issue of issue #10.
I am using it to record semantic issues that come up in writing the
sqlpp_to_nraenv_top
function and its subfunctions.