statebox / cql

CQL: Categorical Query Language implementation in Haskell
GNU Affero General Public License v3.0

Allow identifiers to be quoted in parsing #64

Open wisnesky opened 5 years ago

wisnesky commented 5 years ago

example:

typeside Ty = literal {
         types int string
         constants
                  "100" "150" "200" "250" "300" : int
                  "115-234" "112-988" "198-887" Smith Jones "250" "300" "100"
}

wisnesky commented 5 years ago

It turns out this isn't quite right - here's a counterexample that runs in aql-java:

typeside Ty = literal {
         types int string
         constants
                  "100" "150" "200" "250" "300" : int
                  "115-234" "112-988" "198-887" Smith Jones Sue Alice Bob : string
         options allow_empty_sorts_unsafe = true
}

marcosh commented 5 years ago

@wisnesky it looks like the hyphen (-) is what breaks the parser. Should it be an allowed character for identifiers in general, or only when the identifier is quoted?

At the moment the only non-alphanumeric characters allowed in identifiers are _ and $. Are there any other characters that should be allowed?

wisnesky commented 5 years ago

All strings should be identifiers when quoted. The reason is that identifiers are used for constant symbols, and we want constant symbols to include all strings, so that AQL can represent, e.g., the type ‘String’.
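
To make the rule concrete, here is a minimal sketch of an identifier parser that accepts either form. It is not the parser in this repository: it uses `Text.ParserCombinators.ReadP` from `base` only to stay self-contained, and the names `unquoted`, `quoted`, `identifier` and `parseIdentifier` are hypothetical. It also assumes that quoted identifiers have no escape sequences, which the thread does not specify.

```haskell
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP

-- Unquoted identifier: alphanumeric characters plus '_' and '$',
-- the character set described in the thread.
unquoted :: ReadP String
unquoted = munch1 (\c -> isAlphaNum c || c == '_' || c == '$')

-- Quoted identifier: every character between two double quotes.
-- Assumption: no escape sequences, so a '"' always ends the identifier.
quoted :: ReadP String
quoted = between (char '"') (char '"') (munch (/= '"'))

-- Try the quoted form first, then fall back to the unquoted form.
identifier :: ReadP String
identifier = quoted <++ unquoted

-- Succeeds only if the whole input is a single identifier.
parseIdentifier :: String -> Maybe String
parseIdentifier s =
  case [r | (r, "") <- readP_to_S (identifier <* eof) s] of
    (r : _) -> Just r
    []      -> Nothing
```

With these definitions, `parseIdentifier "\"115-234\""` gives `Just "115-234"`, while the unquoted input `parseIdentifier "115-234"` gives `Nothing`, because `-` is not in the unquoted character set.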

wisnesky commented 5 years ago

Here's another case that breaks, actually, with or without the quotes around the 1.

options
prover = program
program_allow_nontermination_unsafe = true
allow_empty_sorts_unsafe = true
timeout = "1"

wisnesky commented 5 years ago

Can we make it so that numerals don't need to be quoted? Here's the example that should run but doesn't:

instance I = literal : S {
         generators e : E
         options timeout = 1
}

marcosh commented 5 years ago

This should not be hard. But, just to make sure: do we really want any identifier to be allowed to be a numeral? That would mean that any variable, foreign key, entity, and so on could be called 23. Is that really what we want, or do we need to make some distinctions? If so, what are they?

wisnesky commented 5 years ago

There’s no need to make any distinctions. Using numerals for identifiers happens in AQL all the time. For example, people often write 1 2 3 : Employee.
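
As an illustration, a declaration such as 1 2 3 : Employee can be parsed without treating numerals specially, as long as digits are ordinary identifier characters. The sketch below is not the project's parser: it uses `ReadP` from `base`, the names `lexeme`, `declaration` and `parseDeclaration` are hypothetical, and the "symbols : sort" shape is inferred from the examples in this thread.

```haskell
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP

-- Identifier: quoted (any characters, no escapes assumed) or unquoted
-- (alphanumerics, '_' and '$'). Digits need no special treatment.
identifier :: ReadP String
identifier = quoted <++ unquoted
  where
    quoted   = between (char '"') (char '"') (munch (/= '"'))
    unquoted = munch1 (\c -> isAlphaNum c || c == '_' || c == '$')

-- Run a parser, then skip any trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

-- One or more symbols, a colon, then the sort or entity they belong to.
declaration :: ReadP ([String], String)
declaration = do
  skipSpaces
  names    <- many1 (lexeme identifier)
  _        <- lexeme (char ':')
  sortName <- lexeme identifier
  eof
  pure (names, sortName)

-- Succeeds only if the whole input is a single declaration.
parseDeclaration :: String -> Maybe ([String], String)
parseDeclaration s =
  case [r | (r, "") <- readP_to_S declaration s] of
    (r : _) -> Just r
    []      -> Nothing
```

Here `parseDeclaration "1 2 3 : Employee"` gives `Just (["1","2","3"],"Employee")`, and quoted and unquoted symbols can be mixed on the same line.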

wisnesky commented 5 years ago

Moreover, aql-java is the spec: we need to parse exactly the same programs it does.

wisnesky commented 5 years ago

It may seem silly to allow "42" to be the name of an entity/fk/att/etc, but ok to allow "42" to be the name of a constant symbol. But in fact, because AQL users can do things like pivot (convert rows to columns), you can easily get into circumstances where 42 is an entity/fk/att/etc.

marcosh commented 5 years ago

I started out replicating the logic of the symbol parser in the ANTLR definition, but that does not seem to be what is actually needed.

wisnesky commented 5 years ago

On closer inspection, Fred's ANTLR file is hit or miss. It handles

typeside t = empty
schema s = empty : t
instance I = literal : s {
         options timeout = 1
} 

correctly but doesn't handle

instance J = literal : empty : empty {
} 

Ground truth is in the java jparsec parser: https://github.com/CategoricalData/fql/blob/1f7bc30cfaff8d499b301ca5b52a492c0c934a4e/src/main/java/catdata/aql/exp/CombinatorParser.java