uwescience / raco

Compilation and rule-based optimization framework for relational algebra. Raco is the language, optimization, and query translation layer for the Myria project.
Other
72 stars 19 forks source link

Blob literal: adding blob literal to support BLOB data type in Myria #562

Closed parmitam closed 7 years ago

parmitam commented 7 years ago

I have a long string of changes here, please let me know if you need me to sqash for review. Overview:

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.02%) to 91.711% when pulling 2ef08816aa3318f7a5f96c3f9fe9c60cf71ccd1a on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.02%) to 91.711% when pulling 2ef08816aa3318f7a5f96c3f9fe9c60cf71ccd1a on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

parmitam commented 7 years ago

Any comments? Can we push this through?

BrandonHaynes commented 7 years ago

This implementation conflates NULL with a zero-length byte array. I'm not in favor of introducing nullability into Myria as a special case for one attribute type.

parmitam commented 7 years ago

I am not sure I understand the issue with allowing a blob-type variable (not even column) being null. Could you help me understand why this is a problem? It never gets stored in a column, it is the initial state of a variable used in MyriaL UDA. I am not using the null keyword anymore

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.02%) to 91.712% when pulling b348f0d0e274d5984ce5d9ae80dcb34ab77959b4 on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.02%) to 91.707% when pulling 6cf54caf45c87082613d3265fac998ae6ea4f67f on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.02%) to 91.706% when pulling 2eb42173bedcd1a6212c08dcfe440536e97c2aae on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.02%) to 91.713% when pulling 2eb42173bedcd1a6212c08dcfe440536e97c2aae on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.02%) to 91.711% when pulling c7269f411b36536f4a9d2a906635b4fdec190fd7 on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

senderista commented 7 years ago

I now think we should just use the empty blob literal b'' (assuming we've settled on the Python-style byte literal syntax described in PEP-3112), and add a check to compile_expr() that op.value is the empty literal (until we add general blob literal support). That seems to me to be the cleanest and most forward-compatible solution (i.e., assuming that we will eventually support blob literals as the value of a blob type anywhere in MyriaL).

parmitam commented 7 years ago

How would you distinguish between b'' and empty string in python?

BrandonHaynes commented 7 years ago

Slowly starting to catch up on my backlog :)

Agree with @senderista that b'' is the best literal syntax here, and that it's fine to not (yet) support arbitrary byte string literals. I'd also be comfortable with an implicit conversion from '' to b'', but am not sure how much work that would involve. (Assuming that this is still an open discussion, isn't it possible to write a UDA that emits its initial null state? If a null value ever leaks into a relation, we are then forced to make operators null-aware.)

Assuming we want to live in a Python 2 universe, we will probably want to use the unicode type for strings and bytes for byte strings. basestring and bytearray is another option, but I prefer that the byte strings be immutable.

parmitam commented 7 years ago

there seem to be two issues here: a) value null of the variable ending up in the relation b) using a keyword to specify and empty bytearray/bytes/b'' which is indistinguishable from an empty string when expressed in MyriaL. For a) I'll change it so that we never send null to the relation -- @BrandonHaynes is right, it could end up in a relation and we might need to change more to ensure everything can handle null values. For b) I am not sure that we can do this without a keyword. if we are agreed that we can leave the arbitrary byte string literals for later, I'll make the change specified in a) and get this change in for now.

BrandonHaynes commented 7 years ago

I don't understand why we cannot distinguish b'' from '' in MyriaL.

parmitam commented 7 years ago

b'' parses as empty string _LexToken(STRINGLITERAL,'',3,38) a regular expression specified to look for a prepended b causes errors in lexer as well. If you have better ideas, lmk.

BrandonHaynes commented 7 years ago

I implemented byte string parsing using the b'' syntax. Do you want me to push it to a separate branch?

senderista commented 7 years ago

Let me know what you think about the b'\xFF' format I implemented. It's obviously more verbose than just using the unescaped hex codes like b'FF' (but is Python-compatible).

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.07%) to 91.663% when pulling e2ce1ad9a307264c651fb024697d7a0b70384205 on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

BrandonHaynes commented 7 years ago

Yeah, this is essentially identical to my implementation. Might want to support double-quotes as well since ordinary strings do.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.07%) to 91.663% when pulling 9029d23c8a389514fbb31fc9b80261df2f478a02 on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

parmitam commented 7 years ago

This works end-to-end.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.07%) to 91.663% when pulling 85a1cfba3993574945421cb0ae09a9e3b480086f on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

senderista commented 7 years ago

@parmitam @BrandonHaynes any reason not to merge this (and the MyriaX blob_literal branch) into master? (I still have to fix the new integration test for BITSET expressions in MyriaX, but I guess I can leave that test out for now.)

BrandonHaynes commented 7 years ago

Looks fine to me, modulo the stray debugging print statements.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.07%) to 91.657% when pulling 44db981badb58148a5e11809dc51c69a55069f85 on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.07%) to 91.658% when pulling 44db981badb58148a5e11809dc51c69a55069f85 on blob_literal into 5446f6d614f5424f4839a833d0b7002a9bc6d793 on master.

senderista commented 7 years ago

fixes https://github.com/uwescience/raco/issues/555