mit-pdos / noria

Fast web applications through dynamic, partially-stateful dataflow
Apache License 2.0
4.98k stars 242 forks source link

Support for UnsignedInt & UnsignedBigInt #121

Closed alexsnaps closed 5 years ago

alexsnaps commented 5 years ago

This is nowhere close to be merge'able, but I thought I'd use it to get the discussion going:

While trying to see how Noria would workout on a project, I hit an issue where unsigned BigInt(20) column types weren't properly supported. nom-sql had issues parsing the SQL while trying to load it all in Noria and threw ParseIntError { kind: Overflow } at us.

I started hacking trying to get something working to at least get to load the the mysql dump into noria and make progress with the testing. Now here I am… msql-srv seems to deal with it fine (based on my limited reading of the code), but nom-sql less so. I'm more than willing to make progress on this, but would love to know what y'all take is on adding the DataType::Unsigned(Big)Int is. Or would it be nicer to treat these as properties of the DataType itself, just like the "width"? msql-srv looks at it as flags on the column definition… Also what about the other "integer" types MySQL supports?

Anyways, I thought I'd get the discussion going already. In the meantime I'll try to make progress here… somehow. Tomorrow tho probably :)

wdyt? Or did I miss something entirely?

ms705 commented 5 years ago

This is an entirely reasonable question! :+1:

We can help with nom-sql parse failures if you let us know what queries it barfs at. We certainly would love it if there was a utility to read MySQL dumps into Noria!

One perhaps important thing to note is that DataType exists in order to allow Noria's operators to be generic over the data they consume and store in their state. It's a bit of an imperfect solution, but since the dataflow changes dynamically as queries arrive, the precise column types aren't known at compile time. One constraint for doing fancy things with DataType (like adding signedness as a property) is that the memory size of the DataType enum is pretty crucial to Noria's memory footprint: every record cell value in Noria uses at least sizeof(DataType) bytes, so adding additional enum variants or levels of enum indirection can come at some cost.

alexsnaps commented 5 years ago

@ms705 I've just opened a PR on ms705/nom-sql to only realize it's your repo 🤦‍♂️ See https://github.com/ms705/nom-sql/pull/40 Anyways, that's the work I had done (plus on a v0_0_4 branch to avoid bumping noria) to support unsigned types when parsing the SQL. If we can get that in a reasonable fashion for merge, we can then see how to move this PR forward. wdyt?

alexsnaps commented 5 years ago

With ms705/nom-sql#40 merged, I'll rebase and update this PR

alexsnaps commented 5 years ago

Alright, I think this "only" misses the arithmetic operators on all the new permutation. I'll add these this week sometime 🙈. But I am missing something else? wdyt?

And probably using an actual nom-sql release, rather than using master from the repo....

ms705 commented 5 years ago

Thanks for coming back to this! :+1:

Released the updated nom-sql as v0.0.8. I'll take a look at the changes today or tomorrow.

ms705 commented 5 years ago

Thanks, looks good to merge to me once CI has finished.