ontodev / valve.rs

A lightweight validation engine written in rust.
BSD 3-Clause "New" or "Revised" License
4 stars 0 forks source link

Consider using SQL `format()` to help roundtrip numbers #88

Closed jamesaoverton closed 3 months ago

jamesaoverton commented 6 months ago

Related to #42, we need to be able to roundtrip tables with numbers. For example, in the penguins test the "culmen_length" column has datatype "positive_decimal", which expects a decimal point and has SQL type "NUMERIC". The penguins.tsv has values like "50.0". By default, SQLite will display this as "50", so VALVE save_table() will save it as "50", which will then fail to validate if loaded again.

SQL provides a format() function, based on printf. I find printf format strings ugly and unintuitive, but they are widely used, widely supported, and quite flexible. There might be some variation, however, which could cause problems.

We should consider allowing a "format" column for the "datatype" table that would specify a SQL format string, then use it in save_table().

First I'd just like to hear your thoughts @lmcmicu.

lmcmicu commented 6 months ago

I've run into this issue before and it is annoying. I think that adding a format column to the datatype table makes a lot of sense. I assume it will be optional?

lmcmicu commented 5 months ago

Unfortunately the syntax of format() is very different in sqlite and postgresql. While sqlite supports more or less the same options as C's printf function, PostgreSQL does not, instead only supporting a very limited syntax which is not very useful for the purposes of this issue.

However, there is a rust equivalent of C's printf function that is packaged in a separate crate: https://docs.rs/sprintf/latest/sprintf/. The advantage of using rust is that we avoid having to worry about syntax differences between different database engines.

jamesaoverton commented 5 months ago

Ok, good to know. I was hoping to push this down into the SQL layer, including the *_text views, so that users could just make SQL queries instead of running our Rust code. But it's more important for our format configuration to be consistent across SQLite and Postgres.

If we're going to use Rust, then I'd like to consider alternatives to printf syntax, which I find inscrutable.

jamesaoverton commented 5 months ago

After discussing this on our call, I'm convinced that printf syntax is the best option. Despite being ugly, it's widely supported, which means people can use the formatting configuration from SQLite or Python or whatever.