yandex / ozo

OZO is a C++17 Boost.Asio based header-only library for asyncronous communication with PostgreSQL DBMS.
PostgreSQL License
227 stars 46 forks source link

Any example on how to write numeric type into db? #273

Open tangyan opened 3 years ago

tangyan commented 3 years ago

In pg/types it seems there is no binding for numeric type. It would be nice if there is an example of how to do it. Or have something in unit tests as an example.

thed636 commented 3 years ago

Hi! Nemeric type, unfortunately, is not adapted in the library yet. But it may be adapted externally via specializaition of ozo::send_impl, ozo::recv_impl and ozo::size_of_impl. In this case it is necessary to know the binary representation of the type.

neogenie commented 3 years ago

It's not hard to add decimal/numeric support to the library. The main question is: will we serialize the numeric into a string, which in my opinion is more universal, but less performance, or will we use some decimal library from boost or intel? @thed636

thed636 commented 3 years ago

Hi, Neo! Well, IMO it is better to use something like boost::multiprecision::cpp_dec_float. Just to have all the math out of the box without additional extra dependencies.

neogenie commented 3 years ago

@thed636 boost::multiprecision implements floating point arithmetic, which may be unacceptable for financial and accounting systems. The internal database implements true decimal, thus avoiding loss of precision. I suggest eliminating any dependencies at all and letting the library user decide for himself how to use the results. As a basic interface, we can offer conversion from / to a string and access to the internal representation, if you need to convert this type to the format of the required library without serialize/deserialize. It looks like it would be more correct in terms of division of responsibility between the library and the calling code.

I'll try to sketch out a PR with implementation.

thed636 commented 3 years ago

Well, we already provide a customization point to allow to adapt any type via send_impl/recv_impl/size_of_impl customization. So users may adapt any type they want to the library. I suggest the Boost.Multiprecision type because of the one that may be used as a number with all the math out of the box. Of course, if the type is not good for a user there is an ability to use any other type, and the boost::multiprecision::cpp_dec_float adaptation will be an example of how to do it.

As a basic interface, we can offer conversion from / to a string and access to the internal representation

This is not the best idea, because a user always may cast a numeric type to a text in a query to get a text representation of numeric. So at this point, such a solution is a little bit pointless.

I just want to say that boost::multiprecision::cpp_dec_float adaptation does not deny any other types' adaptations and usage. But it is not a good solution to provide a text-like representation of numbers without any math support, because it is useless in most cases.

neogenie commented 3 years ago

I agree that out-of-the-box math in boost::multiprecision::cpp_dec_float looks tempting.

But. The problem with using this library is that it is float with the specified accuracy in the template parameter. PostgreSQL implements the true decimal. The internal format of database and libpq determines the precision and number of characters on the fly in runtime, so it is not clear what to use as the internal type. Using the number of characters with a margin - can lead to an unwanted overuse of memory (albeit on the stack) and still does not guarantee that we will be able to save the corresponding number, and using a small number of characters will lead to loss of accuracy and unusability of the library.

My point would be that you can provide a basic decode / encode from a binary protocol in the lib and give the ability to serialize / deserialize this type into a string for tests, or, if we do not need to perform any calculations and give the user access to the internal representation for converting to a required type, be it a boost::multiprecision or some other library for working with decimal.

It looks like it makes no sense to add some implementation to the library and leave it to the client's choice.

systocrat commented 1 year ago

Hey, adding onto this issue to see if there is a workaround.

I'd like to serialize numeric types using the string format on the wire so I don't have to convert between the mpdecimal binary format and the Postgres binary format. As far as I can tell, many other popular libraries use the text representation, but it doesn't seem like ozo has a customization point which would allow me to explicitly tell libpq that a particular type is meant to be serialized/deserialized in the text rather than binary format.

Should I just dive in and start writing a proper conversion function or is there an easier way?

tangyan commented 1 year ago

What I did to solve this problem at that moment was to copy Postgres internal binary format codes and use that as the serialization. I vaguely remembered it's about 2 - 3 files and modified some of the logic to strip out the codes.

systocrat commented 1 year ago

@thed636 Hello! I have a nearly working example implementation here that I'd like to post, but I'm struggling with size_of_impl for writing decimal values to the database.

The binary wire representation of a numeric looks like this:

    struct pg_numeric {
        BOOST_HANA_DEFINE_STRUCT(pg_numeric,
                                 (std::int16_t, ndigits),
                                 (std::int16_t, weight),
                                 (std::uint16_t, sign),
                                 (std::uint16_t, dscale));
    };

Followed by an array of signed shorts with length ndigits

Is there a clean way to do this?

systocrat commented 1 year ago

Here's a gist with my WIP implementation- It's not pretty but deserialization works with many cases I tested querying from Postgres, and serialization works with a couple of cases that I tested manually. This uses libmpdec++ rather than boost::multiprecision::cpp_dec_float as its numeric class of choice.

https://gist.github.com/systocrat/036f3fe7e91d14530247dee666d456a6

All that really needs to be implemented is the size_of_impl which I'm still unclear on how to do for types like this where I'm converting between two highly incompatible representations of numeric data.

The table used in the example is defined as follows:

CREATE TABLE public.ntable (
    id SERIAL PRIMARY KEY NOT NULL,
    num numeric
);