polygon-io / feature-requests

Feature requests for the Polygon.io platform. Any bug reports should be opened in polygon-io/issues

Trade (tick) timestamps do not follow json conventions (causing overflow) #16

Open tr8dr opened 3 years ago

tr8dr commented 3 years ago

URL https://api.polygon.io/v2/ticks/stocks/trades/A/2010-01-06?reverse=false&limit=50000&timestamp=1262812349393999878&apiKey=...

Result Getting a result that at first glance looks textually correct; however, the timestamps must be quoted, because the numbers are converted to doubles under the covers by many JSON parsers.

{"results":[{
    "t":1262812349394000000,  <- note the lack of quotes in the original
    "q":2737306,
    "i":"",
    "x":4,
    "r":12,
    "s":100,
    "c":[10,12,2],
    "p":30.8668,
    "z":1}],  
    ...
}

This causes a timestamp like 1262812349394000000 to be converted to a double, whose mantissa does not have full resolution for a number of this magnitude. The stamp becomes 1262812349393999878 once it goes through the double conversion (close, but not exact).
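The loss is easy to reproduce; Python's `float` below plays the role of the double that many parsers use internally:

```python
# The nanosecond timestamp from the example above, round-tripped
# through an IEEE-754 double as many JSON parsers do internally.
original = 1262812349394000000

as_double = float(original)
round_tripped = int(as_double)

assert round_tripped != original            # resolution was lost
assert abs(round_tripped - original) < 256  # close, but not exact
```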

For better or worse, the timestamp must be quoted. So instead it should be:

{"results":[{
    "t":"1262812349394000000",   <- note the quotes
    "q":2737306,
    "i":"",
    "x":4,
    "r":12,
    "s":100,
    "c":[10,12,2],
    "p":30.8668,
    "z":1}],  
    ...
}

Expected Result Timestamps must be quoted for proper JSON parsing.

Additional Notes Javascript and many JSON parsers represent the numeric type as a double. This means that most parsers will return the wrong timestamp value (out to some resolution) due to the reduced resolution of the 52-bit mantissa in the IEEE floating-point representation. By convention, most market data providers streaming JSON will use either:

    string representation of date/time (ISO 8601), OR
    quoted epoch in ms or ns

I think the most compact form would be to continue to use the epoch time in nanoseconds, but quote it so that Javascript, Java-based JSON parsers, and other parsers following the Javascript convention of holding numerics in a double do not lose resolution when parsing. This would be a minor format change and might break some code, however: APIs that expect a numeric value rather than a string-based long might fail to parse the messages without a minor adjustment to convert the string to a long.
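As a sketch of the proposal: Python's `json` module parses integers exactly, so `parse_int=float` is used here to emulate a double-based parser. The quoted form survives regardless of the parser's numeric type:

```python
import json

payload_unquoted = '{"t": 1262812349394000000}'
payload_quoted = '{"t": "1262812349394000000"}'

# parse_int=float emulates the double-based parsers
# (Javascript, many JVM/.NET libraries).
mangled = json.loads(payload_unquoted, parse_int=float)["t"]
assert int(mangled) != 1262812349394000000   # resolution lost

# With the quoted form, the client converts the string to a long itself:
exact = int(json.loads(payload_quoted)["t"])
assert exact == 1262812349394000000
```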

spazmodius commented 3 years ago

JSON does not "convert" anything, it's just a data format. Large, high-resolution numbers are totally valid JSON, but your language and/or parser may have limitations.

I use Javascript, and it is true that JSON.parse() does not give the same t value. I find that t / 1e6 yields an acceptable ms-resolution time. If I wanted to retrieve the exact ns-resolution timestamp, I'd have to use a parser with some kind of flexibility.
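That millisecond fallback is easy to check in Python; the mangled value below is the one from the original report:

```python
# "t" as it comes out of a double-based parser (value from the report above)
t = 1262812349393999878

# Millisecond resolution survives the double round-trip, so dividing
# down to ms recovers a usable timestamp.
ms = round(t / 1e6)
assert ms == 1262812349394
```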

tr8dr commented 3 years ago

JSON does not convert, it is just a format, that is true. However, because JSON originated from Javascript, and Javascript has this quirk, quite a few parsers (on the JVM, .NET, etc.) also represent the parsed JSON numeric type with a double. Sadly, all of the high-resolution numeric types I output in JSON have had to be quoted over the years to avoid this issue.

I think for the sake of the install base of parsers out there it makes sense to quote.

I am using Kotlin and have checked a number of JSON parsers, all of which follow the very flawed behavior of Javascript. To work around this problem I wrote my own parser, which lazily evaluates to a long, double, int, or whatever the expected type is.

tr8dr commented 3 years ago

One more note: the reason this is important (even if one is not interested in ns resolution) is that the full timestamp is used as a paging cursor. If there are more than 50K trades / day, one needs to set the starting timestamp to be 1 ns past the last timestamp received in the prior "page".
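The cursor scheme can be sketched as follows; `fetch_trades_page` is a hypothetical wrapper (not a real client method) around the trades endpoint, and `"t"` is assumed to arrive as an exact integer:

```python
def fetch_all_trades(fetch_trades_page, symbol, date, limit=50000):
    """Page through one day's trades using the trade timestamp as a cursor.

    fetch_trades_page(symbol, date, timestamp, limit) is a hypothetical
    stand-in for a call to /v2/ticks/stocks/trades; it must return trades
    whose "t" field is an exact integer nanosecond timestamp.
    """
    trades = []
    cursor = 0  # start of day
    while True:
        page = fetch_trades_page(symbol, date, timestamp=cursor, limit=limit)
        trades.extend(page)
        if len(page) < limit:
            return trades
        # Resume 1 ns past the last trade seen; a double-mangled "t"
        # would place this cursor at the wrong spot.
        cursor = page[-1]["t"] + 1
```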

spazmodius commented 3 years ago

the full timestamp is used as a paging cursor

Yes 😞 , yes it is. My code works around that by counting how many records were in the last ms of the previous page, and skipping that many on the next page.
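That workaround might look like the following sketch; `next_page_params` is a hypothetical helper, and `prev_page` is assumed sorted by time, with `"t"` holding the (possibly double-mangled) nanosecond stamps:

```python
def next_page_params(prev_page):
    """Derive a resume point from the previous page when only
    millisecond-accurate timestamps survive parsing.

    Returns (cursor_ms, skip): re-request from the last millisecond seen,
    skipping the trades from that millisecond already consumed.
    """
    last_ms = prev_page[-1]["t"] // 1_000_000  # ns -> ms
    skip = sum(1 for trade in prev_page
               if trade["t"] // 1_000_000 == last_ms)
    return last_ms, skip
```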

kmcminn commented 3 years ago

tested, happens in golang also. python and javascript use 64bit integer types.

1) there is a bug that needs to be fixed so that timestamps are never > 13 digits. 2) milli-timestamps are a feature, please don't change it polygon. 3) @tr8dr just change your json decoder to use a 64-bit integer for large numbers. golang supports this, I'm sure kotlin has the same

spazmodius commented 3 years ago

64bit integer types

Surely you mean 64-bit floating point types?

tr8dr commented 3 years ago

@kmcminn I don't currently have a problem dealing with this in my code; however, to handle these messages I had to write a new JSON parser to deal with the fact that any number of JSON parsers out there use a double to hold a parsed numeric value. I investigated a number of JVM parsers only to find that they all had this flaw. I've seen the same on the .NET side as well. It's pretty backwards, but I think it was just adopted from Javascript's implementation.

kmcminn commented 3 years ago

@spazmodius no, the json decoders most people use in python and javascript have to handle integers greater than 32 bits; the types just have some code under the hood to switch to a 64-bit type. The loss of precision when converting

1262812349394000000

to a 64-bit floating point number isn't a language problem, it's a general computation problem.

tr8dr commented 3 years ago

@kmcminn The problem here is not about 32 vs 64 bits; rather, passing a long (or a string-version-of-a-long) through a double and back into a long loses resolution. This happens due to an artifact that started on the Javascript side and was carried forward into various JSON parser implementations.

spazmodius commented 3 years ago

@kmcminn I'm not sure I know what you're saying. That number would easily fit in a 64-bit int, which Javascript doesn't have.

kmcminn commented 3 years ago

number of JSON parsers out there use double to hold a parsed numeric value

in the case of 1262812349394000000, this can't be represented accurately using most standard 64-bit floating point types. the fact that some parsers use a floating point type when deserializing this number is an implementation detail, sometimes motivated by the absence of a 64-bit integer type. python gets around this in its parser by selecting a 64-bit integer type to represent this number. newer versions of nodejs are probably doing something similar for JSON.parse.

not related to 32 vs 64 bit, rather that passing a long or string-version-of-a-long

yeah, to clarify: any >= 13 digit number requires more than 32 bits of precision to store. it's a 32 v 64 implementation detail in the json decoder, however, as the decoder's logic will be based on type inference to some degree, which will select some determined 64-bit type. in cases where that selects a double you are pretty much screwed, as that number can't be precisely represented with a double in any language.

tr8dr commented 3 years ago

@kmcminn Yes, that is my point. What I am indicating is that many JSON parsers use a floating-point representation for numeric types; hence, to work around this, large numeric values sent in a JSON stream should be quoted. Ideally I would like to see Javascript, and the many other JSON parsers for the JVM and other environments, not follow Javascript's folly, but until then ...

Re FP: though a double is 64 bits, the mantissa component is just 52 bits, so large integers beyond that magnitude cannot be represented exactly. There are other aspects of floating-point representation which can introduce inaccuracy as well.
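Concretely, integers survive a double only up to 2**53 (52 mantissa bits plus the implicit leading bit); a quick Python check:

```python
# A double is 64 bits wide, but its mantissa carries 52 bits (53 with
# the implicit leading bit), so integers are exact only up to 2**53.
EXACT_LIMIT = 2**53  # 9007199254740992

assert float(EXACT_LIMIT - 1) != float(EXACT_LIMIT)  # still distinct
assert float(EXACT_LIMIT + 1) == float(EXACT_LIMIT)  # collapsed

# The 19-digit nanosecond timestamps are well past this limit:
assert 1262812349394000000 > EXACT_LIMIT
```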

kmcminn commented 3 years ago

let's dive into solutions:

1) polygon quotes these numbers as strings in all APIs

    we'd then have to explicitly cast these numbers ourselves to a type, which wouldn't be terrible, but the migration would be no fun for all involved

2) polygon changes to a custom integer type that can be represented by a 32-bit integer (think octal timestamps)

    don't like this migration story either, or having to rely on a difficult time conversion in their code and the error-prone blackbox it creates for them and their customers

3) polygon switches to ISO 8601 timestamps

    this is probably the most robust path, in that we get a string type that can be accurately represented everywhere and that is also the most intuitive for customers. migration woes, though. the history of this discussion in polygon would be interesting.

4) polygon fixes all APIs to have 13-digit timestamps consistently and publishes json decoder implementation-details documentation for all languages

    probably the lowest impact and easiest to accomplish

tr8dr commented 3 years ago

I agree that any change in format will break code. That said, note the following: other data sources (such as crypto exchanges) provide timestamps in one of two forms:

    string representation of date/time (ISO 8601), OR
    quoted epoch in ms or ns

Epoch time is more compact and cheaper to parse, so I think quoted epoch is the better of the two from an efficiency standpoint: less bandwidth use and cheaper parsing.
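For comparison, the two forms side by side in Python; the ISO 8601 string below is an illustrative conversion of the example timestamp (assumed UTC):

```python
from datetime import datetime, timezone

# The same instant in the two wire formats (illustrative values):
quoted_epoch_ns = "1262812349394000000"
iso_8601 = "2010-01-06T21:12:29.394000+00:00"

# Quoted epoch: a single integer parse, no precision loss possible.
t_ns = int(quoted_epoch_ns)

# ISO 8601: a heavier parse, but also exact to its stated resolution.
dt = datetime.fromisoformat(iso_8601)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
delta = dt - epoch
ns = (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000
assert ns == t_ns
```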

Probably this "bug" should resolve into a feature request in a future API release.

kmcminn commented 3 years ago

hence to work around this

yeah I feel your pain. I'm working with it using 64bit integer types that can accurately represent this. I've not seen a timestamp > 13 digits on any websocket feeds.

    string representation of date/time iso 8601 OR
    quoted epoch in ms or ns

Probably this "bug" should resolve into a feature request in a future API release.

indeed the ability to declare number and string serialization options would be elite. I would want it on all APIs...

one quick fix polygon can do on their side, which would help golang, java, kotlin, and .net users, would be to just stop sending 19-digit (nanosecond) timestamps. numbers that big get wrecked by double, see:

$ python ./test_ts.py 
--> start=1262812349394000000 incrementing 100000000 time(s)
--> done, 100000000 iters, found 99609375 imprecise numbers

--> start=1262812349394 incrementing 100000000 time(s)
--> done, 100000000 iters, found 0 imprecise numbers

def check_doubles(start, num):
    print(f"--> start={start} incrementing {num} time(s)")
    end = start + num
    i, n, count = start, 0, 0

    while i < end:
        # round-trip each integer through a double and compare
        as_float = float(i)
        compare = int(as_float)

        if i != compare:
            count += 1

        i += 1
        n += 1

    print(f"--> done, {n} iters, found {count} imprecise numbers\n")

check_doubles(1262812349394000000, 100000000)
check_doubles(1262812349394, 100000000)

kmcminn commented 3 years ago

can someone port this to kotlin or java to see if you see similar?

spazmodius commented 3 years ago

@kmcminn , I feel like you're on a tangent. That doesn't test anything about JSON deserialization, but rather your language and whatever datatype you're using.

@tr8dr is suggesting quoting the value in the JSON payload to move parsing from the JSON-parser to the client.