polygon-io / issues

Quickly track and report problems with polygon.io

Data field (timestamp, ID, SEQ) issues with returned Trade calls (Socket & Rest) UPDATED #233

Open kenaschmidt opened 1 year ago

kenaschmidt commented 1 year ago

See additional comments below for more clarity on this issue

URL Used for this example: "/v3/trades/O:TSLA230127C00125000?timestamp.gte=2023-01-17&order=desc&limit=10000&sort=timestamp" (other contracts same results)

Result

  1. This query returns 4359 total trades.
  2. Of the total, 302 trades cannot be uniquely identified by any provided field: all values are identical.
  3. Of the total, only 1010 contain a valid Sequence Number (the remaining 3349 return '0').
  4. The trades which return a Sequence Number do not return a Participant timestamp, only SIP (those with no Sequence Number do include both timestamps).
  5. No results return a Trade ID (id).
  6. Other queries return similar results; this is not unique to this ticker.
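For anyone wanting to reproduce these counts, here is a minimal sketch of how they could be tallied from the parsed `results` array. The field name `sequence_number` and the helper names are assumptions for illustration, and the issue does not say exactly how the 302 figure was counted; this version counts every record that belongs to a group of fully identical records.

```python
from collections import Counter


def freeze(trade):
    """Turn a trade dict into a hashable tuple so identical records compare equal."""
    items = sorted(trade.items(), key=lambda kv: kv[0])
    # List values (e.g. condition codes) are converted to tuples to be hashable.
    return tuple((k, tuple(v) if isinstance(v, list) else v) for k, v in items)


def summarize_trades(trades):
    """Count total trades, zero/missing sequence numbers, and indistinguishable records."""
    counts = Counter(freeze(t) for t in trades)
    # A record is indistinguishable if at least one other record has identical fields.
    indistinguishable = sum(n for n in counts.values() if n > 1)
    # "sequence_number" is an assumed field name; 0 or missing both count as absent.
    no_sequence = sum(1 for t in trades if not t.get("sequence_number"))
    return {
        "total": len(trades),
        "no_sequence": no_sequence,
        "indistinguishable": indistinguishable,
    }
```

Passing the decoded JSON `results` list to `summarize_trades` should give figures comparable to items 1 to 3 above, assuming the same query and the same counting convention.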

I cannot find any documentation which outlines a reason for any of the above, or how to handle data harmonization under these circumstances (though I have to assume it's at least partly due to complicated options trade handling).

Expected Result / Questions

1) Why do no historical options trades carry a Trade ID to uniquely identify them, despite the field being included?

EDIT: This was my error. No options data seem to return TradeIDs, and the documentation does not say that they should.

2) Why do some trades carry a Sequence Number and others do not? Why does the inclusion of a Sequence Number cause the Participant Timestamp to be omitted?

EDIT: There is ambiguity in the documentation: REST trades are supposed to return a SIP and a Participant timestamp, while Socket trades simply return a 'Timestamp' without further definition. Live Socket trades are supposed to (and do) return a Sequence number 'q', but Sequence numbers are not listed as return values in the REST documentation, even though they ARE returned, for recent data only.

3) How should duplicate trades be handled? Should each result returned be considered unique despite no differentiating information?

EDIT: Stock trades can be uniquely identified in either case (REST or Socket) based on the inclusion of TradeID, Sequence, and at least 1 timestamp.

- Socket Option trades can be uniquely identified by the inclusion of a Sequence number.
- REST Option trades from 2023-01-25 and later can be uniquely identified by the inclusion of a Sequence number and an NS accuracy Participant timestamp.
- REST Option trades from before 2023-01-25 CANNOT be uniquely identified, due to the lack of a Sequence number and only MS accuracy timestamps.
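The identification rules above could be sketched roughly as follows. This is only an illustration: the field names `id`, `sequence_number`, `participant_timestamp` and `sip_timestamp` are assumed, and `identity_key` is a hypothetical helper, not anything provided by the API or its client libraries.

```python
def identity_key(trade):
    """Return a tuple that should identify the trade, or None if it cannot be identified."""
    # A sequence number of 0 is treated the same as a missing one.
    seq = trade.get("sequence_number") or 0
    # Use whichever timestamp is present, preferring the participant timestamp.
    ts = trade.get("participant_timestamp") or trade.get("sip_timestamp")
    trade_id = trade.get("id")

    if trade_id is not None and seq and ts:
        # Stock trades: TradeID, Sequence and at least one timestamp.
        return ("id", trade_id, seq, ts)
    if seq:
        # Option trades that carry a Sequence number.
        return ("seq", seq, ts)
    # Older REST option trades: no Sequence number, MS-only timestamps.
    return None
```

Trades for which `identity_key` returns `None` are the ones that cannot be de-duplicated safely and have to be kept as-is.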

Screenshots: CSV attached for the above query.


jrbell19 commented 1 year ago

We'll have to do some digging around the questions posed here. Thank you for the information though.

The lack of tradeID and the inconsistent seq numbers may be a result of spec changes from OPRA, but I'm not sure.

kenaschmidt commented 1 year ago

Update-

Running some queries today and I am noticing a couple additional things which may help identify the issue:

  1. Based on my queries today, recent data only (today and yesterday) has a SIP t/s and Sequence number, but no Participant t/s. The SIP t/s is nanosecond accuracy.
  2. Data earlier than that (2 days ago+) has a SIP t/s and Participant t/s, but no Sequence number. Both t/s on this data are only millisecond accuracy
  3. Run on multiple options symbols across different underlying tickers, same results.
  4. No TradeID values on any of the trades returned.

The t/s accuracy issue may explain the duplicate trades since there are no sequence numbers or trade IDs to differentiate sub-Ms ticks.
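One quick way to check the effective accuracy of a batch of timestamps is to look at their trailing zeros. A rough sketch, assuming the values are integer nanosecond Unix timestamps (the function name is made up for illustration):

```python
def timestamp_resolution(timestamps_ns):
    """Guess the resolution of nanosecond-epoch timestamps from their trailing zeros."""
    # Ignore missing or zero timestamps.
    values = [t for t in timestamps_ns if t]
    if not values:
        return "unknown"
    # Every value being a whole number of milliseconds suggests MS accuracy.
    if all(t % 1_000_000 == 0 for t in values):
        return "millisecond or coarser"
    if all(t % 1_000 == 0 for t in values):
        return "microsecond"
    return "nanosecond"
```

This is a heuristic only: a genuine nanosecond timestamp can land on a millisecond boundary by chance, so it is only meaningful over a reasonably large batch of trades.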

Can't find any issues with Stock query results, but every Option I have looked at returns the same results.

Image 1: Data from today and yesterday

Image 2: Data from prior to yesterday

kenaschmidt commented 1 year ago

Ran some more queries today and compared vs Socket stream... below is a summary of what I am seeing:

Edit 1: To correct language, for Socket trades the response object includes a TRF Timestamp, not SIP timestamp - no change to the values being returned though.

Edit 2:

Some of the issues I am experiencing are likely mapping issues on my end based on the API documentation: the Stock->Socket->Trades page lists 'trft' and 'trfi' as return values under details, though the object does not show that those are part of the JSON reply (those values aren't on the Options page at all). Additionally, as noted below, Option->Socket->Trades return Sequence numbers, and Options->REST->Trades do as well, but only for recent data (since 2023-01-25, apparently). Requests for data prior to that return a 0 value (Sequence is not noted in the API documentation as a return field at all).

I also see that none of the Options->Trades responses (REST or Socket) include a TradeID in documentation... this is obviously an oversight on my end, though I'd just ask if this is data that could be provided (if it's provided by the exchanges). I'll need to wait until Monday to do another review of Socket data returned to see if there are any other mapping issues on my end, etc.

Added several EDIT comments below.

STOCKS

Socket (Trade Ticks) -> Missing SIP timestamp for all trades EXCEPT those coming from exchange FINY. All timestamps are MS accuracy as expected (See figure 1).

Rest (Trade Tick Snapshot) -> All fields returned as expected. Both timestamps included, NS accuracy as expected (See figure 2).

OPTIONS

Socket (Trade Ticks) -> Missing SIP timestamp and TradeID for all trades. All participant timestamps are MS accuracy as expected (See figure 3). [EDIT: No TradeID is expected based on documentation]

Rest (Trade Tick Snapshot) Data on or after 2023-01-25 -> Missing PARTICIPANT timestamp and TradeID for all trades. SIP timestamp included, NS accuracy as expected (See figure 4). [EDIT: No TradeID or Sequence is expected based on documentation, though Sequence IS returned with this request, as well as with live Socket trades]

Rest (Trade Tick Snapshot) Data before 2023-01-25 -> Missing TradeID and Sequence for all trades. Both timestamps included, but only MS accuracy, not NS as expected (See figure 5). [EDIT: No TradeID or Sequence is expected based on documentation, though Sequence IS returned in the previous examples (see Figure 4)]

Figure 1) Live trade ticks for a Stock requested through Socket

Figure 2) Snapshot trade ticks for a Stock requested through REST

Figure 3) Live trade ticks for an Option requested through Socket

Figure 4) Snapshot trade ticks for an Option requested through REST, data from on or after 2023-01-25

Figure 5) Snapshot trade ticks for an Option requested through REST, data from before 2023-01-25

kenaschmidt commented 1 year ago

Any update on potential fix for the historical Option trade snapshot issues? The problem of dropping sequence and timestamps seems to be ongoing for any data older than a day or two... makes it essentially unusable in its current form. Thanks.

jrbell19 commented 1 year ago

Hi @kenaschmidt,

We are aware that this is a significant caveat to using the service, so we are actively looking into solutions. As for an ETA, I cannot confidently say.

As discussions continue around how we plan to address this, I will be sure to keep you informed. Apologies for the ongoing troubles.

kenaschmidt commented 1 year ago

Thanks for your response. As a workaround on my end, I am using the participant timestamp only and allowing duplicates based on MS accuracy.
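In case it helps anyone else, the workaround looks roughly like the sketch below. The field name `participant_timestamp` and the function names are assumptions for illustration; the point is simply to order by that timestamp with a stable sort and keep every record, rather than dropping apparent duplicates.

```python
from itertools import groupby


def ts_key(trade):
    """Sort key: the (assumed) participant timestamp field."""
    return trade["participant_timestamp"]


def order_trades(trades):
    """Sort by participant timestamp; ties keep the order in which they arrived."""
    # sorted() is stable, so trades sharing a timestamp are not reordered.
    return sorted(trades, key=ts_key)


def tied_groups(trades):
    """Return the groups of trades that share a participant timestamp."""
    groups = (list(g) for _, g in groupby(order_trades(trades), key=ts_key))
    return [g for g in groups if len(g) > 1]
```

Note that with `order=desc` in the request, the arrival order within a tied group may be the reverse of the true trade order, so the relative order inside each group should not be relied on.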