polygon-io / issues

Quickly track and report problems with polygon.io

Missing tickers in polygon_client.list_tickers when compared to both polygon_client.list_aggs and polygon_client.get_grouped_daily_aggs #272

Open evanvolgas opened 11 months ago

evanvolgas commented 11 months ago

URL: I am issuing calls to the following client methods:

  1. polygon_client.list_tickers
  2. polygon_client.list_aggs
  3. polygon_client.get_grouped_daily_aggs

Result: I have a script that crawls polygon_client.get_grouped_daily_aggs for the past five years and assembles a BigQuery dataset of price data for all tickers. I also have a script that fetches all tickers from polygon_client.list_tickers and stores them in BigQuery as well.

While merging the two resulting datasets (all historical prices and all tickers; query below), I noticed something unusual: there are 6,442 distinct tickers for which get_grouped_daily_aggs data are available over the past five years but list_tickers data are not. This is easy to verify on the API side via the list_aggs endpoint as well, e.g.:

from polygon import RESTClient

PROJECT_ID = "new-life-400922"
SECRET_ID = "polygon"

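# With no api_key argument, RESTClient() reads the key from the POLYGON_API_KEY environment variable
polygon_client = RESTClient()

# Reference data: no record comes back, so next() raises StopIteration
next(polygon_client.list_tickers(ticker="MYT"))  # Fails

# Daily aggregates for 2018-2023: at least one bar comes back
next(polygon_client.list_aggs("MYT", 1, "day", "2018-01-01", "2023-12-31", limit=50000))  # Succeeds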

Query I used to find the 6,442 missing tickers:

-- Count tickers that have historical price data but no list_tickers record
SELECT COUNT(DISTINCT b.ticker)
FROM `new-life-400922.etl.stg_all_tickers` AS a
FULL OUTER JOIN `new-life-400922.etl.stg_all_tickers_historical` AS b
  ON a.ticker = b.ticker
WHERE a.ticker IS NULL
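The same anti-join can be sketched in Python once both datasets are in memory. This is a minimal sketch, assuming the grouped-daily results are kept as one list per trading day of objects with a `ticker` attribute (as `get_grouped_daily_aggs` returns) and that the reference records are the objects yielded by `list_tickers`; the helper name is hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable


def tickers_missing_from_reference(
    grouped_days: Iterable[Iterable[object]],
    reference_records: Iterable[object],
) -> list[str]:
    """Return tickers that have aggregates but no list_tickers record.

    Hypothetical helper: `grouped_days` is assumed to hold one list of
    get_grouped_daily_aggs results per trading day, and `reference_records`
    the objects yielded by list_tickers. Both only need a `ticker` attribute.
    """
    # Distinct tickers seen in any day's grouped aggregates.
    agg_tickers = {agg.ticker for day in grouped_days for agg in day}
    # Distinct tickers known to the tickers (reference) endpoint.
    reference_tickers = {record.ticker for record in reference_records}
    # Anti-join: present in aggregates, absent from reference data.
    return sorted(agg_tickers - reference_tickers)


# Offline usage with stand-in objects (no API calls).
days = [
    [SimpleNamespace(ticker="AAPL"), SimpleNamespace(ticker="MYT")],
    [SimpleNamespace(ticker="AAPL"), SimpleNamespace(ticker="PBB")],
]
reference = [SimpleNamespace(ticker="AAPL")]
print(tickers_missing_from_reference(days, reference))  # ['MYT', 'PBB']
```

Working on sets keeps the comparison linear in the number of records, which matters when the grouped-daily crawl spans five years of trading days.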

Expected Result: I would expect polygon_client.list_tickers to contain all of the tickers that Polygon is aware of, and the distinct tickers in polygon_client.get_grouped_daily_aggs and polygon_client.list_aggs to be a subset of those. That does not appear to be the case, though. Again, my query showed 6,442 distinct tickers with available price data that do not show up in polygon_client.list_tickers. I made API calls for five of them (listed below) to confirm that the gap is on Polygon's side, and it appears to me that it is.

examples_for_testing = ['PBB', 'MYT', 'SRSAU', 'ENBA', 'SYN']
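To spot-check tickers like these against both endpoints without tripping over `StopIteration`, something like the following could be used. It is a sketch, assuming `client` behaves like polygon's `RESTClient` (its `list_tickers` and `list_aggs` return lazy iterators); `spot_check` and `StandInClient` are hypothetical names, and the stand-in only exists so the example runs offline.

```python
from types import SimpleNamespace


def spot_check(client, ticker: str, start: str, end: str) -> dict[str, bool]:
    """Report where `ticker` shows up: active reference data, inactive
    reference data, and daily aggregates between `start` and `end`.

    `client` is assumed to behave like polygon's RESTClient.
    """
    # next(iterator, None) returns None instead of raising StopIteration
    # when the endpoint yields nothing.
    active = next(client.list_tickers(ticker=ticker, active=True, limit=1), None)
    inactive = next(client.list_tickers(ticker=ticker, active=False, limit=1), None)
    bar = next(client.list_aggs(ticker, 1, "day", start, end, limit=50000), None)
    return {
        "active": active is not None,
        "inactive": inactive is not None,
        "aggs": bar is not None,
    }


class StandInClient:
    """Hypothetical offline stand-in for RESTClient, used only for the demo."""

    def __init__(self, active, inactive, with_aggs):
        self.active = set(active)
        self.inactive = set(inactive)
        self.with_aggs = set(with_aggs)

    def list_tickers(self, ticker=None, active=None, limit=None):
        pool = self.active if active else self.inactive
        return iter([SimpleNamespace(ticker=ticker)] if ticker in pool else [])

    def list_aggs(self, ticker, multiplier, timespan, from_, to, limit=None):
        return iter([SimpleNamespace(ticker=ticker)] if ticker in self.with_aggs else [])


# MYT as described later in this thread: inactive only, but with aggregates.
demo = StandInClient(active=[], inactive=["MYT"], with_aggs=["MYT"])
print(spot_check(demo, "MYT", "2018-01-01", "2023-12-31"))
# {'active': False, 'inactive': True, 'aggs': True}
```

Against the live API, passing `RESTClient()` instead of the stand-in and looping over `examples_for_testing` would give one such dictionary per ticker.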


evanvolgas commented 11 months ago

I did a lot more digging, and here's what I found.

Actually, it looks like this is mostly an issue of active vs. inactive tickers, but not entirely. Instead of 6,442 missing tickers, there are 393 (ones that do show up in get_grouped_daily_aggs and list_aggs but don't show up in all_tickers, regardless of the active flag).
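Because the tickers endpoint returns only active tickers unless `active=False` is requested (per the Polygon docs, and consistent with the MYT example in this thread), a complete reference crawl needs two passes. A minimal sketch, assuming `client` exposes `RESTClient.list_tickers` with its `market`, `active`, and `limit` keyword arguments and handles pagination itself; `all_reference_tickers` and the stand-in client are hypothetical.

```python
from types import SimpleNamespace


def all_reference_tickers(client, market: str = "stocks") -> set[str]:
    """Collect every ticker symbol that list_tickers returns, active or not.

    One pass per value of the `active` flag, since delisted symbols only
    appear when active=False is requested.
    """
    symbols: set[str] = set()
    for active in (True, False):
        for record in client.list_tickers(market=market, active=active, limit=1000):
            symbols.add(record.ticker)
    return symbols


class StandInClient:
    """Hypothetical offline stand-in returning canned symbols per active flag."""

    def __init__(self, by_flag):
        self.by_flag = by_flag

    def list_tickers(self, market=None, active=None, limit=None):
        return iter(SimpleNamespace(ticker=t) for t in self.by_flag[active])


demo = StandInClient({True: ["AAU", "AAPL"], False: ["AAU", "MYT", "MYT"]})
print(sorted(all_reference_tickers(demo)))  # ['AAPL', 'AAU', 'MYT']
```

Subtracting this set from the tickers found in the aggregates then yields the residual list, which is the 393 tickers the reporter describes.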

To begin, I have created a gist of the 393 tickers that have aggregates data but no records in list_tickers. I don't know why these tickers are missing from list_tickers.

Also, I'm seeing some unusual near-duplicates in the tickers endpoint that are a little hard to interpret. For example, when I go to https://polygon.io/docs/stocks/get_v3_reference_tickers and search for MYT with active set to false, I get two near-duplicate records. The same happens from the client:

for t in polygon_client.list_tickers('MYT', active=False):
    print(t)

I get

Ticker(active=False, cik='0001543268', composite_figi=None, currency_name='usd', currency_symbol=None, base_currency_symbol=None, base_currency_name=None, delisted_utc='2020-08-27T00:00:00Z', last_updated_utc='2020-08-27T00:00:00Z', locale='us', market='stocks', name='Urban Tea, Inc. Ordinary Shares', primary_exchange='XNAS', share_class_figi=None, ticker='MYT', type='CS', source_feed=None)
Ticker(active=False, cik='0001543268', composite_figi=None, currency_name='usd', currency_symbol=None, base_currency_symbol=None, base_currency_name=None, delisted_utc='2021-06-16T00:00:00Z', last_updated_utc='2021-06-16T00:00:00Z', locale='us', market='stocks', name='Urban Tea, Inc. Ordinary Shares', primary_exchange='XNAS', share_class_figi=None, ticker='MYT', type='CS', source_feed=None)

I'm not sure what to make of the two delisting dates. There are stranger examples still, such as AAU, which returns both

{
    "results": [
        {
            "ticker": "AAU",
            "name": "ALMADEN MINERALS LTD",
            "market": "stocks",
            "locale": "us",
            "primary_exchange": "XASE",
            "type": "CS",
            "active": false,
            "currency_name": "usd",
            "cik": "0001015647",
            "last_updated_utc": "2015-08-11T00:00:00Z",
            "delisted_utc": "2015-08-11T00:00:00Z"
        }
    ],
    "status": "OK",
    "request_id": "d8a9c196274c10e053473afc46843b29",
    "count": 1
}

and

{
    "results": [
        {
            "ticker": "AAU",
            "name": "Almaden Minerals Ltd.",
            "market": "stocks",
            "locale": "us",
            "primary_exchange": "XASE",
            "type": "CS",
            "active": true,
            "currency_name": "usd",
            "cik": "0001015647",
            "composite_figi": "BBG000DGFSY4",
            "share_class_figi": "BBG001S7VVD4",
            "last_updated_utc": "2023-10-11T00:00:00Z"
        }
    ],
    "status": "OK",
    "request_id": "786ce36dc341adb564b58af92da6c978",
    "count": 1
}

It doesn't look to me like this stock was ever delisted: https://www.google.com/finance/quote/AAU:NYSEAMERICAN?window=MAX

In summary, most (but not all) of the "missing" tickers I noticed were inactive, which was a misunderstanding on my part (my apologies). Still, it seems there shouldn't be 393 tickers missing, and it appears to me that there are. It also seems odd that the all_tickers endpoint returns more than one record for certain delisted tickers, and there are examples like AAU where the data are even harder to interpret.
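To list which tickers come back with more than one record (as MYT and AAU do in this thread), the records can simply be counted. A minimal sketch, assuming the records are objects with a `ticker` attribute, such as the `Ticker` objects printed above; the helper name is hypothetical and the demo records are stand-ins.

```python
from collections import Counter
from types import SimpleNamespace


def tickers_with_multiple_records(records) -> dict[str, int]:
    """Map each ticker that has more than one list_tickers record to its count.

    `records` is any iterable of objects with a `ticker` attribute.
    """
    counts = Counter(record.ticker for record in records)
    return {ticker: n for ticker, n in sorted(counts.items()) if n > 1}


# Stand-in records: MYT twice with active=False, AAU once per active flag.
records = [SimpleNamespace(ticker=t) for t in ["MYT", "MYT", "AAU", "AAU", "AAPL"]]
print(tickers_with_multiple_records(records))  # {'AAU': 2, 'MYT': 2}
```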

I'm happy to pick the record with the greatest last_updated_utc date and use that one, if that's the way to get the latest info with the ticker as a primary key. I want to double-check that this is what I should be doing before assuming it is. Please advise, and thank you in advance.

AHangstefer commented 11 months ago

Hi! Thanks for reaching out. I'm investigating why MYT returns two records and looking into the tickers you shared that are not returned by list_tickers.

evanvolgas commented 11 months ago

Awesome, thank you so much. If there's anything I can do to help, please let me know.

AHangstefer commented 11 months ago

Hi! I wanted to give you an update. The duplicate records issue should be taken care of! I'm still digging into the tickers that aren't returned by list_tickers.

AHangstefer commented 11 months ago

Hello again! I've created a ticket for our backend team. I was able to see Aggregates for the tickers you mentioned and did not find them through the Tickers endpoint either. They will dig into it and hopefully have the Tickers endpoint working as expected soon. Let me know if there's anything else I can do for you in the meantime.