shilewenuw / get_all_tickers

Get all publicly traded tickers with this library
MIT License
275 stars 123 forks source link

ParserError: Error tokenizing data. C error: Expected 1 fields in line 23, saw 46 #12

Open MislavSag opened 3 years ago

MislavSag commented 3 years ago

When I try:

from get_all_tickers import get_tickers as gt

tickers = gt.get_tickers()

I get an error:


tickers = gt.get_tickers(NASDAQ=False)
---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
c:\Users\Mislav\Documents\GitHub\stocksee\stocksee\ib_market_data.py in 
----> 36 tickers = gt.get_tickers(NASDAQ=False)

C:\ProgramData\Anaconda3\lib\site-packages\get_all_tickers\get_tickers.py in get_tickers(NYSE, NASDAQ, AMEX)
     71     tickers_list = []
     72     if NYSE:
---> 73         tickers_list.extend(__exchange2list('nyse'))
     74     if NASDAQ:
     75         tickers_list.extend(__exchange2list('nasdaq'))

C:\ProgramData\Anaconda3\lib\site-packages\get_all_tickers\get_tickers.py in __exchange2list(exchange)
    136 
    137 def __exchange2list(exchange):
--> 138     df = __exchange2df(exchange)
    139     # removes weird tickers
    140     df_filtered = df[~df['Symbol'].str.contains("\.|\^")]

C:\ProgramData\Anaconda3\lib\site-packages\get_all_tickers\get_tickers.py in __exchange2df(exchange)
    132     response = requests.get('https://old.nasdaq.com/screening/companies-by-name.aspx', headers=headers, params=params(exchange))
    133     data = io.StringIO(response.text)
--> 134     df = pd.read_csv(data, sep=",")
    135     return df
    136 

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675 
--> 676         return _read(filepath_or_buffer, kwds)
    677 
    678     parser_f.__name__ = name

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    452 
    453     try:
--> 454         data = parser.read(nrows)
    455     finally:
    456         parser.close()

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in read(self, nrows)
   1131     def read(self, nrows=None):
   1132         nrows = _validate_integer("nrows", nrows)
-> 1133         ret = self._engine.read(nrows)
   1134 
   1135         # May alter columns / col_dict

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in read(self, nrows)
   2035     def read(self, nrows=None):
   2036         try:
-> 2037             data = self._reader.read(nrows)
   2038         except StopIteration:
   2039             if self._first_chunk:

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 23, saw 46
sarino5 commented 3 years ago

I am getting the same error. Lets hope it gets fixed. @shilewenuw

divvision33 commented 3 years ago

I am seeing this same issue. I really hope it gets fixed soon!

Possums commented 3 years ago

Nasdaq API got updated, so the old URL is no longer available I believe.

The following is a quick implementation of the new API.

import requests
import pandas as pd

headers = {
    'authority': 'api.nasdaq.com',
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'origin': 'https://www.nasdaq.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nasdaq.com/',
    'accept-language': 'en-US,en;q=0.9',
}

params = (
    ('tableonly', 'true'),
    ('limit', '25'),
    ('offset', '0'),
    ('download', 'true'),
)

r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)
data = r.json()['data']
df = pd.DataFrame(data['rows'], columns=data['headers'])
jokkemoose commented 3 years ago

Hi, Anyone has a workaround for this please?

ErlerPhilipp commented 3 years ago

I made a quick&dirty fix in get_tickers.py. Filtering by mktcap_min, mktcap_max and sectors works for me. I didn't test regions. Github doesn't allow me to upload a .py file, so you need to remove the '.txt' ending of this one and replace the corresponding file in the package. get_tickers.py.txt Thanks to @Possums for the basics!

krikru commented 3 years ago

I have the same issue.

krikru commented 3 years ago

@ErlerPhilipp Great! The standard way to suggest code changes is to create a pull request, so that might be why you can't upload a .py file (it's slightly more work, though).

ErlerPhilipp commented 3 years ago

@krikru I know but I decided against it, for now, because it's mostly untested changes and really dirty ;) If I have time, I'll create a PR.

patroucheva commented 3 years ago

Hi, any updates? I am still getting this error

patroucheva commented 3 years ago

@ErlerPhilipp getting the following error with the new get_tickers.py:

File "/anaconda3/lib/python3.7/site-packages/get_all_tickers/get_tickers.py", line 92, in get_tickers tickers_list.extend(exchange2list('nyse')) File "/anaconda3/lib/python3.7/site-packages/get_all_tickers/get_tickers.py", line 162, in __exchange2list df_filtered = df[~df['Symbol'].str.contains(".|\^")] File "/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in getitem__ return self._getitem_column(key) File "/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column return self._get_item_cache(key) File "/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache values = self._data.get(item) File "/anaconda3/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get loc = self.items.get_loc(item) File "/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc return self._engine.get_loc(self._maybe_cast_indexer(key)) File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 'Symbol'

Coombszy commented 3 years ago

@patroucheva, You can fix the issue by changing 'Symbol' to 'symbol' on lines 162 and 163

DimaDDM commented 3 years ago

Fix get_tickers.py.txt and full output tickers.xlsx

monkey-bidness commented 3 years ago

I made a quick&dirty fix in get_tickers.py. Filtering by mktcap_min, mktcap_max and sectors works for me. I didn't test regions. Github doesn't allow me to upload a .py file, so you need to remove the '.txt' ending of this one and replace the corresponding file in the package. get_tickers.py.txt Thanks to @Possums for the basics!

After making the change, what were the next steps for getting the package to update? I've tried making the swap and reinstalling with the new file, but when trying to use the package in script it still renders the same error.

frank-ceballos commented 3 years ago

Nasdaq API got updated, so the old URL is no longer available I believe.

The following is a quick implementation of the new API.

import requests
import pandas as pd

headers = {
    'authority': 'api.nasdaq.com',
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'origin': 'https://www.nasdaq.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nasdaq.com/',
    'accept-language': 'en-US,en;q=0.9',
}

params = (
    ('tableonly', 'true'),
    ('limit', '25'),
    ('offset', '0'),
    ('download', 'true'),
)

r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)`
data = r.json()['data']
df = pd.DataFrame(data['rows'], columns=data['headers'])

There is a small typo on the following line:

r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)'

Simply remove the extra quotation mark at the end. This works @Possums

s-kerin commented 3 years ago

Updated the code by DimaDDM to be able to get individual exchanges rather than return all symbols get_tickers_by_region also updated but untested

get_tickers.py.txt

tpt5cu commented 3 years ago

@DimaDDM I just copied and pasted that file into get_tickers.py

Could you open a PR for it?

bobdabuilder1 commented 3 years ago

Has anyone managed to get_tickers_by_region working? I can only seem to get it to return US tickers instead of the region specified.

arainboldt commented 3 years ago

hey @here, as it doesn't look like the author cares much about this. I'd suggest that somebody that does, and yet doesn't want to just use the script provided above, create their own repo/fork this with the appropriate modifications.

There's no point in creating PRs if the author is MIA.

JaisinhBhosale9712 commented 3 years ago

I made a quick&dirty fix in get_tickers.py. Filtering by mktcap_min, mktcap_max and sectors works for me. I didn't test regions. Github doesn't allow me to upload a .py file, so you need to remove the '.txt' ending of this one and replace the corresponding file in the package. get_tickers.py.txt Thanks to @Possums for the basics!

Hey, the "Symbol" column name shall be in all lower case "symbol"

justege commented 3 years ago

Copy and paste this code to get_tickers.py in your library by searching in your PC where it is. Cheers.

import pandas as pd
from enum import Enum
import io
import requests

_EXCHANGE_LIST = ['nyse', 'nasdaq', 'amex']

_SECTORS_LIST = set(['Consumer Non-Durables', 'Capital Goods', 'Health Care',
       'Energy', 'Technology', 'Basic Industries', 'Finance',
       'Consumer Services', 'Public Utilities', 'Miscellaneous',
       'Consumer Durables', 'Transportation'])

headers = {
    'authority': 'api.nasdaq.com',
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'origin': 'https://www.nasdaq.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nasdaq.com/',
    'accept-language': 'en-US,en;q=0.9',
}

def params(exchange):
    return (
        ('letter', '0'),
        ('exchange', exchange),
        ('render', 'download'),
    )

params = (
    ('tableonly', 'true'),
    ('limit', '25'),
    ('offset', '0'),
    ('download', 'true'),
)

def params_region(region):
    return (
        ('letter', '0'),
        ('region', region),
        ('render', 'download'),
    )

class Region(Enum):
    AFRICA = 'AFRICA'
    EUROPE = 'EUROPE'
    ASIA = 'ASIA'
    AUSTRALIA_SOUTH_PACIFIC = 'AUSTRALIA+AND+SOUTH+PACIFIC'
    CARIBBEAN = 'CARIBBEAN'
    SOUTH_AMERICA = 'SOUTH+AMERICA'
    MIDDLE_EAST = 'MIDDLE+EAST'
    NORTH_AMERICA = 'NORTH+AMERICA'

class SectorConstants:
    NON_DURABLE_GOODS = 'Consumer Non-Durables'
    CAPITAL_GOODS = 'Capital Goods'
    HEALTH_CARE = 'Health Care'
    ENERGY = 'Energy'
    TECH = 'Technology'
    BASICS = 'Basic Industries'
    FINANCE = 'Finance'
    SERVICES = 'Consumer Services'
    UTILITIES = 'Public Utilities'
    DURABLE_GOODS = 'Consumer Durables'
    TRANSPORT = 'Transportation'

def get_tickers(NYSE=True, NASDAQ=True, AMEX=True):
    tickers_list = []
    if NYSE:
        tickers_list.extend(__exchange2list('nyse'))
    if NASDAQ:
        tickers_list.extend(__exchange2list('nasdaq'))
    if AMEX:
        tickers_list.extend(__exchange2list('amex'))
    return tickers_list

def get_tickers_filtered(mktcap_min=None, mktcap_max=None, sectors=None):
    tickers_list = []
    for exchange in _EXCHANGE_LIST:
        tickers_list.extend(__exchange2list_filtered(exchange, mktcap_min=mktcap_min, mktcap_max=mktcap_max, sectors=sectors))
    return tickers_list

def get_biggest_n_tickers(top_n, sectors=None):
    df = pd.DataFrame()
    for exchange in _EXCHANGE_LIST:
        temp = __exchange2df(exchange)
        df = pd.concat([df, temp])

    df = df.dropna(subset={'marketCap'})
    df = df[~df['Symbol'].str.contains("\.|\^")]

    if sectors is not None:
        if isinstance(sectors, str):
            sectors = [sectors]
        if not _SECTORS_LIST.issuperset(set(sectors)):
            raise ValueError('Some sectors included are invalid')
        sector_filter = df['Sector'].apply(lambda x: x in sectors)
        df = df[sector_filter]

    def cust_filter(mkt_cap):
        if 'M' in mkt_cap:
            return float(mkt_cap[1:-1])
        elif 'B' in mkt_cap:
            return float(mkt_cap[1:-1]) * 1000
        else:
            return float(mkt_cap[1:]) / 1e6
    df['marketCap'] = df['marketCap'].apply(cust_filter)

    df = df.sort_values('marketCap', ascending=False)
    if top_n > len(df):
        raise ValueError('Not enough companies, please specify a smaller top_n')

    return df.iloc[:top_n]['Symbol'].tolist()

def get_tickers_by_region(region):
    if region in Region:
        response = requests.get('https://old.nasdaq.com/screening/companies-by-name.aspx', headers=headers,
                                params=params_region(region))
        data = io.StringIO(response.text)
        df = pd.read_csv(data, sep=",")
        return __exchange2list(df)
    else:
        raise ValueError('Please enter a valid region (use a Region.REGION as the argument, e.g. Region.AFRICA)')

def __exchange2df(exchange):
    r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)
    data = r.json()['data']
    df = pd.DataFrame(data['rows'], columns=data['headers'])
    return df

def __exchange2list(exchange):
    df = __exchange2df(exchange)
    df_filtered = df[~df['symbol'].str.contains("\.|\^")]
    return df_filtered['symbol'].tolist()

def __exchange2list_filtered(exchange, mktcap_min=None, mktcap_max=None, sectors=None):
    df = __exchange2df(exchange)
    df = df.dropna(subset={'marketCap'})
    df = df[~df['symbol'].str.contains("\.|\^")]

    if sectors is not None:
        if isinstance(sectors, str):
            sectors = [sectors]
        if not _SECTORS_LIST.issuperset(set(sectors)):
            raise ValueError('Some sectors included are invalid')
        sector_filter = df['sector'].apply(lambda x: x in sectors)
        df = df[sector_filter]

    def cust_filter(mkt_cap):
        if 'M' in mkt_cap:
            return float(mkt_cap[1:-1])
        elif 'B' in mkt_cap:
            return float(mkt_cap[1:-1]) * 1000
        elif mkt_cap == '':
            return 0.0
        else:
            return float(mkt_cap[1:]) / 1e6
    df['marketCap'] = df['marketCap'].apply(cust_filter)
    if mktcap_min is not None:
        df = df[df['marketCap'] > mktcap_min]
    if mktcap_max is not None:
        df = df[df['marketCap'] < mktcap_max]
    return df['symbol'].tolist()

def save_tickers(NYSE=True, NASDAQ=True, AMEX=True, filename='tickers.csv'):
    tickers2save = get_tickers(NYSE, NASDAQ, AMEX)
    df = pd.DataFrame(tickers2save)
    df.to_csv(filename, header=False, index=False)

def save_tickers_by_region(region, filename='tickers_by_region.csv'):
    tickers2save = get_tickers_by_region(region)
    df = pd.DataFrame(tickers2save)
    df.to_csv(filename, header=False, index=False)

if __name__ == '__main__':

    tickers = get_tickers()
    print(tickers[:5])

    tickers = get_tickers(AMEX=False)

    # default filename is tickers.csv, to specify, add argument filename='yourfilename.csv'
    save_tickers()

    # save tickers from NYSE and AMEX only
    save_tickers(NASDAQ=False)

    # get tickers from Asia
    tickers_asia = get_tickers_by_region(Region.ASIA)
    print(tickers_asia[:5])

    # save tickers from Europe
    save_tickers_by_region(Region.EUROPE, filename='EU_tickers.csv')

    # get tickers filtered by market cap (in millions)
    filtered_tickers = get_tickers_filtered(mktcap_min=500, mktcap_max=2000)
    print(filtered_tickers[:5])

    # not setting max will get stocks with $2000 million market cap and up.
    filtered_tickers = get_tickers_filtered(mktcap_min=2000)
    print(filtered_tickers[:5])

    # get tickers filtered by sector
    filtered_by_sector = get_tickers_filtered(mktcap_min=200e3, sectors=SectorConstants.FINANCE)
    print(filtered_by_sector[:5])

    # get tickers of 5 largest companies by market cap (specify sectors=SECTOR)
    top_5 = get_biggest_n_tickers(5)
    print(top_5)
Karlheinzniebuhr commented 2 years ago

Same issue gt.get_tickers_filtered(mktcap_max=1)

Exception has occurred: ParserError
Error tokenizing data. C error: Expected 1 fields in line 6, saw 47
  File "C:\dev\Python\stocks\get_symbols.py", line 9, in <module>
    list_of_tickers = gt.get_tickers_filtered(mktcap_max=1)
paulmcq commented 1 year ago

@justege: looks like lines 97 and 120 should be lower case 'symbol' vs 'Symbol', as found by @JaisinhBhosale9712 in the comment just before yours.