portfolioplus / pytickersymbols

Fundamental stock data and yahoo/google ticker symbols for several indices.
MIT License

Error in nasdaq100 stock tickers #45

Closed YairMZ closed 3 years ago

YairMZ commented 3 years ago

Hi,

I tried to use the package and found that calling get_stocks_by_index with NASDAQ 100 as the argument returns faulty results. First, it returns only 89 stocks, while the index contains 103 companies. Moreover, some of the returned stocks are no longer in the index.

Basically, the JSON file the data is pulled from is outdated. Is there a way to update it?

YairMZ commented 3 years ago

The same issue also applies to the S&P500.

SlashGordon commented 3 years ago

Hello my friend,

All data is written to the file stocks.yaml. The build process converts the YAML file into a JSON file. Please report the missing and wrong stocks in a comment or, even better, as a pull request.

I would be glad to hear from you.

May the probability be with you!

YairMZ commented 3 years ago

Hi, thanks for the response. I don't know how to make pull requests, so I'll just put it here. By the way, I did the comparison by parsing the response from the relevant Wikipedia page.

surplus nasdaq100 tickers: ['VOD', 'AKAM', 'AAL', 'BBBY', 'CELG', 'CTRP', 'XRAY', 'DISCA', 'DISH', 'HSIC', 'MAT', 'MYL', 'NTAP', 'STX', 'SRCL', 'SYMC', 'TSCO', 'TRIP', 'VIAB']

missing nasdaq100 tickers: ['AMD', 'ALGN', 'GOOG', 'ANSS', 'ASML', 'BKNG', 'AVGO', 'CDNS', 'CDW', 'CTAS', 'CPRT', 'DXCM', 'DOCU', 'EXC', 'FOXA', 'FOX', 'IDXX', 'KLAC', 'LBTYK', 'LULU', 'MELI', 'MRNA', 'PEP', 'PDD', 'SGEN', 'SPLK', 'SNPS', 'TTWO', 'TCOM', 'VRSN', 'WDAY', 'XEL', 'ZM']

surplus S&P500 tickers: ['UTX', 'CELG', 'MAT', 'SYMC', 'TRIP', 'VIAB', 'AGN', 'BRK-B', 'RTN', 'AMG', 'ADS', 'APC', 'ARNC', 'BHGE', 'BBT', 'BF-B', 'CPRI', 'CBS', 'CTL', 'XEC', 'FLR', 'FL', 'HOG', 'HRS', 'HCP', 'HP', 'JEC', 'JEF', 'MAC', 'M', 'NKTR', 'JWN', 'RHT', 'STI', 'TMK', 'TSS', 'WCG']

missing S&P500 tickers: ['LNT', 'GOOG', 'AMCR', 'BKR', 'BRK.B', 'BIO', 'BF.B', 'CCL', 'CARR', 'CBOE', 'CDW', 'DXCM', 'DISCK', 'DPZ', 'DOW', 'DD', 'FOX', 'GD', 'GL', 'PEAK', 'HWM', 'IEX', 'J', 'LHX', 'LVS', 'LDOS', 'LYV', 'LMT', 'LUMN', 'MKTX', 'NWS', 'NOC', 'NLOK', 'NVR', 'ODFL', 'OTIS', 'PAYC', 'PEP', 'PFG', 'RTX', 'REG', 'NOW', 'STE', 'TMUS', 'TDY', 'TXT', 'TT', 'TFC', 'TYL', 'UA', 'VIAC', 'WRB', 'WST', 'XEL', 'ZBRA']
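
A comparison like the one above can be sketched with plain set differences. This is only an illustration, not the code that produced the lists: `normalize` and `compare_tickers` are hypothetical helper names, and the inputs are a handful of tickers taken from the lists above. Note that 'BRK-B' and 'BF-B' are reported as surplus while 'BRK.B' and 'BF.B' are reported as missing, which suggests the two sources only differ in the share-class separator, so the sketch normalizes that before comparing.

```python
def normalize(ticker: str) -> str:
    """Upper-case a ticker and use '-' as the share-class separator (BRK.B -> BRK-B)."""
    return ticker.strip().upper().replace(".", "-")


def compare_tickers(package_tickers, reference_tickers):
    """Return (surplus, missing) as sorted lists of normalized tickers."""
    package = {normalize(t) for t in package_tickers}
    reference = {normalize(t) for t in reference_tickers}
    surplus = sorted(package - reference)  # in the package, not in the reference
    missing = sorted(reference - package)  # in the reference, not in the package
    return surplus, missing


# Small illustrative inputs based on the lists above, not the full indices.
package_side = ["UTX", "CELG", "BRK-B", "BF-B"]
reference_side = ["RTX", "GOOG", "BRK.B", "BF.B"]

surplus, missing = compare_tickers(package_side, reference_side)
print("surplus:", surplus)
print("missing:", missing)
```

With the separator normalized, the two Berkshire Hathaway and Brown-Forman entries no longer show up on either side, leaving only genuine differences.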

YairMZ commented 3 years ago

Following your response, is there a way to update the data after building? How do you construct the yaml file? With the amount of indices you cover, a change in the database is expected at least once a month.

SlashGordon commented 3 years ago

Thank you so much for the lists of wrong and missing stocks. The YAML file is static and is adjusted manually. I will update nasdaq100 and S&P500 in the coming weeks.

YairMZ commented 3 years ago

Thanks again. Getting these was actually quite simple, as it only involved parsing the HTTP GET response from Wikipedia. I had my Wikipedia-based code before I found your library, but the problem with it was that every change to the layout of the Wikipedia page broke my code.

How do you obtain the lists when building the YAML file? As I said before, I strongly suggest adding a way to update the file after the build, and perhaps even automatically once in a while.

I can only guess that other indices suffer from similar mistakes too, as these are the only two I tested. As I wrote, maintaining it this way is not scalable, and not feasible for this number of indices.

If you share your ideas, maybe I can try to help (I admit I’m a novice).
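
On the point about layout changes breaking the parser: one way to reduce that fragility is to pick the table by its column header instead of by its position on the page. The sketch below uses only the standard library. `TableCollector` and `tickers_from_html` are hypothetical names, the header names "Ticker" and "Symbol" are assumptions about how the page labels its column, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def tickers_from_html(html, header_names=("Ticker", "Symbol")):
    """Return the ticker column of the first table whose header row has one of header_names."""
    parser = TableCollector()
    parser.feed(html)
    for table in parser.tables:
        if not table:
            continue
        header = table[0]
        for name in header_names:
            if name in header:
                col = header.index(name)
                return [row[col] for row in table[1:] if len(row) > col and row[col]]
    raise ValueError(f"no table with a column named one of {header_names}")


# Made-up page fragment: the first table is unrelated, the second holds the components.
SAMPLE = """
<table><tr><th>Note</th><th>Value</th></tr><tr><td>a</td><td>b</td></tr></table>
<table>
  <tr><th>Company</th><th>Ticker</th></tr>
  <tr><td>Advanced Micro Devices</td><td>AMD</td></tr>
  <tr><td>PepsiCo</td><td>PEP</td></tr>
</table>
"""
print(tickers_from_html(SAMPLE))
```

If the page is reorganized so that no table carries the expected header, the function raises an error instead of silently returning data from the wrong table, which makes a layout change easy to notice.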

SlashGordon commented 3 years ago

You are right. The current approach is not maintainable. But I like your idea of using Wikipedia as a source. I will start writing a Wikipedia scanner for all indices of pytickersymbols. When the scanner detects any changes between stocks.yaml and Wikipedia, an automated pull request will be created. But I need a little time for this change.
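
The update step of such a scanner could look roughly like the sketch below. It is not the project's implementation: `apply_index_update` is a hypothetical helper, and the record shape (dicts with a 'symbol' and a list of 'indices') is an assumption for illustration, not the confirmed layout of stocks.yaml. The index names and records are made up for the example.

```python
import copy


def apply_index_update(companies, index_name, reference_tickers):
    """Return (updated_companies, unknown_tickers) without mutating the input."""
    reference = set(reference_tickers)
    updated = copy.deepcopy(companies)
    known = set()
    for company in updated:
        symbol = company["symbol"]
        known.add(symbol)
        indices = company.setdefault("indices", [])
        if symbol in reference and index_name not in indices:
            indices.append(index_name)  # company was added to the index
        elif symbol not in reference and index_name in indices:
            indices.remove(index_name)  # company was dropped from the index
    # Tickers without a company record need data the diff alone cannot supply.
    unknown = sorted(reference - known)
    return updated, unknown


# Hypothetical records; the real file may be structured differently.
companies = [
    {"symbol": "CELG", "indices": ["NASDAQ 100", "S&P 500"]},
    {"symbol": "AMD", "indices": ["S&P 500"]},
]
updated, unknown = apply_index_update(companies, "NASDAQ 100", ["AMD", "ZM"])
print(updated)
print(unknown)
```

Returning the unknown tickers separately keeps the automated part limited to membership changes, while new companies, which need names and other metadata, can be flagged for manual review in the pull request.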

SlashGordon commented 3 years ago

And any help is welcome. If you want to help, you could start with a feature branch for the Wiki scanner. I would try something like this:


import io

import pandas as pd
import wikipedia as wp
from pytickersymbols import PyTickerSymbols

stock_data = PyTickerSymbols()
stocks = stock_data.get_stocks_by_index('NASDAQ 100')

# auto_suggest=False requests the exact title instead of a search suggestion
html = wp.page("NASDAQ-100", auto_suggest=False).html()
# I would use pandas because read_html is very robust.
# Index 0 is an assumption: it may not be the components table.
df = pd.read_html(io.StringIO(html))[0]

# find diff between df and stocks
# write diff to yaml
# create PR with new yaml file

YairMZ commented 3 years ago

My main challenge is that I mostly work on code either alone or on repos where I'm an admin. I have never made a pull request :).

I cloned your repo and added a file from my current code that does the comparison. How can I make the request? Do I need to fork the repo first? Can I just commit stuff to this repo?

SlashGordon commented 3 years ago

You have to fork the repo. Check the docs https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork :)

SlashGordon commented 3 years ago

I sent you an invitation. You should be able to commit your work directly on a branch.

YairMZ commented 3 years ago

Thanks. For some reason I lack permission to push to the original repo, even though I accepted your invitation, so I forked the repo and made a pull request. I see I can merge it, but I didn't want to do that before you had a chance to have your say. It currently shows two warnings before merging. I also see there is more than one way to merge, and I didn't know which one was the right one. Most of my work is on repos where I and another 5 people are the only ones with access, so I'm unfamiliar with the aspects of git beyond simple commits, pushes and pulls.

You're welcome to review this and merge it if you see fit. Otherwise, I can fix whatever is needed.

SlashGordon commented 3 years ago

Hi YairMZ, we now have an automatic scan for index updates. Please check the latest release of pytickersymbols. I think the components are now correct :)