Closed jeff1evesque closed 5 years ago
We need to clear our RStudio workspace, remove the corresponding custom packages, then run our app.R.
The following segment in our name_to_ticker.R generates the following error:
> df = load_data_fin654(
+ paste0(cwd, '/data/data-breaches.csv'),
+ paste0(cwd, '/data/Privacy_Rights_Clearinghouse-Data-Breaches-Export.csv'),
+ paste0(cwd, '/python/dataframe.py')
+ )
> tickers = name_to_ticker(
+ df$company,
+ c(
+ paste0(cwd, '/data/amex.csv'),
+ paste0(cwd, '/data/nasdaq.csv'),
+ paste0(cwd, '/data/nyse.csv')
+ ),
+ c(paste0(cwd, '/python/dataframe.py'), paste0(cwd, '/python/name_to_ticker.py'))
+ )
Error in py_call_impl(callable, dots$args, dots$keywords) :
KeyError: "['name' 'symbol'] not in index"
Detailed traceback:
File "<string>", line 14, in name_to_ticker
File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 2682, in __getitem__
return self._getitem_array(key)
File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 2726, in _getitem_array
indexer = self.loc._convert_to_indexer(key, axis=1)
File "C:\Python36\lib\site-packages\pandas\core\indexing.py", line 1327, in _convert_to_indexer
.format(mask=objarr[mask]))
Specifically, the following Python segment fails:
def name_to_ticker(series, ref, col_1, col_2):
    '''
    convert list of company names to list of tickers.
    @col_1, column converted to index in dict
    @col_2, column converted to value in dict
    '''
    references = ref[[col_1, col_2]].set_index(col_1).to_dict()
    return([x if x not in references else references[x] for x in series])
d240477: the adjusted code returns the company names, not the desired company ticker.
The following logic returns a list of NULL:
references = ref[[col_1, col_2]].set_index(col_1).to_dict()
return([references[x] if x in references else None for x in series])
This suggests that no element of series exists as a key in the constructed references. Therefore, more effort is needed to verify whether references is properly constructed.
Temporarily adjusting name_to_ticker.py:
def name_to_ticker(series, ref, col_1, col_2):
    '''
    convert list of company names to list of tickers.
    @col_1, column converted to index in dict
    @col_2, column converted to value in dict
    '''
    return(ref[[col_1, col_2]])
Returns the desired structure:
This suggests to_dict() does not convert the above dataframe into the flat name-to-ticker dict that the list comprehension expects, so the if condition in [references[x] if x in references else None for x in series] never succeeds.
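To illustrate why the membership test never succeeds: with its default orient, DataFrame.to_dict() returns a nested dict keyed by column name, so the top-level keys are column names rather than company names. The following is a minimal sketch, assuming a small made-up ref with 'name' and 'symbol' columns; it is not the project's actual reference data.

```python
import pandas as pd

# Made-up reference data; only the 'name'/'symbol' column names come from the issue.
ref = pd.DataFrame({
    'name': ['apple inc.', 'celgene corporation'],
    'symbol': ['aapl', 'celg'],
})

# Default orient='dict' gives {column -> {index -> value}}.
nested = ref[['name', 'symbol']].set_index('name').to_dict()
# nested == {'symbol': {'apple inc.': 'aapl', 'celgene corporation': 'celg'}}

# Selecting the single value column first yields a flat {name -> symbol} dict.
flat = ref.set_index('name')['symbol'].to_dict()

series = ['apple inc.', 'wordpress']
tickers = [flat.get(x) for x in series]  # None where no match exists
```

Here 'apple inc.' in nested is False because the only top-level key is 'symbol', whereas the flat dict supports the lookup directly.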
Replacing the earlier return with either of the following implementations:
return(ref[[col_1, col_2]].loc[series])
return(ref[[col_1, col_2]].loc[series][[col_2]])
Generates the following error traceback:
Error in py_call_impl(callable, dots$args, dots$keywords) :
KeyError: 'None of [[\'cathay pacific airways\', \'chinese resume leak\', \'blur\', \'blank media games\', \'wordpress\', \'google+\', \'quora\', \'marriott hotels\', \'nmbs\', \'facebook\', \'panerabread\', \'aadhaar\', \'dixons carphone\', \'myheritage\', \'saks and lord & taylor\', \'careem\', \'texas voter records\', \'british airways\', \'t-mobile\', \'myfitnesspal\', \'health south east\', \'nametests\', \'ticketmaster\', \'firebase\', \'aadhaar\', \'grindr\', \'orbitz\', \'mbm company\', \'localblox\', \'twitter\', \'viewfines\', \'ticketfly\', \'amazon\', \'amazon\', \'urban massage\', \'dell\', \'high tail hall\', \'sky brasil\', \'vision direct\', \'healthcare.gov\', \'cms\', \'facebook\', \'newegg\', \'disqus\', \'rootsweb\', \'yahoo\', \'uber\', \'wonga\', \'snapchat\', \'spambot\', \'cex\', \'al.type\', \'cellebrite\', \'waterly\', \'swedish transport agency\', \'hong kong registration & electoral office\', \'river city media\', \'dafont\', \'bell\', \'zomato\',
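The KeyError is consistent with how .loc works: it looks up index labels, and a dataframe that still carries its default integer index has no company names among its labels. Below is a minimal sketch, assuming made-up reference rows, that reproduces the failure and shows one possible alternative using a boolean mask on the 'name' column.

```python
import pandas as pd

# Made-up reference rows with the default integer index (0, 1).
ref = pd.DataFrame({
    'name': ['apple inc.', 'celgene corporation'],
    'symbol': ['aapl', 'celg'],
})
series = ['apple inc.', 'wordpress']

# .loc treats the list as index labels; none of these strings is a label.
try:
    ref[['name', 'symbol']].loc[series]
    raised = False
except KeyError:
    raised = True

# Filtering on the column values avoids the label lookup entirely.
matches = ref.loc[ref['name'].isin(series), ['name', 'symbol']]
```

Note that the mask returns only the rows that match, so the result can be shorter than series.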
f80d986: we were able to ensure name_to_ticker returns the corresponding ticker names:
However, the subsequent logic that stores the matching ticker names back into the original dataframe generated an error about mismatched row counts:
> df$ticker = tickers
Error in `$<-.data.frame`(`*tmp*`, ticker, value = c(`525` = "aapl", `879` = "celg", :
replacement has 58 rows, data has 2760
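The mismatch arises because tickers holds only the 58 matched entries, while df has 2760 rows. One way to avoid it on the pandas side, sketched here with made-up rows and assumed 'company', 'name', and 'symbol' column names, is to map every row through a name-to-symbol lookup so the new column always has one entry per row, then drop the unmatched rows afterwards if desired.

```python
import pandas as pd

# Made-up data; 'wordpress' has no entry in the reference.
df = pd.DataFrame({'company': ['apple inc.', 'wordpress', 'celgene corporation']})
ref = pd.DataFrame({
    'name': ['apple inc.', 'celgene corporation', 'celgene corporation'],
    'symbol': ['aapl', 'celg', 'celg'],
})

# Series.map needs a unique index, so drop repeated names first.
lookup = ref.drop_duplicates('name').set_index('name')['symbol']

# One value per row of df; unmatched companies become NaN.
df['ticker'] = df['company'].map(lookup)

# Optional: keep only the rows with a ticker.
matched = df.dropna(subset=['ticker'])
```

Because the new column is derived row by row from df itself, its length cannot diverge from the dataframe's.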
Changing dataframe.py to the following:
def set_column(self, column, ref, new_key):
    '''
    If column values exist in the provided reference, append the corresponding
    values into a new 'column' on the current dataframe.
    '''
    # temporary debug: print the reference names and return early
    return(print([x['name'] for i,x in ref.iterrows()]))
    vals = [x['symbol'] if x['name'] in self.df[column] else None for i,x in ref.iterrows()]
    #self.df[new_key] = vals
produces the following company names:
['apple inc.', 'celgene corporation', 'celgene corporation', 'copart, inc.', 'docusign, inc.', 'facebook, inc.', 'intuit inc.', 'marriott international', 'multi-color corporation', 'nvidia corporation', 'performant financial corporation', 'sabre corporation', 'the madison square garden company', 'aecom', 'american express company', 'broadridge financial solutions, inc.', 'citigroup inc.', 'citigroup inc.', 'citigroup inc.', 'citigroup inc.', 'citigroup inc.', 'citigroup inc.', 'delta air lines, inc.', 'discover financial services', 'dollar general corporation', 'first data corporation', 'first republic bank', 'first republic bank', 'first republic bank', 'first republic bank', 'first republic bank', 'first republic bank', 'genesco inc.', 'global payments inc.', 'kb home', 'kbr, inc.', 'keycorp', 'keycorp', 'keycorp', 'morgan stanley', 'morgan stanley', 'morgan stanley', 'morgan stanley', 'morgan stanley', 'morgan stanley', 'morgan stanley', 'occidental petroleum corporation', 'perkinelmer, inc.', 'qvc, inc.', 'rite aid corporation', 'rollins, inc.', 'stanley black & decker, inc.', 'stanley black & decker, inc.', 'stanley black & decker, inc.', 'suntrust banks, inc.', 'suntrust banks, inc.', 'the madison square garden company', 'weyerhaeuser company']
However, changing dataframe.py to:
def set_column(self, column, ref, new_key):
    '''
    If column values exist in the provided reference, append the corresponding
    values into a new 'column' on the current dataframe.
    '''
    # temporary debug: print the dataframe column and return early
    return(print(self.df[column]))
    vals = [x['symbol'] if x['name'] in self.df[column] else None for i,x in ref.iterrows()]
    #self.df[new_key] = vals
Produces the following output:
17100 occidental petroleum corporation
12010 kb home
12210 keycorp
393 rite aid corporation
428 citigroup inc.
566 weyerhaeuser company
568 copart, inc.
625 keycorp
718 celgene corporation
810 kbr, inc.
823 broadridge financial solutions, inc.
879 citigroup inc.
1016 rite aid corporation
1099 first data corporation
1113 docusign, inc.
1197 first republic bank
1252 rite aid corporation
1347 stanley black & decker, inc.
1375 rollins, inc.
1386 global payments inc.
1426 apple inc.
1451 genesco inc.
1526 discover financial services
1527 discover financial services
1535 discover financial services
1547 american express company
1559 discover financial services
1588 american express company
1647 aecom
1716 nvidia corporation
1788 sabre corporation
1792 morgan stanley
1912 perkinelmer, inc.
1957 multi-color corporation
2048 qvc, inc.
2050 the madison square garden company
2167 sabre corporation
2184 sabre corporation
2194 sabre corporation
2219 performant financial corporation
2342 dollar general corporation
2359 intuit inc.
2375 delta air lines, inc.
2380 suntrust banks, inc.
2406 facebook, inc.
2454 facebook, inc.
2457 marriott international
Name: company, dtype: object
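One detail worth noting about the commented-out comprehension: the in operator applied to a pandas Series tests the index labels (the integers in the left column above), not the values. A minimal sketch using two of the rows printed above:

```python
import pandas as pd

# Two rows copied from the output above: integer index, company names as values.
companies = pd.Series(['kb home', 'keycorp'], index=[12010, 12210], name='company')

in_series = 'keycorp' in companies            # checks index labels -> False
label_in_series = 12210 in companies          # an index label -> True
in_values = 'keycorp' in companies.values     # checks the values -> True
via_isin = bool(companies.isin(['keycorp']).any())  # value check -> True
```

This would explain why x['name'] in self.df[column] cannot match company names, and why testing against .values behaves as intended.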
The following variant in dataframe.py suffices:
def set_column(self, column, ref, new_key):
    '''
    If column values exist in the provided reference, append the corresponding
    values into a new 'column' on the current dataframe.
    '''
    ## only 'ref' contains stock symbols
    results = []
    for i, x in self.df.iterrows():
        # rows without a match are skipped, so the assignment below assumes
        # every row of self.df has a matching name in ref
        if x[column] in ref['name'].values:
            results.append(ref.loc[ref['name'] == x[column], 'symbol'].iloc[0])
    # vals = [ref.loc[ref['name'] == x[column]] for i,x in self.df.iterrows() if x[column] in ref['name'].values]
    self.df[new_key] = results
Converting the above to a list comprehension is possible. However, it would be less readable than the verbose loop. Therefore, the current implementation should be sufficient for this issue. Next, we'll need to determine whether to study the entire subset, or further subset our reduced dataset. Furthermore, we'll need to obtain the corresponding timeseries stock values. This will propel us to the exploratory and analysis phase.
As stated earlier, we need to determine whether the dataset needs to be further reduced. Additionally, we need to determine how to pull the corresponding timeseries stock data:
We need to determine additional required economic datasets.