Open joshuaulrich opened 3 years ago
i have a very common use case where i get symbols for multiple periods, e.g. daily, weekly, and monthly. i only go out to the data source for the daily and then re-sample to longer periods on top of that data. I can certainly handle this in higher level code, but it strikes me that this may be a common use case and may be worth handling as well.
my current code base creates an env for each period, so I reference OHLC sets by price.cache[[period]][[Symbol]]
one solution that follows a similar pattern is to return a list of lists, where the first level entries would be the SymbolSpec. you could then pull out the exact data by:
x <- fetch_symbols(sym.spec)
x[[sym.spec]][[Symbol]]
a non-recursive call to unlist would undo a lot of complexity when it's not needed. this could even be handled with a simplify parameter to the fetch call.
just thinking out loud
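A minimal sketch of the list-of-lists idea with a simplify switch, in base R. Everything here is hypothetical: fetch_one() is a stand-in for a real data fetch, fetch_symbols() and its arguments are illustrative names, and the period names stand in for a symbol spec.

```r
# Hypothetical stand-in for a real data fetch: returns a placeholder string.
fetch_one <- function(symbol, period) {
  paste(symbol, period, sep = "_")
}

# Hypothetical fetch_symbols(): `specs` is a named list that maps a period
# (standing in for a symbol spec) to a character vector of symbols.
fetch_symbols <- function(specs, simplify = FALSE) {
  result <- lapply(names(specs), function(period) {
    symbols <- specs[[period]]
    out <- lapply(symbols, fetch_one, period = period)
    names(out) <- symbols
    out
  })
  names(result) <- names(specs)
  if (simplify) {
    # A non-recursive unlist() removes only the outer level of nesting.
    result <- unlist(result, recursive = FALSE)
  }
  result
}

specs <- list(daily = c("SPY", "IAU"), weekly = "SPY")

x <- fetch_symbols(specs)
x[["daily"]][["SPY"]]

flat <- fetch_symbols(specs, simplify = TRUE)
names(flat)
```

With simplify = TRUE the result is a flat list whose names join the two levels with a dot, so the nested lookup is only needed when more than one spec is in play.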
I just had the thought that we could support symbol mapping by using the name of the element of the symbols argument as the ticker symbol. For example:
tickers <-
  c(sym_yahoo(symbols = c(SPY = "SPY", FOOyahoo = "FOO")),
    sym_fred(symbols = c(FOO = "FOO")))
There's a "FOO" from yahoo and fred, but we map the yahoo "FOO" to "FOOyahoo".
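A rough sketch of how the name-based mapping could work, using only base R. map_symbols() is a hypothetical helper, not part of the package, and it assumes the element name is the local name while the element value is the ticker sent to the vendor.

```r
# Hypothetical helper: element names become the local names, element values
# are the vendor tickers. Unnamed elements fall back to the ticker itself.
map_symbols <- function(symbols, source) {
  local <- names(symbols)
  if (is.null(local)) local <- rep("", length(symbols))
  missing <- is.na(local) | local == ""
  local[missing] <- symbols[missing]
  data.frame(local = local, ticker = unname(symbols), source = source,
             stringsAsFactors = FALSE)
}

tickers <- rbind(
  map_symbols(c(SPY = "SPY", FOOyahoo = "FOO"), source = "yahoo"),
  map_symbols(c(FOO = "FOO"), source = "fred")
)

# Both sources have a "FOO" ticker, but the local names no longer collide.
anyDuplicated(tickers$local)
```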
been stewing on this in the background. it's fundamentally a URN problem. i think the most robust model would be symbol@source/subsource
eg: TM@yahoo/nyse 7203@yahoo/jpx T10Y3M@fred GOOGL@tiingo/nasdaq
this is very similar to your idea, it just expands on it w/ a fully structured URN. at a very high level, with defaults for source and partial matching on source + subsource, i think it should handle a lot of scenarios
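A minimal sketch of parsing the symbol@source/subsource form into its components. parse_symbol_urn() is a hypothetical name, and the default source of "yahoo" is an assumption made for illustration. Partial matching is not covered.

```r
# Hypothetical parser for strings shaped like symbol@source/subsource.
# Both the source and the subsource are optional.
parse_symbol_urn <- function(x, default_source = "yahoo") {
  has_source <- grepl("@", x, fixed = TRUE)
  symbol <- sub("@.*$", "", x)
  # Everything after the "@", or the assumed default when there is no "@".
  location <- ifelse(has_source, sub("^[^@]*@", "", x), default_source)
  has_sub <- grepl("/", location, fixed = TRUE)
  source <- sub("/.*$", "", location)
  subsource <- ifelse(has_sub, sub("^[^/]*/", "", location), NA_character_)
  data.frame(symbol = symbol, source = source, subsource = subsource,
             stringsAsFactors = FALSE)
}

urns <- c("TM@yahoo/nyse", "7203@yahoo/jpx", "T10Y3M@fred",
          "GOOGL@tiingo/nasdaq", "SPY")
parsed <- parse_symbol_urn(urns)
parsed
```

Returning a data.frame keeps one row per symbol, which makes it easy to group by source before fetching.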
Very interesting idea! So we could create a sym_urn() function that takes strings like "yahoo:TM@nyse" to fetch NYSE symbol "TM" from Yahoo Finance. Then we would know that we need to call sym_yahoo("TM") to dispatch to the correct import_ohlc() method.
So your examples would translate to:
sym_urn("yahoo:TM@nyse") -> sym_yahoo("TM")
sym_urn("yahoo:7203@jpx") -> sym_yahooj("7203")
sym_urn("fred:T10Y3M") -> sym_fred("T10Y3M")
sym_urn("tiingo:GOOGL@nasdaq") -> sym_tiingo("GOOGL")
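The translation above could be sketched roughly as follows. The sym_*() functions defined here are simplified stand-ins that only record the source and symbol (the real constructors may return something richer), and this sym_urn() is a hypothetical implementation that handles only the four cases listed.

```r
# Simplified stand-ins for the real constructors (assumption: they take a
# symbol and return an object that identifies the source).
sym_yahoo  <- function(symbol) list(source = "yahoo",  symbol = symbol)
sym_yahooj <- function(symbol) list(source = "yahooj", symbol = symbol)
sym_fred   <- function(symbol) list(source = "fred",   symbol = symbol)
sym_tiingo <- function(symbol) list(source = "tiingo", symbol = symbol)

# Hypothetical sym_urn(): parses "source:symbol@subsource" and dispatches.
sym_urn <- function(urn) {
  parts <- strsplit(urn, ":", fixed = TRUE)[[1]]
  if (length(parts) != 2) stop("malformed symbol URN: ", urn)
  source <- parts[1]
  rest <- strsplit(parts[2], "@", fixed = TRUE)[[1]]
  symbol <- rest[1]
  subsource <- if (length(rest) > 1) rest[2] else NA_character_

  constructor <- if (source == "yahoo" && identical(subsource, "jpx")) {
    sym_yahooj  # Yahoo's JPX listings map to sym_yahooj() in the examples
  } else {
    switch(source,
           yahoo  = sym_yahoo,
           fred   = sym_fred,
           tiingo = sym_tiingo,
           stop("unknown source: ", source))
  }
  constructor(symbol)
}

specs <- lapply(c("yahoo:TM@nyse", "yahoo:7203@jpx",
                  "fred:T10Y3M", "tiingo:GOOGL@nasdaq"), sym_urn)
```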
I used the foo:bar syntax because that's how URNs are supposed to be structured (namespace-id:namespace-string).
The URN syntax makes it easy to specify a vendor's sub-source (e.g. Tiingo has EOD data and IEX data). I'm not sure how that would work for the current syntax. Maybe sym_tiingo() and sym_tiingo_iex()?
i guess i was sort of thinking that the namespace was quantmod, but also was thinking more conceptually than formal grammar. i'd have to re-read the spec before i had a strong view on any of this
i think a lot of schemes could work. definitely like the idea of a ParseSymbolUrn() function that would expose the components as a list or data.frame. that certainly could be built to delegate to source specific implementations as u describe. a lot of flexibility there. i'd prolly lean to the simplest form that meets all current known use cases.
on the tiingo example i'd lean to something like googl@tiingo.iex or tiingo:googl@iex
i think the sym_xxx functionality would need to be extended to support sub-sources, otherwise they don't add much value and we shouldn't introduce them
Current State: For the most part, Symbol@src is unique. i went through a number of sources and found only 1 exception to this so far (IBrokers). in fact, most of the underlying APIs only accept a symbol and sometimes date ranges. The sources have already mucked around with the symbols to make them unique
getSymbols("IAU:TSE", "google") vs getSymbols("IAU", "google")
some use cases/scenarios that i think would be useful while considering design decisions:
simplify type parameter could even be the default to make interactive use less cumbersome.
c() result data from previous calls to diff sources and diff date ranges.
a. should do something like rbind()
symbol within a source.
Moved to #18 to keep this issue focused on handling duplicate symbols.
A symbol spec vector could have the same symbol for multiple sources. And the symbols may be for different underlying series (e.g. "FOO" could be a stock or a FRED series).
What should we do in this case?
My intuition is that it shouldn't be allowed. We could throw an error in c.symbol_spec(). The error message should tell the user to remap one of the symbols. Thoughts?
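A rough sketch of the duplicate check, assuming for illustration that a symbol spec can be represented as a data frame with one row per symbol. new_spec() and combine_specs() are hypothetical names. The real check would live in c.symbol_spec() and work on the actual spec class.

```r
# Hypothetical spec representation: one row per (symbol, source) pair.
new_spec <- function(symbols, source) {
  data.frame(symbol = symbols, source = source, stringsAsFactors = FALSE)
}

# Hypothetical combine step: refuse to merge specs that share a symbol.
combine_specs <- function(...) {
  specs <- do.call(rbind, list(...))
  dup <- unique(specs$symbol[duplicated(specs$symbol)])
  if (length(dup) > 0) {
    stop("duplicate symbols across sources: ", paste(dup, collapse = ", "),
         "; remap one of them to a unique name", call. = FALSE)
  }
  specs
}

# Distinct symbols combine without complaint.
ok <- combine_specs(new_spec("SPY", "yahoo"), new_spec("T10Y3M", "fred"))

# The same symbol from two sources raises an error naming the offender.
err <- tryCatch(
  combine_specs(new_spec(c("SPY", "FOO"), "yahoo"), new_spec("FOO", "fred")),
  error = function(e) conditionMessage(e)
)
err
```

Failing at combine time puts the error next to the call that caused it, rather than surfacing later when two series would be assigned to the same name.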