joshuaulrich / TTR

Technical analysis and other functions to construct technical trading rules with R
GNU General Public License v2.0

stockSymbols() not working on AWS EC2 but working on laptop #98

Closed (leontegral closed this issue 3 years ago)

leontegral commented 3 years ago

Description

Running TTR::stockSymbols() in an R 3.6 console on AWS EC2 (Linux) started failing this week. Running it on my laptop (macOS) in R 3.6 works. I am running the development version from GitHub, including the latest commit with the change to curl_download().

Note that TTR::getYahooData(), though deprecated, works in both environments.

I am guessing this has to do with the Nasdaq website's restrictions on user agent, browser, or IP address. It would be great if there were an alternative source, preferably an API, so that we didn't have to rely on scraping.

Expected behavior

TTR::stockSymbols() runs without error.

Minimal, reproducible example

library(TTR)
symbols <- stockSymbols()

# On laptop
> library(TTR)
> symbols <- stockSymbols()
Fetching AMEX symbols...
Fetching NASDAQ symbols...
Fetching NYSE symbols...
> 

# On AWS
> library(TTR)
> symbols <- stockSymbols()
Fetching AMEX symbols...

(stalls for minutes)

# On laptop
> url <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download"
> tmp <- tempfile()
> resp <- curl::curl_download(url, tmp, quiet=FALSE)
[100%] Downloaded 38619 bytes...
>

# On AWS
> url <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download"
> tmp <- tempfile()
> resp <- curl::curl_download(url, tmp, quiet=FALSE)

(nothing for minutes)
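
While debugging, it may help to make the call fail fast instead of hanging. The following is a minimal sketch, not the package's code: `download_with_timeout` is a hypothetical helper, and the timeout value and user-agent string are assumptions. Whether a different user agent changes the server's behavior is untested here.

```r
# Hypothetical helper: download with a hard timeout so a stalled
# connection raises an error instead of hanging indefinitely.
download_with_timeout <- function(url, destfile, timeout_secs = 30) {
  h <- curl::new_handle()
  # 'timeout' caps the whole transfer; the user-agent string is an assumption
  curl::handle_setopt(h, timeout = timeout_secs,
                      useragent = "Mozilla/5.0 (compatible; R curl)")
  tryCatch(
    curl::curl_download(url, destfile, handle = h),
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      NULL  # signal failure to the caller
    }
  )
}
```

On success the helper returns the destination path, as curl::curl_download() does; on a timeout or any other error it prints the message and returns NULL.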

Session Info

laptop:

R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TTR_0.23-6

loaded via a namespace (and not attached):
[1] zoo_1.8-8       compiler_3.6.3  xts_0.12-0      curl_4.3       
[5] grid_3.6.3      lattice_0.20-38

AWS EC2:

R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03

Matrix products: default
BLAS/LAPACK: /data/miniconda3-lf/lib/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TTR_0.23-6

loaded via a namespace (and not attached):
[1] zoo_1.8-8       compiler_3.6.1  xts_0.12-0      curl_4.3       
[5] grid_3.6.1      lattice_0.20-41

joshuaulrich commented 3 years ago

Thanks for the report! It's frustrating that curl::curl_download() didn't fix it on AWS, like it did for #97.

Am I correct that the second block of code that calls curl::curl_download() was done on AWS, even though the comment says # on laptop?

Is the AMI you're using something that others can get a copy of and try to replicate?

leontegral commented 3 years ago

Thanks for the quick reply! Yes -- that was on AWS. Just corrected the comment. And the AMI is a standard AWS AMI (https://aws.amazon.com/amazon-linux-ami/2018.03-release-notes/)

More generally, I wonder if it might be better to use an alternative source, given that the NASDAQ site seems to have some anti-scraping measures. A vanilla curl on my laptop gets a 403 (access denied). A possible alternative is ftp://ftp.nasdaqtrader.com/symboldirectory/ (which was suggested in #5).
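
A rough sketch of how files from such a directory might be parsed, assuming they are pipe-delimited text with a header row and a trailing "File Creation Time" footer line. The helper name `parse_symbol_directory`, the column layout, and the sample rows are all assumptions made up for illustration, not real listings.

```r
# Hypothetical helper: parse a pipe-delimited symbol directory file.
# Assumes a header row and a trailing "File Creation Time" footer line.
parse_symbol_directory <- function(lines) {
  # Drop the footer line, if present
  lines <- lines[!grepl("^File Creation Time", lines)]
  # Read everything as character so tickers like "NA" or "TRUE"
  # are not converted to missing or logical values
  read.table(text = lines, sep = "|", header = TRUE, quote = "",
             comment.char = "", colClasses = "character",
             na.strings = "", fill = TRUE)
}

# Made-up sample in the assumed layout (not real listings)
sample_lines <- c(
  "Symbol|Security Name|Test Issue",
  "AAA|Example Corp A|N",
  "BBB|Example Corp B|N",
  "File Creation Time: 0101202000:00||"
)
listings <- parse_symbol_directory(sample_lines)
```

In practice the lines might come from readLines() on a file under the FTP directory above; the exact file names there are not given in this thread.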

ethanbsmith commented 3 years ago

2 thoughts:

  1. maybe move this functionality to something like an extensible getListings(src, ...) that dispatches to getListings.src(...), following the getSymbols pattern. This would open up a lot of flexibility in what gets returned, and allow people to choose the source that best suits their needs
  2. is this type of functionality better suited to live in quantmod instead of TTR?

i'll submit a contribution:

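GetSymbolList.tiingozip <- function(...) {
    # Download the zipped ticker list to a temporary file, removed on exit
    tmp.file <- tempfile()
    on.exit(unlink(tmp.file))
    # mode = "wb" keeps the zip archive intact on Windows
    download.file("https://apimedia.tiingo.com/docs/tiingo/daily/supported_tickers.zip", tmp.file, quiet = TRUE, mode = "wb")
    # Read the CSV straight from the archive
    return(read.csv(unz(tmp.file, "supported_tickers.csv")))
}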
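
The dispatch idea from the first point could look roughly like this. It is only a sketch under assumptions: `getListings` and `getListings.demo` are hypothetical names that exist in neither TTR nor quantmod, and the demo source returns made-up data purely to show the mechanism.

```r
# Hypothetical dispatcher; these names are not part of TTR or quantmod.
getListings <- function(src, ...) {
  fn_name <- paste0("getListings.", src)
  # Fail with a clear message if no function exists for this source
  if (!exists(fn_name, mode = "function")) {
    stop("no listing source available for src = '", src, "'")
  }
  # Forward any extra arguments to the source-specific function
  do.call(fn_name, list(...))
}

# Toy source returning made-up data, only to show the dispatch
getListings.demo <- function(...) {
  data.frame(Symbol = c("AAA", "BBB"), stringsAsFactors = FALSE)
}

listings <- getListings("demo")
```

A new source would then only need a function named after the same pattern, without changes to the dispatcher itself.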

rezar362 commented 3 years ago

I am having a similar issue, and I only need the list of stocks, not any actual ticker quotes. Is there a workaround?

nknauer commented 3 years ago

Same here. This is the error I'm getting in RStudio on my local laptop:

symbols <- TTR::stockSymbols()
Fetching AMEX symbols...
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  InternetOpenUrl failed: 'The operation timed out'

joshuaulrich commented 3 years ago

Hi everyone, sorry I didn't get to this over the weekend. I was very ill. I'm feeling better, but have limited time to work on OSS during the week. I'm going to get something working this morning, but it will not be polished. I will need to update documentation and add some tests before I release to CRAN.

leontegral commented 3 years ago

Josh, so sorry to hear you were sick. Hope you feel better soon.

joshuaulrich commented 3 years ago

Pushed a first draft this morning. A few things still need to happen before I merge and release:

  1. Feedback appreciated: convert all symbols to a common format. Various formats are listed here. The previous function returned tickers that should work with Yahoo Finance. It may make sense to add an argument to control which ticker format is returned. That may need to wait for a separate PR after release, depending on how hard it is to implement.
  2. Notes to myself:
    • Remove duplicate reference to Yahoo ticker format.
    • Remove trailing comment characters I used to make sure I included all the columns from nasdaq and other files.
    • Replace market category code with description.
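
For the last note, replacing a market category code with its description could be a simple named-vector lookup. This is a sketch only: the code-to-description mapping below is an assumption about the source files and should be checked against the exchange's own definitions, and the variable names are hypothetical.

```r
# Assumed code-to-description mapping; verify against the
# exchange's published definitions before relying on it.
market_category <- c(
  Q = "NASDAQ Global Select Market",
  G = "NASDAQ Global Market",
  S = "NASDAQ Capital Market"
)

codes <- c("Q", "S", "G", "X")
# Look up by name; codes not in the mapping become NA instead of an error
descriptions <- unname(market_category[codes])
```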