Open vsoler opened 2 years ago
I'm experiencing this as well. I'll look into it.
I have the same problem.
I spent the day looking into this. The problem is actually in the XBRL library that finreportr uses to do the heavy lifting of pulling and parsing XBRL data from the SEC. In particular, the bit of XBRL code that downloads supporting schemas fails to detect that an https URL is in fact a URL rather than a local file path. It works fine with an http URL. Because it thinks the https URL is a file path, it appends it to the dirname of the cache directory (which is also an https URL). That's why it attempts to fetch the doubled URL string shown in the warning below:
In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1045810/000104581019000023/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '404 Not Found'
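To make the failure mode concrete, here is a minimal sketch of how a scheme check that only recognises http could produce the doubled URL. The helper names (is_remote_buggy, is_remote_fixed, resolve_schema) are hypothetical and written for illustration only; this is not the XBRL package's actual code, just one way the described behaviour could arise.

```r
# Hypothetical sketch of the failure mode; these helpers are NOT taken from
# the XBRL package source.
is_remote_buggy <- function(path) grepl("^http://", path)    # misses https
is_remote_fixed <- function(path) grepl("^https?://", path)  # accepts http and https

resolve_schema <- function(schema, base_dir, is_remote) {
  # Remote schemas are used as-is; anything else is treated as a path
  # relative to base_dir.
  if (is_remote(schema)) schema else paste0(base_dir, "/", schema)
}

base_dir <- "https://www.sec.gov/Archives/edgar/data/1045810/000104581019000023"
schema   <- "https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd"

# With the buggy check the https schema is mistaken for a relative path and
# glued onto base_dir, giving the doubled URL from the warning.
resolve_schema(schema, base_dir, is_remote_buggy)

# With the fixed check the schema URL is returned unchanged.
resolve_schema(schema, base_dir, is_remote_fixed)
```

Under these assumptions, the first call reproduces the URL that returns 404, and the second returns the schema URL on its own.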
I found the XBRL source code on the CRAN GitHub mirror, forked it, and implemented a fix. If you install my version of XBRL using devtools::install_github("riazarbi/XBRL") and restart your session, this error should go away.
Obviously it would be better if the maintainer of XBRL implemented the fix in the CRAN version. I've emailed him to ask how I should submit the fix, as the package's original source code is not available anywhere that I can find. In the meantime, the patched version of XBRL above should work.
Incidentally, I suspect that this issue does not occur with earlier reports because at some point companies started migrating from http to https endpoints for their schema definitions.
Here's a reproducible example of how to use my patched XBRL package.
devtools::install_github("riazarbi/XBRL")
library(XBRL)
library(finreportr)
options(stringsAsFactors = FALSE)
options(HTTPUserAgent = "REDACTED USERNAME@REDACTED.COM")
GetIncome("NVDA", 2019)
Returns
> GetIncome("NVDA", 2019) |> dplyr::glimpse()
Rows: 51
Columns: 5
$ Metric <chr> "Cost of Goods and Services Sold", "Cost of Goods and Services Sold", …
$ Units <chr> "usd", "usd", "usd", "usd", "usd", "usd", "usd", "usd", "usd", "usd", …
$ Amount <chr> "2847000000", "787000000", "928000000", "1067000000", "1110000000", "3…
$ startDate <chr> "2016-02-01", "2017-01-30", "2017-05-01", "2017-07-31", "2017-10-30", …
$ endDate <chr> "2017-01-29", "2017-04-30", "2017-07-30", "2017-10-29", "2018-01-28", …
Warning message:
In roleId == role.id :
longer object length is not a multiple of shorter object length
Bear in mind that the original XBRL package uses download.file under the hood, which has a default timeout of 60 seconds, so if your internet connection is slow and you get a timeout error, you might need to manually download some of these schema files and drop them into the cache.
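Before resorting to manual downloads, it may be worth raising the timeout for the session, since download.file reads its limit from the timeout option. This is a small sketch; the 300-second value is an arbitrary assumption, not a recommendation from the package.

```r
# Raise the download timeout (in seconds) for the current R session.
# max() ensures an already larger setting is never lowered.
# 300 is an arbitrary illustrative value; adjust it to your connection.
options(timeout = max(300, getOption("timeout")))

# Check the value now in effect.
getOption("timeout")
```

Run this before calling GetIncome() so that the schema downloads use the longer limit.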
I recently tried the R package finreportr to retrieve Apple's balance sheet. I am struggling to find a proper solution, as the package does not seem to be robust or accurate. Is any material update to this package planned for the foreseeable future?
I believe this work has tremendous value. Many end users can benefit from the proper fundamental analysis of balance sheets, income statements, and cash flow statements for individual stocks. The vast majority of current issues (XBRL, XML location on SEC EDGAR, and some bugs in the R package finreportr) seem to be highly technical for the average retail end user.
After successfully reading SEC financial data up to and including 2018, my attempts to read 2019 and 2020 fail.
I get the following message:
> GetIncome("NVDA", 2019)
Error in fileFromCache(file) :
Error in download.file(file, cached.file, quiet = !verbose) :
it was not possible to open URL 'https://www.sec.gov/Archives/edgar/data/1045810/000104581019000023/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'
In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1045810/000104581019000023/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '404 Not Found'
Reports for 2019 and 2020 (and 2021) are available.
However, I think there might be a problem with a date, since finreportr is looking for a file dated "2018-01-31.xsd" that it is unable to find.
Is it possible that there might be a problem with the dates?
And congratulations on the package, it can be very useful.
VS