Closed: 603000 closed this issue 3 months ago.
@603000 Hi, there is nothing in this repo that has anything to do with IP addresses.
Is it possible you're hitting the SEC's rate limit? You might consider setting a User-Agent header in the SecClient#get method here: https://github.com/toddwschneider/sec-13f-filings/blob/f002c5164d81b69aafce620de6ce208255ba1db1/app/lib/sec_client.rb#L193-L197
Here's the SEC documentation: https://www.sec.gov/os/accessing-edgar-data
You could try something like:
response = HTTParty.get(url, headers: {"User-Agent" => "[Your Name] [your email]@[domain].com"})
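If you aren't using HTTParty, here is a rough equivalent using Ruby's standard-library Net::HTTP. This is only a sketch: the contact string and the helper names build_sec_request and fetch_from_sec are hypothetical, and this is not how SecClient is actually implemented.

```ruby
require "net/http"
require "uri"

# Hypothetical contact string: the SEC asks automated tools to declare a
# name and email address in the User-Agent header.
SEC_USER_AGENT = "Jane Doe jane.doe@example.com"

# Build (but don't send) a GET request that carries the declared User-Agent.
def build_sec_request(url, user_agent: SEC_USER_AGENT)
  request = Net::HTTP::Get.new(URI(url))
  request["User-Agent"] = user_agent # overrides Net::HTTP's default "Ruby"
  request
end

# Send the request (over HTTPS for sec.gov URLs) and return the Net::HTTPResponse.
def fetch_from_sec(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(build_sec_request(url))
  end
end
```

For example, fetch_from_sec("https://www.sec.gov/Archives/edgar/data/1067983/000095012322006442").code lets you check whether the request succeeds once the header is set.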
Hi, thank you for your response. Please take a look at the screenshot. What do you think could be causing these blocks?
That's a text encoding problem: some characters in the SEC file are not valid UTF-8. I just pushed this commit, which should replace any invalid characters with "?": https://github.com/toddwschneider/sec-13f-filings/commit/2dad9798e797a91b810a4b135e2f93c15a94cadb
Let me know if you still have issues. Thanks!
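The commit itself isn't reproduced here, so the following is only a sketch of the same idea using Ruby's built-in String#scrub. The helper name sanitize_utf8 is hypothetical, not a function from the repo.

```ruby
# Hypothetical helper: treat the raw bytes as UTF-8 and replace any
# invalid byte sequences with "?". The input string is left unmodified.
def sanitize_utf8(raw)
  raw.dup.force_encoding(Encoding::UTF_8).scrub("?")
end

raw = "Berkshire \xFF Hathaway".b # binary string containing an invalid byte
clean = sanitize_utf8(raw)
puts clean                 # prints "Berkshire ? Hathaway"
puts clean.valid_encoding? # prints "true"
```

scrub only touches invalid sequences, so valid multibyte characters in the filing survive unchanged.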
Thank you, I'll try this.
I've also run into another problem (not sure what's causing it) when trying to open any filing from any manager. The error always looks like this:
Could you give me a hint about what might cause this kind of error?
I would guess that the SEC website is blocking you. Have you tried setting a user agent with your name and email address? I gave an example in my first comment on this thread.
Thank you. I tried your suggestion, but it didn't help, so the problem seems to be something else. Do you have any other ideas about what I should check?
It's hard for me to debug without being able to reproduce the current error. I'd recommend going into the Rails console and checking exactly what HTML is being fetched for a specific 13F, e.g.:
html = SecClient.new.get("https://www.sec.gov/Archives/edgar/data/1067983/000095012322006442").body
Then you'll have to poke around in the HTML to see what's in there. I've seen error messages asking the user to declare a user agent, but if you've already tried that, I'm not sure what else could be happening, so you'll probably have to do some investigating on your own.
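One way to do that poking around in the console is to search the fetched HTML for wording from the SEC's block pages. The marker phrases below are assumptions based on messages the SEC has shown to undeclared or over-limit clients and may not match the current site; sec_block_reasons is a hypothetical helper, not part of the repo.

```ruby
# Phrases assumed to appear on SEC block pages (not verified against the
# current site; adjust after looking at the HTML you actually receive).
SEC_BLOCK_MARKERS = [
  "Undeclared Automated Tool",
  "Request Rate Threshold Exceeded"
].freeze

# Return the markers found in the fetched HTML (case-insensitive match).
def sec_block_reasons(html)
  text = html.to_s.downcase
  SEC_BLOCK_MARKERS.select { |marker| text.include?(marker.downcase) }
end
```

In the console, something like sec_block_reasons(html) after the SecClient call above returns an empty array when none of the markers are present, which at least rules out those two kinds of block.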
Thank you, I'll investigate. But it increasingly looks like the problem is caused by an incorrect installation. Am I right that if the installation follows the instructions, everything should work? Or could these issues be caused by my VPS provider?
Yes, I've been able to download all of the various reports, both on my local machine and on an AWS-hosted server.
> Or maybe these issues can be caused because of VPS provider?
I don't know if the SEC website blocks certain VPS providers, but I suppose it's possible. Again, I'd recommend checking the HTML you get back to see whether it contains any error messages.
Thanks for sharing the repo, appreciate it. It worked great when I updated the CIKs and pulled the data last quarter.
Now, though, I seem to be hitting 'OpenURI::HTTPError: 403 Forbidden'.
I ran this line in the Terminal, and I do get a proper response.
html = SecClient.new.get("https://www.sec.gov/Archives/edgar/data/1067983/000095012322006442").body
I'll investigate further to see what's going on. I'd really appreciate any guidance on how to approach debugging, as I'm still a beginner in Ruby. Thank you.
Hi @balaca, have you tried setting your user agent? Take a look at this comment and give that a shot: https://github.com/toddwschneider/sec-13f-filings/issues/4#issuecomment-1288941445
Hey @toddwschneider, I'm experiencing the same issue as @balaca. After setting up the database, I started the import with seed_minimal_db, but it processed only a few quarters before hitting 'OpenURI::HTTPError: 403 Forbidden'. The User-Agent is set, and I can access the data using curl, so I assume the SEC isn't blocking me.
Another useful detail: I can see in the thirteen_fs table that 2024 Q3 has been loaded.
@balaca @stefx99 Yes, I'm getting something similar in development. I wonder if they've changed their rate limits. I'll take a closer look when I can, but it could be that you need to run one quarter at a time and then wait a few minutes.
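If rate limiting is the culprit, a client-side throttle is one option. The SEC documentation linked earlier in this thread states a maximum request rate (10 requests per second at the time of writing; check the current page). The Throttle class below is a hypothetical sketch, not code from the repo; it only spaces out individual requests and does not implement the "one quarter at a time, then wait" approach.

```ruby
# Hypothetical client-side throttle: keeps at least 1/max_per_second seconds
# between the start of consecutive calls. The clock and sleeper are
# injectable so the timing logic can be tested without really sleeping.
class Throttle
  def initialize(max_per_second: 10,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = 1.0 / max_per_second
    @clock = clock
    @sleeper = sleeper
    @last_call = nil
  end

  # Waits if the previous call started too recently, then runs the block
  # and returns its value.
  def call
    now = @clock.call
    if @last_call
      wait = @min_interval - (now - @last_call)
      if wait.positive?
        @sleeper.call(wait)
        now += wait
      end
    end
    @last_call = now
    yield
  end
end
```

Usage might look like throttle = Throttle.new, then urls.each { |url| throttle.call { SecClient.new.get(url) } }. A single shared instance matters: a new Throttle per request would never wait.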
Thanks for the answer, @toddwschneider. I actually found out what the issue was and will submit a PR over the weekend.
You just need to pass the User-Agent header in the download function.
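The actual change isn't shown in this thread, so the following is only a sketch of the idea, assuming the download goes through open-uri (the OpenURI::HTTPError in the reports suggests it does). URI.open accepts request headers as a hash; the contact string and the helper names download and describe_http_error are hypothetical.

```ruby
require "open-uri"

# Hypothetical contact string; use your own name and email address.
USER_AGENT = "Jane Doe jane.doe@example.com"

# Summarize an OpenURI::HTTPError: the status line plus the start of the
# response body, which is where the SEC usually explains a block.
def describe_http_error(error, max_chars: 300)
  status = Array(error.io.status).join(" ")
  body = error.io.read.to_s
  "HTTP #{status}: #{body[0, max_chars]}"
end

# Download a URL with a declared User-Agent; on an HTTP error, print the
# diagnostics and re-raise so the caller still sees the failure.
def download(url, user_agent: USER_AGENT)
  URI.open(url, "User-Agent" => user_agent, &:read)
rescue OpenURI::HTTPError => e
  warn describe_http_error(e)
  raise
end
```

Printing the body on a 403 shows you the SEC's own explanation instead of a bare "403 Forbidden".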
Thanks @stefx99! I just made that change here: https://github.com/toddwschneider/sec-13f-filings/pull/11
It works for me in development now. Feel free to reopen this if there are still issues.
Hi, it looks like something in the code blocks access from any IP address except the one it was originally intended for. How can this be fixed?