toddwschneider / sec-13f-filings

A nicer way to view SEC 13F filings data
https://13f.info
MIT License
248 stars 52 forks

Blocked IP #4

Closed 603000 closed 3 months ago

603000 commented 2 years ago

Hi, looks like there is some issue in the code that blocks access for any IP address except for the one that was originally intended. How can it be fixed?

toddwschneider commented 2 years ago

@603000 Hi, there is nothing in this repo that has anything to do with IP addresses

Is it possible you're hitting the SEC's rate limit? You might consider setting a User-Agent header in the SecClient#get method here: https://github.com/toddwschneider/sec-13f-filings/blob/f002c5164d81b69aafce620de6ce208255ba1db1/app/lib/sec_client.rb#L193-L197

Here's the SEC documentation: https://www.sec.gov/os/accessing-edgar-data

You could try something like:

response = HTTParty.get(url, headers: {"User-Agent" => "[Your Name] [your email]@[domain].com"})

603000 commented 2 years ago

Hi, thank you for your response. Please take a look at the screenshot. What do you think could be causing these blocks?

image

toddwschneider commented 2 years ago

That is a text encoding problem: some characters in the SEC file are not valid UTF-8. I just pushed this commit, which should change any invalid characters into ?: https://github.com/toddwschneider/sec-13f-filings/commit/2dad9798e797a91b810a4b135e2f93c15a94cadb

Let me know if you still have issues, thanks
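For readers wondering what this kind of fix can look like, here is a minimal sketch of replacing invalid UTF-8 bytes with ? in Ruby. It relies on String#scrub from the standard library. The helper name clean_utf8 and the sample string are hypothetical, and this is not necessarily how the linked commit does it.

```ruby
# Hypothetical helper (not taken from the repo): returns a copy of `text`
# tagged as UTF-8, with invalid byte sequences replaced by "?".
def clean_utf8(text)
  text.dup.force_encoding(Encoding::UTF_8).scrub("?")
end

# Stand-in for downloaded content: a binary string with one invalid byte.
raw = "Berkshire\xFF Hathaway".b
puts clean_utf8(raw)
```

Text that is already valid UTF-8 passes through unchanged, so a helper like this could be applied to every downloaded file.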

603000 commented 2 years ago

Thank you, I'll try this.

I also encountered another problem (not sure what's the cause) while trying to open any filing from any manager - the error always looks like this:

Screenshot_2

Could you give me a hint as to what might cause this type of error?

toddwschneider commented 2 years ago

I would guess that the SEC website is blocking you. Have you tried setting a user agent with your name and email address? I gave an example in my first comment on this thread

603000 commented 2 years ago

Thank you, I tried your solution but it didn't help. It looks like the problem is something else. Do you have any other ideas about what I should check?

toddwschneider commented 2 years ago

It's hard for me to debug without being able to reproduce the current error. I'd recommend going into the rails console and seeing exactly what HTML is being fetched for a specific 13F, e.g.

html = SecClient.new.get("https://www.sec.gov/Archives/edgar/data/1067983/000095012322006442").body

And then you'll have to poke around with the html text to see what's in there. I've seen error messages asking for the user to declare a user agent, but if you've tried that already then I'm not sure what else could be happening, so you'll probably have to do some investigating on your own
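As a starting point for that poking around, the sketch below searches the fetched HTML for phrases that a block or rate-limit page might contain. The helper name block_hints_in and the phrase list are assumptions for illustration, not an official or complete list of SEC messages, and the sample html string stands in for the real response body.

```ruby
# Phrases a block or rate-limit page might contain. These are assumptions
# for illustration, not an official or exhaustive list.
BLOCK_HINTS = [
  "Undeclared Automated Tool",
  "Request Rate Threshold Exceeded"
].freeze

# Hypothetical helper: returns the hints found in `html`, ignoring case.
def block_hints_in(html)
  lowered = html.downcase
  BLOCK_HINTS.select { |hint| lowered.include?(hint.downcase) }
end

# In the Rails console, `html` would come from the SecClient call above.
# A stand-in string is used here so the sketch runs on its own.
html = "<html><title>Request Rate Threshold Exceeded</title></html>"
puts block_hints_in(html).inspect
puts html[0, 500] # first 500 characters, handy for a quick look
```

An empty result does not prove the request succeeded. It only means none of the listed phrases appeared, so reading the first few hundred characters is still worthwhile.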

603000 commented 2 years ago

Thank you, I'll try to investigate. But it increasingly looks like the problem is caused by an incorrect installation. Am I right that if the installation is done according to the instructions, everything should work OK? Or maybe these issues can be caused because of VPS provider?

toddwschneider commented 2 years ago

Yes, I've been able to download all of the various reports, both on my local machine and on an AWS-hosted server

> Or maybe these issues can be caused because of VPS provider?

I don't know if the SEC website blocks certain VPS providers, but I suppose it's possible. Again I'd recommend checking the html you get back to see if it has any error messages in it

balaca commented 3 months ago

Thanks for sharing the repo, appreciate it. It worked great when I updated the CIKs and pulled the data last quarter.

Now I seem to hit 'OpenURI::HTTPError: 403 Forbidden' though.

I ran this line in the Terminal, and I do get a proper response.

html = SecClient.new.get("https://www.sec.gov/Archives/edgar/data/1067983/000095012322006442").body

I'll investigate further to see what's going on. I'd really appreciate any guidance on how I approach debugging as I'm still a beginner in Ruby. Thank you.

toddwschneider commented 3 months ago

Hi @balaca, have you tried setting your user agent? Take a look at this comment and give that a shot: https://github.com/toddwschneider/sec-13f-filings/issues/4#issuecomment-1288941445

stefx99 commented 3 months ago

Hey @toddwschneider, I'm experiencing the same issue as @balaca. After DB setup I started the import with seed_minimal_db, but it processed only a few quarters before encountering 'OpenURI::HTTPError: 403 Forbidden'. The User-Agent is set, and I can access the data using curl, so I assume I am not blocked by the SEC.

Another useful detail: I can see in the thirteen_fs table that 2024 Q3 is loaded.

toddwschneider commented 3 months ago

@balaca @stefx99 yes, I'm getting something similar in development. I wonder if they've changed their rate limits. I'll take a closer look when I can, but it could be that you need to run one quarter at a time, then wait a few minutes
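If the 403s do turn out to be rate limiting, one way to test the idea is to pause and retry when the error appears. The sketch below assumes the failure surfaces as OpenURI::HTTPError, as in the reports above. The helper name with_retries and the default wait times are hypothetical guesses, not values documented by the SEC.

```ruby
require "open-uri"

# Hypothetical helper: runs the block, and if it raises OpenURI::HTTPError,
# sleeps and tries again, doubling the wait each time. After `attempts`
# failed tries the error is re-raised. The defaults are guesses.
def with_retries(attempts: 3, wait: 60)
  tries = 0
  begin
    tries += 1
    yield
  rescue OpenURI::HTTPError
    raise if tries >= attempts
    sleep(wait)
    wait *= 2
    retry
  end
end

# Usage idea (import_quarter is a placeholder, not a method from the repo):
#   with_retries { import_quarter(2024, 3) }
```

This retries on any HTTP error from open-uri, not only 403, which keeps the sketch short but may hide other problems such as a 404 for a bad URL.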

stefx99 commented 3 months ago

Thanks for the answer @toddwschneider, I actually found out what the issue was. I will submit a PR on the weekend.

https://github.com/toddwschneider/sec-13f-filings/blob/59d31b1e1ed7fdddc86b3f35d0122334b84318de/app/lib/sec_client.rb#L21C1-L22C1

You just need to pass the User-Agent header in the download function
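Since the error is an OpenURI::HTTPError, the download presumably goes through Ruby's open-uri library. A minimal sketch of passing the header there follows. The names USER_AGENT, request_headers, and download are hypothetical and not copied from the repo, and the contact string is a placeholder.

```ruby
require "open-uri"

# Placeholder contact string in the "[name] [email]" style suggested earlier
# in the thread; replace it with your own details.
USER_AGENT = "Your Name your.email@example.com"

# Hypothetical helper: builds the headers sent with each request.
def request_headers(user_agent = USER_AGENT)
  { "User-Agent" => user_agent }
end

# Hypothetical download helper (not the repo's actual method). open-uri
# treats string-keyed options as HTTP request header fields.
def download(url)
  URI.open(url, request_headers).read
end
```

Keeping the headers in one helper means every request path can share the same User-Agent, which avoids the situation described here where one method set it and another did not.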

toddwschneider commented 3 months ago

Thanks @stefx99! I just made that change here: https://github.com/toddwschneider/sec-13f-filings/pull/11

It works for me in development now, feel free to reopen here if there are still issues