nytimes / Fech

Deprecated. Please see https://github.com/dwillis/Fech for a maintained fork.
http://nytimes.github.io/Fech/

How can I retrieve all FEC filing unique numeric identifiers? #80

ahmedmohiduet closed 8 years ago

ahmedmohiduet commented 8 years ago

I am trying to use 'fech' to automate downloading all FEC filings from fec.gov. So far I have found in the documentation that I can retrieve a specific filing by its unique numeric ID:

filing = Fech::Filing.new(723604)
filing.download

So how can I retrieve every ID?

dwillis commented 8 years ago

Fech supports all filings with a filing ID of 11850 or higher, so you can start with that number and iterate up to the latest one (1032472 as of yesterday). You'll need to use a begin/rescue block, since a handful of filings aren't valid and can't be retrieved.
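
A minimal sketch of that loop, for illustration only. The helper name `fetch_range` and its return shape are hypothetical and not part of Fech; the download step is passed in as a callable so the loop itself does not depend on the gem or on network access.

```ruby
# Try each filing ID in a range, skipping IDs that cannot be retrieved.
# `fetcher` is any callable that takes an ID (hypothetical helper, not Fech API).
def fetch_range(first_id, last_id, fetcher)
  fetched = []
  failed = {}
  (first_id..last_id).each do |id|
    begin
      fetcher.call(id)
      fetched << id
    rescue StandardError => e
      # A handful of filings are invalid; record the error and move on.
      failed[id] = e.message
    end
  end
  { fetched: fetched, failed: failed }
end

# Possible usage with Fech (requires the gem and network access):
#   require 'fech'
#   result = fetch_range(11850, 11860, ->(id) { Fech::Filing.new(id).download })
#   puts "could not retrieve: #{result[:failed].keys.inspect}"
```

Collecting the failed IDs instead of silently skipping them makes it possible to retry them later or to check whether a failure was a bad filing or a network problem.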

ahmedmohiduet commented 8 years ago

Thank you, dwillis! Any clue how I can retrieve the maximum ID? I would also like to get all new filings in an automated manner.

dwillis commented 8 years ago

You can use the fech-search gem to parse today's filings and use the highest ID from that. If you are attempting to download them all, please be respectful of the FEC's bandwidth and have the script sleep for 1-2 seconds between requests.
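
One way the pause between requests might look, as a sketch. The helper `each_politely` and its `sleeper` parameter are hypothetical names introduced here; looking up the highest ID with fech-search is not shown, and `highest_id` in the usage comment stands in for whatever value that lookup returns.

```ruby
# Yield each ID in turn, pausing between calls (no pause before the first).
# `sleeper` defaults to Kernel#sleep and can be swapped out for testing.
def each_politely(ids, pause: 1.5, sleeper: ->(seconds) { sleep(seconds) })
  ids.each_with_index do |id, index|
    sleeper.call(pause) if index > 0
    yield id
  end
end

# Possible usage: process only the filings newer than the last one seen.
#   last_seen_id = 1032472
#   each_politely((last_seen_id + 1)..highest_id) do |id|
#     Fech::Filing.new(id).download
#   end
```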

ahmedmohiduet commented 8 years ago

Noted dwillis! Thanks again :)

ahmedmohiduet commented 8 years ago

Hi dwillis! I just wanted to let you know that I tried downloading every filing with a sleep of 1.5 seconds [sleep(1.5)] in my Ruby script, but my crawler still got blocked after nearly 20 requests. Do you have any idea how I can avoid this?

dwillis commented 8 years ago

You can grab date-specific files containing all electronic filings on that date, from here: ftp://ftp.fec.gov/FEC/electronic/
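
A small sketch of building the per-date URLs under that directory. The file naming is an assumption: the comment above gives only the directory, so the `YYYYMMDD.zip` pattern and both helper names are illustrative and should be checked against the actual listing.

```ruby
require 'date'

ELECTRONIC_BASE = 'ftp://ftp.fec.gov/FEC/electronic/'

# ASSUMPTION: one archive per day named YYYYMMDD.zip (not confirmed in this thread).
def daily_archive_url(date)
  "#{ELECTRONIC_BASE}#{date.strftime('%Y%m%d')}.zip"
end

# Build the URLs for every day in an inclusive date range.
def daily_archive_urls(from, to)
  (from..to).map { |date| daily_archive_url(date) }
end

# Possible usage:
#   daily_archive_urls(Date.new(2016, 1, 30), Date.new(2016, 2, 1)).each { |u| puts u }
```

Fetching one archive per day means far fewer requests than fetching each filing by ID.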

ahmedmohiduet commented 8 years ago

Hi dwillis! Thank you! :) I have finally managed to download them using Tor :) However, I'm facing some difficulties parsing filings with version 2.02: 'filing.summary' fails in my script, but I need at least the dates on which those filings were posted. Am I missing something?

dwillis commented 8 years ago

Glad it worked out! Fech only supports filing versions 3 and higher, so that's why you have errors parsing filings with versions less than 3.
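
A sketch of a guard that could be run before parsing, so unsupported filings are skipped instead of raising. The helper `supported_version?` and the constant are hypothetical; how the version string is read from a given filing is left out and would need to be checked against Fech itself.

```ruby
# Fech supports filing versions 3 and higher, per the comment above.
MIN_SUPPORTED_VERSION = 3.0

# Returns true when a version string such as "2.02" or "8.1" is parseable
# and at least MIN_SUPPORTED_VERSION; anything unparseable counts as unsupported.
def supported_version?(version)
  Float(version.to_s) >= MIN_SUPPORTED_VERSION
rescue ArgumentError
  false
end

# Possible usage: split version strings into parseable and skipped groups.
#   ok, skipped = ["2.02", "3.00", "8.1"].partition { |v| supported_version?(v) }
```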