hmcguinn / sec-scraper

Python repo to scrape Form 3 and Form 4 filings from the SEC website.
GNU General Public License v3.0

getting an error on line 95 #1

Open alias-noa opened 3 years ago

alias-noa commented 3 years ago

```
Traceback (most recent call last):
  File "C:/Users/Noa/PycharmProjects/sec_scraper_master/scraper.py", line 95, in <module>
    s = s + "/" + newLink
NameError: name 'newLink' is not defined
```

All I did was try to run it on TEVA instead of AAPL

alias-noa commented 3 years ago

What is the proper way to run this over several stocks? I just changed line 44 so maybe that's why I'm getting this error.

alias-noa commented 3 years ago

Actually, how do I even run this thing? I thought I was supposed to run scraper.py... but I'm thinking that's not the correct way. There's no main.py, so how do I run it?

alias-noa commented 3 years ago

Tried running multi and got a ton of crazy errors...

hmcguinn commented 3 years ago

Hey @alias-noa! This repo isn't exactly in production shape :) I've just worked around the errors locally and don't think I've pushed the fixes. Would you be able to copy the errors you received? I'll clean up the repo and add another comment in a little bit.

Glad you found the repo useful enough to give it a shot!

hmcguinn commented 3 years ago

A little bit more detailed comment on usage:

The scraper is run like a shell script: the entry point I use is /multiThreading/multi.py. Multi.py reads in a list of CIKs from /multiThreading/cik.csv. If you need something to map between CIKs and tickers, you can find that here.
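For illustration, a minimal sketch of that first step (the function name `load_ciks` and the exact CSV layout are my assumptions, not necessarily what multi.py does):

```python
import csv
import io

# Hypothetical sketch: multi.py presumably reads one CIK per row from
# cik.csv and kicks off a scrape for each one.
def load_ciks(csv_text):
    """Parse CIKs from CSV text, taking the first column of each row."""
    reader = csv.reader(io.StringIO(csv_text))
    # Zero-pad to 10 digits, the canonical SEC CIK width.
    return [row[0].strip().zfill(10) for row in reader if row]

# Example with two CIKs as they might appear in cik.csv
# (320193 is Apple's CIK; the second is made up).
print(load_ciks("320193\n1234567\n"))  # ['0000320193', '0001234567']
```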

From there, the scraper searches through the filings for a company (viewable here). As of now, it is configured to only grab Form 3s and Form 4s (Initial Statement of Beneficial Ownership of Securities and Statement of Changes in Beneficial Ownership). That code can be found on lines 84-95 of /multiThreading/getList.py.
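The filtering step boils down to something like the following (a hedged sketch only; the tuple layout and function name are illustrative, not what getList.py literally contains):

```python
# Keep only rows from the EDGAR filing index whose form type
# is "3" or "4"; everything else (10-K, 8-K, ...) is dropped.
def wanted_filings(filings):
    """filings: list of (form_type, url) tuples scraped from the index."""
    return [(form, url) for form, url in filings if form in ("3", "4")]

rows = [("10-K", "a.htm"), ("4", "b.xml"), ("3", "c.xml"), ("8-K", "d.htm")]
print(wanted_filings(rows))  # [('4', 'b.xml'), ('3', 'c.xml')]
```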

The code to actually grab info from the filings in XML form is in /multiThreading/runScraper.py. Currently, it's limited in what it grabs but can be configured easily to grab whatever you want from the filing. The scraper stores all the filings associated with a company in a pandas dataframe before writing it out to an excel file.
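As a rough sketch of that last stage (the tag names below are a small subset of the real Form 3/4 ownership XML schema, and `parse_filing` is an illustrative name, not runScraper.py's actual code):

```python
import xml.etree.ElementTree as ET

import pandas as pd

# A tiny stand-in for a real Form 4 ownership XML document.
SAMPLE = """<ownershipDocument>
  <issuer><issuerTradingSymbol>AAPL</issuerTradingSymbol></issuer>
  <reportingOwner>
    <reportingOwnerId><rptOwnerName>DOE JANE</rptOwnerName></reportingOwnerId>
  </reportingOwner>
</ownershipDocument>"""

def parse_filing(xml_text):
    """Pull a couple of fields out of an ownership filing's XML."""
    root = ET.fromstring(xml_text)
    return {
        "symbol": root.findtext(".//issuerTradingSymbol"),
        "owner": root.findtext(".//rptOwnerName"),
    }

# One row per filing, collected into a DataFrame...
df = pd.DataFrame([parse_filing(SAMPLE)])
print(df.to_dict("records"))  # [{'symbol': 'AAPL', 'owner': 'DOE JANE'}]
# ...then written out at the end, e.g.:
# df.to_excel("filings.xlsx")  # requires openpyxl
```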

Hope that helps shed a little more light on what the code does! It's not exactly the most readable thing... I'll get around to cleaning it up at some point, hopefully.

I also went ahead and made a couple changes to the repo. It should work after a pull now.

Thanks for giving it a try!