Open alias-noa opened 3 years ago
What is the proper way to run this over several stocks? I just changed line 44 so maybe that's why I'm getting this error.
Actually, how do I even run this thing? I thought I was supposed to run scraper.py, but I'm thinking that's not the correct way. There's no main.py, so how do I run it?
Tried running multi and got a ton of crazy errors...
Hey @alias-noa! This repo hasn't exactly been in production shape :) I've worked around the errors locally, and I don't think I've pushed those fixes. Would you be able to copy the errors you received? I'll clean up the repo and add another comment in a little bit.
Glad you found the repo useful enough to give it a shot!
A little bit more detailed comment on usage:
The scraper is set up to be run as a script: the file I use to run it is /multiThreading/multi.py. multi.py reads in a list of CIKs from /multiThreading/cik.csv. If you need something to map between CIKs and tickers, you can find that here.
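For anyone preparing their own cik.csv, here is a minimal sketch of reading CIKs from a CSV. It assumes one CIK per row in the first column, which may not match the repo's actual file layout, and `load_ciks` is a hypothetical helper, not a function from multi.py. It zero-pads each value to 10 digits, the width EDGAR uses for CIKs.

```python
import csv
import io


def load_ciks(handle):
    """Return zero-padded CIK strings from a CSV file object.

    Assumes one CIK per row in the first column (an assumption,
    not necessarily the layout of the repo's cik.csv).
    """
    ciks = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        value = row[0].strip()
        if not value.isdigit():
            continue  # skip a header row or anything that is not a number
        ciks.append(value.zfill(10))
    return ciks


# Illustrative in-memory file; in practice you would pass open("cik.csv", newline="").
sample = io.StringIO("cik\n320193\n818686\n")
print(load_ciks(sample))
```

Rows that are blank or non-numeric are skipped, so a header line will not break the run.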
From there, the scraper searches through the filings for a company (viewable here). As of now, it is configured to grab only Form 3 and Form 4 filings (Initial Statement of Beneficial Ownership of Securities and Statement of Changes in Beneficial Ownership, respectively). That code can be found on lines 84-95 of /multiThreading/getList.py.
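The form-type filter described above could look roughly like the sketch below. It is not the code from getList.py: the list-of-dicts shape, the `form` and `url` keys, and the name `keep_ownership_forms` are all assumptions made for illustration.

```python
# Form types to keep; extend this set to grab other filings.
WANTED_FORMS = {"3", "4"}


def keep_ownership_forms(filings):
    """Keep only filings whose form type is exactly "3" or "4".

    `filings` is assumed to be a list of dicts with a "form" key
    (a hypothetical structure, not the repo's own).
    """
    return [f for f in filings if f.get("form", "").strip() in WANTED_FORMS]


filings = [
    {"form": "10-K", "url": "a"},
    {"form": "4", "url": "b"},
    {"form": "3", "url": "c"},
    {"form": "4/A", "url": "d"},  # amendments are not matched by this exact comparison
]
print([f["url"] for f in keep_ownership_forms(filings)])  # prints ['b', 'c']
```

If a company has no matching filings, the result is an empty list, so anything downstream needs to handle that case.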
The code to actually grab info from the filings in XML form is in /multiThreading/runScraper.py. Currently, it's limited in what it grabs, but it can easily be configured to grab whatever you want from the filing. The scraper stores all the filings associated with a company in a pandas DataFrame before writing it out to an Excel file.
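As a rough idea of what "configuring what it grabs" can mean, here is a sketch that pulls a few fields out of an ownership XML document using only the standard library. The sample document is made up, the element paths are modeled loosely on ownership filings and should be checked against a real one, and `FIELDS` and `filing_to_row` are hypothetical names rather than anything in runScraper.py.

```python
import xml.etree.ElementTree as ET

# Made-up sample; a real filing has many more elements.
SAMPLE_XML = """<ownershipDocument>
  <documentType>4</documentType>
  <issuer><issuerTradingSymbol>XYZ</issuerTradingSymbol></issuer>
  <reportingOwner>
    <reportingOwnerId><rptOwnerName>DOE JANE</rptOwnerName></reportingOwnerId>
  </reportingOwner>
</ownershipDocument>"""

# Column name -> element path relative to the root (assumed paths).
# Add or remove entries here to change what gets extracted.
FIELDS = {
    "form": "documentType",
    "ticker": "issuer/issuerTradingSymbol",
    "owner": "reportingOwner/reportingOwnerId/rptOwnerName",
}


def filing_to_row(xml_text):
    """Parse one filing into a flat dict; missing elements become None."""
    root = ET.fromstring(xml_text)
    return {column: root.findtext(path) for column, path in FIELDS.items()}


rows = [filing_to_row(SAMPLE_XML)]
print(rows)
```

A list of dicts like `rows` can be handed to `pandas.DataFrame(rows)` and then written out with `DataFrame.to_excel`, which needs an Excel writer such as openpyxl installed.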
Hope that helps to shed a little bit more light on what the code does! It's not exactly the most readable thing... I'll get around to cleaning it up at some point, hopefully.
I also went ahead and made a couple changes to the repo. It should work after a pull now.
Thanks for giving it a try!
Traceback (most recent call last):
  File "C:/Users/Noa/PycharmProjects/sec_scraper_master/scraper.py", line 95, in
    s = s + "/" + newLink
NameError: name 'newLink' is not defined
All I did was try to run it on TEVA instead of AAPL
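A NameError like this usually means the variable is only assigned inside a branch or loop that never ran. One guess, without having checked scraper.py, is that `newLink` is set only when a matching filing link is found, and nothing matched for TEVA. The sketch below reproduces that pattern with hypothetical function names and made-up link strings, alongside a guarded version.

```python
def find_link_unsafe(links, wanted):
    """Binds new_link only when something matches, like the suspected bug."""
    for link in links:
        if wanted in link:
            new_link = link
            break
    # If nothing matched, new_link was never assigned and this line raises.
    return "base/" + new_link


def find_link_safe(links, wanted):
    """Initialise new_link first and return None when nothing matches."""
    new_link = None
    for link in links:
        if wanted in link:
            new_link = link
            break
    if new_link is None:
        return None
    return "base/" + new_link


links = ["form4_a.xml", "form4_b.xml"]  # made-up link names
print(find_link_safe(links, "form4"))
print(find_link_safe(links, "form3"))
try:
    find_link_unsafe(links, "form3")
except NameError as exc:
    # Inside a function this is UnboundLocalError, a subclass of NameError.
    print(type(exc).__name__)
```

With the guarded version, the caller can skip a company that has no matching link instead of crashing partway through the run.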