pudo / sedar

Scraping bits of SEDAR
http://www.sedar.com/homepage_en.htm

How exactly is this used? #2

Open ccurelea opened 4 years ago

ccurelea commented 4 years ago

Are the PARAMS the input to the search page?

Then does the script just download all of the pdfs that appear?

ccurelea commented 4 years ago

What is the significance of INDUSTRIES = '046,047,005,006,058,025'? Can't this just be blank?

Thanks

pudo commented 4 years ago

Hey! So to answer both questions: yes, the PARAMS are what is needed to fake the main search form. My use case in developing this was to download only filings from companies in the extractive industries sector, which is what the industry codes listed in INDUSTRIES stand for. You could try leaving this empty, or make the script explicitly iterate over every industry code in the site dropdown if you want to download all documents.

That will take half an eternity, however.
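As a rough illustration of iterating over industry codes one at a time, here is a minimal sketch. The base URL and the parameter names (`industry_group`, `page_no`, and so on) are assumptions copied from the search URL posted further down in this thread, and `build_search_urls` is a hypothetical helper, not something from the script itself.

```python
from urllib.parse import urlencode

# Industry codes quoted earlier in this thread (extractive industries).
INDUSTRIES = '046,047,005,006,058,025'

# Assumption: base URL taken from the search URL printed in this thread;
# the script's real PARAMS may look different.
SEARCH_URL = 'https://www.sedar.com/FindCompanyDocuments.do'


def build_search_urls(industries, base_params):
    """Yield one search URL per industry code (hypothetical helper)."""
    for code in industries.split(','):
        # Assumption: the industry code is sent as `industry_group`.
        params = dict(base_params, industry_group=code)
        yield SEARCH_URL + '?' + urlencode(params)


# Assumed subset of form fields, for illustration only.
base = {'lang': 'EN', 'page_no': 1, 'document_selection': 0, 'Variable': 'Issuer'}
urls = list(build_search_urls(INDUSTRIES, base))
for url in urls:
    print(url)
```

To cover every industry, the same loop could be fed the full list of codes from the site dropdown instead of INDUSTRIES.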

ccurelea commented 4 years ago

Thanks for the answer! Really cool script, very useful!

When load_filings() gets called, the params are passed in and it performs the search.

I used a print(res.url) to verify.

I can't seem to pull table rows from the result page. I added another print statement and it prints an empty list.


Do you have any suggestions?

Thanks again!

ccurelea commented 4 years ago

print(res.url) = https://www.sedar.com/FindCompanyDocuments.do?lang=EN&page_no=1&company_search=CENTR+Brands+Corp.&document_selection=0&industry_group=A&FromDate=11&FromMonth=03&FromYear=2020&ToDate=10&ToMonth=05&ToYear=2020&Variable=Issuer

and

print(test) = []
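One way to narrow this down might be to check whether the response body contains any table rows at all, independently of how the script extracts them. The sketch below uses only the standard library. `RowCounter` and `count_rows` are hypothetical names, and the two HTML strings are made-up stand-ins for `res.text`, not real SEDAR responses.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows += 1


def count_rows(html):
    """Return the number of <tr> tags found in `html`."""
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up stand-ins for res.text, for illustration only.
results_page = '<table><tr><td>Filing A</td></tr><tr><td>Filing B</td></tr></table>'
other_page = '<html><body><p>No table here</p></body></html>'

print(count_rows(results_page))
print(count_rows(other_page))
```

If the count for the real response is zero, the server probably returned something other than the results table, and printing the start of `res.text` would show what came back. If the count is non-zero, the row selector in the script is the more likely culprit.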