Open distraughteagle opened 4 months ago
I'm also struggling to make any progress on this. For the DTIC scraping, I relied heavily on the HTML data_bt fields in the elements, which seem to be entirely missing here.
As a bigger-picture question, am I correct that this portal only gives us aggregate funding and per-project level descriptions?
I made a little bit of progress on the Award Explorer by going to the embedded webpage that has the charts and toolbars here. Sorry it's not really documented or explained yet; I'll improve that in the future.
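For anyone who wants to find that embedded page programmatically rather than by hand, here is a minimal sketch that pulls iframe/frame sources out of a page's HTML using only the Python standard library. The helper name `find_embedded_pages`, the sample markup, and the example.org URL are made up for illustration; the real portal's markup may differ, and if the iframe is itself injected by JavaScript it will only show up in the rendered HTML, not the raw page source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Collects the src attribute of every <iframe> and <frame> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("iframe", "frame"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_embedded_pages(html, base_url):
    """Return absolute URLs of pages embedded in `html` (hypothetical helper)."""
    finder = FrameFinder()
    finder.feed(html)
    # Resolve relative src values against the URL of the outer page.
    return [urljoin(base_url, src) for src in finder.sources]


if __name__ == "__main__":
    # Placeholder markup and URL, not taken from the actual portal.
    sample = '<div><iframe src="/embed/charts?view=awards"></iframe></div>'
    print(find_embedded_pages(sample, "https://example.org/portal/"))
```

Opening one of the returned URLs directly is often easier to scrape than the outer page, since the charts and toolbars live in that inner document.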
Got a working script here: b51c021. Not the most elegant, but it works! It would be nice to include "NSF Discipline" in the aggregate sponsor sheet, since it is available on the same page; I just skipped it for now.
The goal is to scrape this site for sponsor name, project type, dollar amount, fiscal year, and campus. This requires selecting the year and campus on the page.
The problem is that the buttons for making these selections are rendered by JavaScript, so they cannot be inspected and identified directly the way static HTML elements can. Following the attached tutorial, we could find the JS script links and render them as HTML using Selenium.
But I don't know how to locate the desired elements within the mess of HTML I get as a result.

Attachment: How to scrape JavaScript webpages using Selenium in Python by Lynn G. Kwong Medium.pdf
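One possible way to get a handle on the rendered HTML is to list only the elements that look like controls, with their attributes, and read that short list instead of the whole page. The sketch below is an assumption-laden starting point, not a tested solution for this portal: `fetch_rendered_html` and `list_controls` are hypothetical helper names, the set of tags treated as controls is a guess, and the Selenium part needs Selenium plus a local Chrome install.

```python
from html.parser import HTMLParser

# Tags that commonly back dropdowns, buttons and embedded views
# (an assumption, not something confirmed for this portal).
CONTROL_TAGS = {"button", "select", "option", "input", "iframe"}


def fetch_rendered_html(url, wait_css="body", timeout=15):
    """Load `url` in headless Chrome and return the HTML once an element
    matching `wait_css` is present (hypothetical helper).

    Selenium is imported lazily so the parsing code below works without it.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


class ControlInventory(HTMLParser):
    """Records (tag, attributes) for every element that looks clickable."""

    def __init__(self):
        super().__init__()
        self.controls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Custom JS widgets are often plain <div>s marked with role/onclick.
        if tag in CONTROL_TAGS or "role" in attr_map or "onclick" in attr_map:
            self.controls.append((tag, attr_map))


def list_controls(html):
    """Return the candidate controls found in `html`, in document order."""
    inventory = ControlInventory()
    inventory.feed(html)
    return inventory.controls


if __name__ == "__main__":
    # Placeholder markup; with Selenium available, use
    # html = fetch_rendered_html("<portal URL>") instead.
    html = ('<div><select id="fy"><option value="2020">2020</option></select>'
            '<div role="button" aria-label="Campus">Campus</div></div>')
    for tag, attr_map in list_controls(html):
        print(tag, attr_map)
```

Once an id, aria-label, or similar attribute identifies the year and campus controls, they can be targeted in Selenium with `driver.find_element(By.CSS_SELECTOR, ...)`; for genuine `<select>` elements, `selenium.webdriver.support.ui.Select` offers `select_by_visible_text`. Whether this portal uses real `<select>` elements or custom widgets is not something I can tell from the thread.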