internetarchive / umbra

A queue-controlled browser automation tool for improving web crawl quality
Apache License 2.0

Add scrolling and clicking behavior and testing for click end condition #43

Closed vonrosen closed 9 years ago

vonrosen commented 9 years ago

This pull request takes the changes from a previous pull request (https://github.com/internetarchive/umbra/pull/42/) and adds scrolling behavior. It also deals with the case of a site that reuses the same element to load more content, so the element never has a truly unique key (either its outerHtml or the umbraClicked property set on the element when it is clicked). The only solution I could think of is to check for an "end condition" and stop clicking once the end condition is true. In the case of http://psu24.psu.edu/, the selector to load more content is a[id='load-more'], but the outerHtml is always the same for this element. The element should continue to be clicked until it is disabled (has class="disabled") and its visibility is hidden. That is what the 2 new properties in behavior.yaml (click_css_selector_end_condition and click_css_selector_computed_style_end_condition) are for. The end condition is evaluated by creating a dynamic function with the end condition comparison and executing it at each interval. Not sure if this is the best way to do it, but it is the only way I could think of.
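For readers following along, here is a minimal sketch of the "dynamic function evaluated at each interval" idea described above. It is not the code from this pull request: the function names (compileEndConditions, clickOnce), the shape of the conditions object, and the format of the condition strings are all assumptions made for illustration, and the actual value format of the two behavior.yaml properties may differ.

```javascript
// Hypothetical sketch: the function names and the condition-string format
// are assumptions, not taken from the umbra source.

// Compile each end-condition string into a predicate once, so the same
// function objects can be reused on every interval tick.
function compileEndConditions(conditions) {
  return {
    // Evaluated against the element itself, e.g. its className.
    element: new Function('element', 'return (' + conditions.element + ');'),
    // Evaluated against the element's computed style, e.g. visibility.
    computedStyle: new Function('style', 'return (' + conditions.computedStyle + ');')
  };
}

// Run one tick: return true if both end conditions hold (stop clicking),
// otherwise click the element and return false.
function clickOnce(element, getStyle, compiled) {
  var style = getStyle(element);
  if (compiled.element(element) && compiled.computedStyle(style)) {
    return true;
  }
  element.click();
  return false;
}

// In a browser, the tick could be driven by a timer (assumed usage):
//
//   var compiled = compileEndConditions({
//     element: "element.className.indexOf('disabled') >= 0",
//     computedStyle: "style.visibility === 'hidden'"
//   });
//   var target = document.querySelector("a[id='load-more']");
//   var timer = setInterval(function () {
//     if (clickOnce(target, window.getComputedStyle, compiled)) {
//       clearInterval(timer);
//     }
//   }, 500);
```

Because the stop decision depends only on the end conditions and not on a per-element key, the same element can be clicked repeatedly until it reports itself as disabled and hidden. Building predicates with new Function executes whatever string it is given, so this approach only makes sense when the condition strings come from trusted configuration.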

nlevitt commented 9 years ago

There were some issues with psu.js, in particular the umbraClicked thing was causing the button to only be clicked once. I fixed that and made some other tweaks, and also made some small tweaks to simpleclicks.js.in. https://github.com/internetarchive/umbra/pull/44

nlevitt commented 9 years ago

Obsoleted by #44