internetarchive/umbra
A queue-controlled browser automation tool for improving web crawl quality
Apache License 2.0
60 stars, 25 forks
issues
handle Network.responseReceived w/o requestHeaders
#75
nlevitt
closed
4 years ago
1
More stable handling of started browsers
#74
blekinge
opened
5 years ago
1
Releaseable Browser pool allows Umbra to shut down all browsingThread…
#73
blekinge
opened
5 years ago
1
Include URL in BrowserThread name, so it will be part of every log li…
#72
blekinge
opened
5 years ago
1
Log config can be configured
#71
blekinge
opened
5 years ago
1
message.reject() (don't simply skip message.requeue)
#70
galgeek
closed
5 years ago
0
catch brozzler.PageInterstitialShown
#67
galgeek
closed
5 years ago
0
AITFIVE-1528 update: some message.body's are strings
#66
galgeek
closed
6 years ago
1
republish not requeue
#65
galgeek
closed
6 years ago
1
Stop outlink posting.
#64
BitBaron
closed
7 years ago
0
Adjust Outlinks Posted to Heritrix
#63
BitBaron
closed
7 years ago
1
Backport Brozzler Outlinks Feature to Umbra (AITFIVE-1295)
#62
BitBaron
closed
7 years ago
1
Include python dependency versions
#61
archivingisneat
closed
7 years ago
0
browsing and behaviors will live in brozzler project going forward
#60
nlevitt
closed
8 years ago
0
ARI-4826 whitney.org calendar
#59
galgeek
closed
8 years ago
1
ARI-4838 racineco.com document viewers
#58
galgeek
closed
8 years ago
1
shut down immediately on first signal
#57
nlevitt
closed
8 years ago
1
Change user-agent to Chrome instead of Chromium. This is due to twitt…
#56
vonrosen
closed
8 years ago
0
https://github.com/internetarchive/umbra/pull/54 plus more refactoring
#55
nlevitt
closed
8 years ago
0
Add Facebook login behavior and refactor script template system
#54
adam-miller
closed
8 years ago
0
Add behavior to click on links that execute JavaScript to download report csv files for fec.gov/data.
#53
vonrosen
closed
8 years ago
0
Allow clicking on already clicked element to continue in behaviors if…
#52
vonrosen
closed
8 years ago
0
Make Umbra click on 'Load More' button for youtube pages
#51
vonrosen
closed
8 years ago
0
Add --ignore-certificate-errors to browser startup arguments to prevent ssl warnings.
#50
vonrosen
closed
8 years ago
1
catch and log exception deleting temporary work directory
#49
nlevitt
closed
8 years ago
0
Add custom behavior for Brooklyn Museum
#48
BitBaron
closed
8 years ago
0
flickr.js should have umbraBehaviorFinished function
#47
ldko
opened
8 years ago
2
update detection of modal close button for facebook changes
#46
nlevitt
closed
8 years ago
0
Custom behavior for Brooklyn Museum site.
#45
BitBaron
closed
8 years ago
0
followup to https://github.com/internetarchive/umbra/pull/43
#44
nlevitt
closed
9 years ago
0
Add scrolling and clicking behavior and testing for click end condition
#43
vonrosen
closed
9 years ago
2
Add scrolling and clicking behavior
#42
vonrosen
closed
9 years ago
1
Adds routing_key to Queue creation
#41
ldko
closed
9 years ago
3
Add routing_key parameter to AmqpBrowserController
#40
ldko
closed
9 years ago
0
ARI-3775, ARI-3956 Simple behaviors
#39
nlevitt
closed
9 years ago
0
increase browser start and stop timeouts, since sometimes we strand brow...
#38
nlevitt
closed
9 years ago
0
behavior for instagram
#37
nlevitt
closed
9 years ago
0
Allow scrolling down a timeline in the facebook plugin so as to capture content in third party embedded timelines.
#36
vonrosen
closed
9 years ago
0
properly handle socket.error from amqp conn.drain_events (was previously...
#35
nlevitt
closed
9 years ago
0
Update README.md
#34
dhamaniasad
closed
9 years ago
0
ARI-4016 - Support: embedded videos on marquette.edu
#33
adam-miller
closed
9 years ago
0
ARI-3904 Instagram behavior to scroll past two pages, and click to enla...
#32
adam-miller
closed
9 years ago
1
new utility queue-json, and another change to help with draining from and republishing to amqp
#31
nlevitt
closed
10 years ago
0
new utility queue-json, and another change to help with draining from and republishing to amqp
#30
nlevitt
closed
10 years ago
0
reject (discard) bad messages
#29
nlevitt
closed
10 years ago
0
Ari 3940 - prioritize scrolling all the way to the bottom
#28
nlevitt
closed
10 years ago
0
Allow default behavior to include clicking on sound cloud player buttons embedded in 3rd party sites.
#27
vonrosen
closed
9 years ago
0
stability!
#26
nlevitt
closed
10 years ago
0
Allow flash requests to be detected. For https://webarchive.jira.com/browse/ARI-3724
#25
vonrosen
closed
10 years ago
0
more improvements, mostly for robustness
#24
nlevitt
closed
10 years ago
0