internetarchive / umbra

A queue-controlled browser automation tool for improving web crawl quality
Apache License 2.0

Adjust Outlinks Posted to Heritrix #63

Closed BitBaron closed 7 years ago

BitBaron commented 7 years ago

The prune_outlinks(dirty_links, block_list=None) function might make more sense as a utility function, but I thought I was following the current style of this class. Also, the current implementation only makes sense for the post_outlinks(outlinks=None) function.
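
For illustration only, here is a rough sketch of what a standalone utility version could look like. It is not the code from this pull request: the signature is taken from the comment above, but the behavior is an assumption (block_list is treated as a list of substrings, links containing any of them are dropped, and duplicates are removed while keeping order).

```python
def prune_outlinks(dirty_links, block_list=None):
    """Hypothetical standalone version; the real method may behave differently."""
    # Assumption: block_list holds plain substrings, not regexes or URL prefixes.
    block_list = block_list or []
    seen = set()
    clean_links = []
    for link in dirty_links:
        # Skip exact duplicates so each outlink is posted at most once.
        if link in seen:
            continue
        seen.add(link)
        # Drop any link that contains a blocked substring.
        if any(blocked in link for blocked in block_list):
            continue
        clean_links.append(link)
    return clean_links


# Made-up sample data, not taken from umbra.
sample_links = [
    "http://example.com/a",
    "http://ads.example.net/x",
    "http://example.com/a",
]
pruned = prune_outlinks(sample_links, block_list=["ads.example.net"])
```

Written this way the function has no dependency on the class, so post_outlinks could call it directly and it could be tested on its own.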