iipc / openwayback

The OpenWayback Development
http://www.netpreserve.org/openwayback
Apache License 2.0

By default disable html parsing during indexing #403

Closed: ato closed this 5 years ago

ato commented 5 years ago

The HTML parser can go into an infinite loop (#402, #162). Since most users don't use robotflags, let's disable HTML parsing by default to make indexing more reliable.

Adds a -parse-html option to the cdx-indexer CLI tool to re-enable HTML parsing.
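
As a rough illustration of the opt-in behavior described above, here is a minimal sketch. It assumes a hypothetical `IndexerOptions` class with `parseHtml` and `paths` fields; none of these names come from the actual cdx-indexer source, and the real option handling may differ. Only the `-parse-html` flag name comes from this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: NOT the actual cdx-indexer implementation.
class IndexerOptions {
    // HTML parsing (only needed for robotflags) stays off unless requested.
    boolean parseHtml = false;

    // Remaining non-flag arguments, e.g. input and output paths.
    final List<String> paths = new ArrayList<>();

    static IndexerOptions parse(String[] args) {
        IndexerOptions opts = new IndexerOptions();
        for (String arg : args) {
            if (arg.equals("-parse-html")) {
                opts.parseHtml = true; // explicit opt-in re-enables the parser
            } else {
                opts.paths.add(arg);
            }
        }
        return opts;
    }
}
```

With this shape, an indexing run that omits the flag never invokes the HTML parser, so the infinite-loop cases cannot affect it; users who need robotflags opt back in explicitly.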

ldko commented 5 years ago

Thanks @ato!