Closed by tw4l 5 years ago
schemas.microsoft.org
w3.org
bulk_extractor stoplists for known/safe URLs, domains, email addresses, and credit card numbers (CCNs) were added in commit https://github.com/timothyryanwalsh/bulk-reviewer/commit/aaa823a5f5957e688ece26a40d5d8e529c50205d
e.g. ns.adobe.com (PDF), purl.org (Dublin Core), schemas.openxmlformats.org (OOXML)
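
For anyone who wants to see the idea in isolation, here is a minimal sketch of filtering found domains against a list of known/safe ones. It is an illustration only, not the code from the commit above and not bulk_extractor's own stoplist format or matching logic. The names `SAFE_DOMAINS`, `is_safe_domain`, and `filter_domains` are hypothetical, and treating subdomains of a listed domain as safe is an assumption made for this example.

```python
from typing import Iterable, List

# Hypothetical stoplist, built only from the domains named in this issue.
SAFE_DOMAINS = frozenset({
    "ns.adobe.com",                # PDF
    "purl.org",                    # Dublin Core
    "schemas.openxmlformats.org",  # OOXML
    "schemas.microsoft.org",
    "w3.org",
})


def is_safe_domain(domain: str, stoplist: Iterable[str] = SAFE_DOMAINS) -> bool:
    """Return True if domain equals a stoplist entry or is a subdomain of one.

    Assumption for this sketch: subdomains of a listed domain count as safe.
    """
    domain = domain.lower().rstrip(".")
    return any(domain == entry or domain.endswith("." + entry) for entry in stoplist)


def filter_domains(found: Iterable[str]) -> List[str]:
    """Keep only the domains that are not covered by the stoplist."""
    return [d for d in found if not is_safe_domain(d)]
```

Matching on a leading dot (`"." + entry`) keeps lookalike names such as `evilw3.org` from being treated as `w3.org`.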