html transforms PRs: #610 #600 #557 #477
When running the xpath search script (self_close_xpath_search.py),
add these tags (complete list):
xitems = [
    "//h:em[not(node())]",
    "//h:strong[not(node())]",
    "//h:sub[not(node())]",
    "//h:sup[not(node())]",
    "//h:iframe[not(node())]",
    "//h:span[not(node())]",
    "//h:h3[not(node())]",
    "//h:section[not(node())]",
    "//h:figure[not(node())]",
    "//h:u[not(node())]",
    "//h:a[not(node())]",
    "//h:figcaption[not(node())]",
]
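If the tag list changes often, the same xitems can be generated from plain tag names instead of typed by hand. A minimal sketch only: SELF_CLOSING_TAGS and build_xitems are hypothetical names, not part of self_close_xpath_search.py, and the "h" prefix is assumed to be the one the script already maps to the XHTML namespace.

```python
# Hypothetical helper, not taken from self_close_xpath_search.py.
# Tag names are copied from the complete list above.
SELF_CLOSING_TAGS = [
    "em", "strong", "sub", "sup", "iframe", "span",
    "h3", "section", "figure", "u", "a", "figcaption",
]


def build_xitems(tags, prefix="h"):
    """Return one XPath per tag that matches elements with no child nodes."""
    return ["//{}:{}[not(node())]".format(prefix, tag) for tag in tags]


xitems = build_xitems(SELF_CLOSING_TAGS)
```

Each expression selects elements of that tag that have no child nodes at all (no text, no child elements), which is how an empty, self-closed element appears in the output.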
also, in:
def do_xpath_search(archive_url, cnx_id, xpath_query, type="html"):
change type="html" to type="baked-html"
in: archive_host
set the correct url ("https://archive-qa.cnx.org", "https://archive-staging.cnx.org" or "https://archive.cnx.org")
in: for book in BOOKS:
you can set how many books to verify from the list by slicing it: BOOKS[x:y]
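A rough sketch of the host and slice settings described above. ARCHIVE_HOSTS, pick_archive_host, books_to_verify and the placeholder book ids are all hypothetical; in the real script archive_host and BOOKS are edited directly.

```python
# Hypothetical names for illustration; the script itself just assigns
# archive_host and loops over BOOKS.
ARCHIVE_HOSTS = {
    "qa": "https://archive-qa.cnx.org",
    "staging": "https://archive-staging.cnx.org",
    "prod": "https://archive.cnx.org",
}


def pick_archive_host(env):
    """Return the archive url for the environment under test."""
    return ARCHIVE_HOSTS[env]


def books_to_verify(books, start=None, stop=None):
    """Return books[start:stop], the subset of the list to check."""
    return books[start:stop]


# Placeholder ids; the real BOOKS list lives in the script.
BOOKS = ["book-id-0", "book-id-1", "book-id-2", "book-id-3"]

archive_host = pick_archive_host("qa")
subset = books_to_verify(BOOKS, 1, 3)  # same as BOOKS[1:3]

for book in subset:
    print(archive_host, book)
```

BOOKS[x:y] follows normal Python slicing: x is included, y is excluded, and either can be left out to run from the start or to the end of the list.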