Purpose
XPaths can be difficult to write. I have created a browser extension that lets you test XPaths without running the web scraper. Enter an XPath and the extension shows which elements are selected and the text that would be scraped. You can also test regular expressions and format strings. The output updates instantly as you type and when you switch tabs, so you can easily test several pages at once.
Installation
Chrome (this only needs to be done once)
Go to chrome://extensions/
Click "Load unpacked" → Select the devtools folder
Click the ⟳ button to reload the extension whenever changes are made
Firefox (this needs to be done after each restart)
Go to about:debugging → Load manifest.json as a temporary add-on
Go to about:addons → Enable "Access your data for all websites" (if this is not done, the extension button must be clicked to grant access)
Usage
Opening the Side Panel
The side panel can be accessed via the extension's button in the toolbar. This button can be pinned for easy access. You can also open it via the right-click menu on any page.
Utilizing Scraping Functions
agtern_devtools.js and agtern_devtools.css are injected into every webpage while the extension is enabled. The functions in agtern_devtools.js such as scrape(xpath), scrape_links(xpath), and scrape_elements(xpath) can be used in the main devtools console.
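For context, XPath lookups like these can be done in the browser with the standard document.evaluate API. The sketch below shows one way to collect the text of every matched node. It is an illustration under assumptions, not the code in agtern_devtools.js: selectText is a hypothetical helper, and the actual return values of scrape, scrape_links, and scrape_elements may differ.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, written out so the
// function does not depend on the XPathResult global being present.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: returns the trimmed text of every node matched by xpath.
// `doc` is expected to be the page's document object.
function selectText(doc, xpath) {
  const result = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const texts = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    const node = result.snapshotItem(i);
    // textContent can be null for some node types, so fall back to "".
    texts.push((node.textContent || "").trim());
  }
  return texts;
}

// In the main devtools console:
//   selectText(document, "//h1")
```

Passing the document in as a parameter keeps the helper easy to try against different frames or a stub object.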
Troubleshooting
When the extension is first enabled, the scripts may not be injected until the page is refreshed. When in doubt, refresh the page and reopen the panel.
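One quick way to tell whether a refresh is needed is to check from the main devtools console that the functions are defined. This is a minimal sketch: missingFunctions is a hypothetical helper, and the three names are only the ones listed under Utilizing Scraping Functions (the script may define others).

```javascript
// Hypothetical helper: lists the names that are not callable functions on globalObj.
function missingFunctions(globalObj, names) {
  return names.filter((name) => typeof globalObj[name] !== "function");
}

// In the main devtools console, pass window as globalObj:
//   missingFunctions(window, ["scrape", "scrape_links", "scrape_elements"])
// An empty array means all three are available; otherwise refresh the page.
```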
If the extension doesn't work on a particular website (or breaks that website), one of the function names in agtern_devtools.js may be clashing with a function name used by the website. An error will be logged if this occurs. Unfortunately, the only remedies are to rename the function or to disable the extension.
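To illustrate the kind of clash described here, the sketch below shows a guard that refuses to overwrite an existing global and logs an error instead. It is an assumption about how such a check can be written, not a copy of agtern_devtools.js; defineIfFree and its parameters are hypothetical.

```javascript
// Hypothetical guard: installs fn as globalObj[name] unless the name is taken.
// Returns true when the function was installed, false on a clash.
function defineIfFree(globalObj, name, fn, logError = console.error) {
  if (name in globalObj) {
    logError(`"${name}" is already defined on this page; skipping.`);
    return false;
  }
  globalObj[name] = fn;
  return true;
}

// Caveat: top-level const/let/class declarations made by a page are not
// properties of window, so an `in` check on window does not detect them.
```

Returning a boolean rather than throwing lets the caller decide whether a single clash should stop the rest of the script from loading.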
Debugging
The main devtools console will contain errors from agtern_devtools.js. Errors from service_worker.js can be found by inspecting the extension on chrome://extensions/ (Chrome) or about:debugging (Firefox). Errors from side_panel.js can be found by right-clicking on the side panel and inspecting it. Some manifest errors are expected because the manifest format differs slightly between browsers.
Demo
https://youtu.be/_af0a4QpDvY