This tool combines several open source tools to give insight into accessibility and performance metrics for a list of URLs. It is made up of several parts, described below.
To get started, follow the installation instructions below. Once complete, run:
start app.py
or
python app.py
NOTE: At the moment, no database is used because the initial focus was on CSV data only. The system creates one folder for each set of results, as follows (under /REPORTS/your_report_name):
/SPIDER (used to store crawl data)
At this point, a database would make more sense, along with a function to "Export to CSV", etc.
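For illustration, a report folder could end up looking like this (only /SPIDER and /logs are mentioned in this document; any other folders depend on the tests selected):
/REPORTS/your_report_name/
    /SPIDER    (crawl CSVs, e.g. *_html.csv)
    /logs      (e.g. _gdrive_logs.txt)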
As mentioned, simply provide a CSV with a list of URLs (column header = "Address") and select the tests to run through the web form.
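For example, a minimal input CSV (the URLs shown are placeholders) could look like this:
Address
https://www.example.com/
https://www.example.com/about
https://www.example.com/contact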
The application is configured through environment variables. On startup, the application will also read environment variables from a .env file.
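For illustration only, a .env might set the port the app listens on; the variable name below is an assumption, so check app.py and globals.py for the keys actually read:
# NOTE: key name is illustrative, not confirmed by this README
PORT=8888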
To get all tests running, the following steps are required:
sudo apt update
sudo apt install git
sudo apt-get install python3-pip
sudo apt-get install python3-venv
sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get install python3.6
git clone https://github.com/soliagha-oc/perception.git
cd perception
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py
Browse to http://127.0.0.1:8888/ (or alternatively to port 5000 if you didn't set 8888 in the .env file)
Install the following CLI tools for your operating system:
Download and install the chromedriver that matches your installed Chrome/Chromium version.
Download the latest version from the official website and unzip it (here, for instance, version 2.29 to ~/Downloads):
wget https://chromedriver.storage.googleapis.com/2.29/chromedriver_linux64.zip
Move to /usr/local/share (or any folder) and make it executable
sudo mv -f ~/Downloads/chromedriver /usr/local/share/
sudo chmod +x /usr/local/share/chromedriver
Create symbolic links
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
OR
export PATH=$PATH:/path-to-extracted-file/
OR add the export line above to your .bashrc
Go to the geckodriver releases page (https://github.com/mozilla/geckodriver/releases), find the latest version of the driver for your platform, and download it. For example:
wget https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz
Extract the file with:
tar -xvzf geckodriver*
Make it executable:
chmod +x geckodriver
Add the driver to your PATH so other tools can find it:
export PATH=$PATH:/path-to-extracted-file/
OR add the export line above to your .bashrc
Install node
https://nodejs.org/en/download/
curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash -
sudo apt-get install -y nodejs
Install npm:
npm install npm@latest -g
or, if permissions require it:
sudo npm install npm@latest -g
Install lighthouse:
npm install -g lighthouse
or:
sudo npm install -g lighthouse
https://www.xpdfreader.com/download.html
To install this binary package:
Copy the executables (pdfimages, xpdf, pdftotext, etc.) to /usr/local/bin.
Copy the man pages (.1 and .5) to /usr/local/man/man1 and /usr/local/man/man5.
Copy the sample-xpdfrc file to /usr/local/etc/xpdfrc. You'll probably want to edit its contents (as distributed, everything is commented out) -- see xpdfrc(5) for details.
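A rough sketch of those steps on Linux, assuming the shell is in the extracted xpdf directory and that the binaries, man pages, and sample-xpdfrc sit at its top level (the real package layout may differ):
sudo mkdir -p /usr/local/man/man1 /usr/local/man/man5 /usr/local/etc
sudo cp pdfimages pdftotext pdfinfo /usr/local/bin/
sudo cp *.1 /usr/local/man/man1/
sudo cp *.5 /usr/local/man/man5/
sudo cp sample-xpdfrc /usr/local/etc/xpdfrc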
See this "Quick Start" guide to enable the Drive API: https://developers.google.com/drive/api/v3/quickstart/python
Complete the steps described on that page to create a simple Python command-line application that makes requests to the Drive API.
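A minimal sketch of what that quickstart sets up, assuming the google-api-python-client and google-auth-oauthlib packages are installed and credentials.json sits in the working directory:
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive"]

# Run the OAuth flow against credentials.json; a browser window opens for consent
flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
creds = flow.run_local_server(port=0)

# Build a Drive v3 client and list a few files to confirm access
service = build("drive", "v3", credentials=creds)
results = service.files().list(pageSize=10, fields="files(id, name)").execute()
for f in results.get("files", []):
    print(f["name"], f["id"])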
See: https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#commandlineoptions
The Screaming Frog SEO Spider CLI provides the following data sets (required files listed in bold): - crawl_overview.csv (used to create the report DASHBOARD). An example CLI invocation is sketched after the notes below.
Note: There are spider config files located in the /conf folder. You will require a licence to alter the configurations.
Note: If a licence is not available, simply provide a CSV where at least one column has the header "Address". See the DRUPAL example.
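For example, a licensed Linux install could run a headless crawl along these lines (exact flags depend on the installed version; see the command line options page linked above, and treat the URL and output path as placeholders):
screamingfrogseospider --crawl https://www.example.com/ --headless --output-folder /path/to/REPORTS/your_report_name/SPIDER --export-tabs "Internal:All"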
Installed via pip install -r requirements.txt
See: https://pypi.org/project/axe-selenium-python/ and https://github.com/dequelabs/axe-core
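A minimal sketch of how axe-selenium-python drives an audit (assuming Firefox and geckodriver are installed as above; the URL is a placeholder):
from selenium import webdriver
from axe_selenium_python import Axe

driver = webdriver.Firefox()   # requires geckodriver on PATH
driver.get("https://www.example.com/")

axe = Axe(driver)
axe.inject()                    # inject the axe-core JavaScript into the page
results = axe.run()             # run the accessibility checks
axe.write_results(results, "axe_results.json")
driver.quit()

print(len(results["violations"]), "violations found")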
Lighthouse is an open-source, automated tool for improving the performance, quality, and correctness of your web apps.
When auditing a page, Lighthouse runs a barrage of tests against the page, and then generates a report on how well the page did. From here you can use the failing tests as indicators of what you can do to improve your app.
Quick-start guide on using Lighthouse: https://developers.google.com/web/tools/lighthouse/
View and share reports online: https://googlechrome.github.io/lighthouse/viewer/
Github source and details: https://github.com/GoogleChrome/lighthouse
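Once installed, a single page can be audited from the command line, for example (URL and output path are placeholders):
lighthouse https://www.example.com/ --output json --output-path ./lighthouse-report.json --chrome-flags="--headless"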
While there is a /reports/ dashboard, the system can also write to a Google Sheet. To do this, set up credentials for Google API authentication at https://console.developers.google.com/apis/credentials to obtain a valid "credentials.json" file.
To facilitate branding and other report metrics, a "non-coder/sheet formula template" is used. Here is a sample template. When a report is run from the /reports/ route, the template is loaded (the template report and folder IDs are found in globals.py and need to be set up/updated once), and the Google Sheet is either created or updated (a unique report ID is auto-generated and can be found in /REPORTS/your_report_name/logs/_gdrive_logs.txt).
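Illustrative only, not necessarily how the application implements it: copying a template spreadsheet into a Drive folder and writing values into the copy might look like the sketch below, where TEMPLATE_ID and FOLDER_ID are placeholders for the IDs configured in globals.py:
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive", "https://www.googleapis.com/auth/spreadsheets"]
TEMPLATE_ID = "your-template-spreadsheet-id"   # placeholder; see globals.py
FOLDER_ID = "your-reports-folder-id"           # placeholder; see globals.py

creds = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES).run_local_server(port=0)
drive = build("drive", "v3", credentials=creds)
sheets = build("sheets", "v4", credentials=creds)

# Copy the formula/branding template into the reports folder
copy = drive.files().copy(
    fileId=TEMPLATE_ID,
    body={"name": "your_report_name", "parents": [FOLDER_ID]},
).execute()

# Write an illustrative header row into the new sheet
sheets.spreadsheets().values().update(
    spreadsheetId=copy["id"],
    range="A1",
    valueInputOption="RAW",
    body={"values": [["Address", "Result"]]},
).execute()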
If you have a Screaming Frog SEO Spider licence, be sure to add the Screaming Frog CLI to your PATH. Even if Screaming Frog SEO Spider is not installed, a CSV can be provided to guide the report tools. Once installed, try running the sample CSV. To do this:
NOTE: This would exclude PDFs, which require a list of exclusively PDF URLs.
Running a sample can be accomplished in two ways: using the samples provided in the "/REPORTS/DRUPAL/" folder, or by downloading and installing Screaming Frog SEO Spider and running a free crawl (500 URL limit and no configuration/CLI tool access). Once the crawl is completed or the file created, create/save the following CSVs:
If another method is used to crawl a base URL, be sure to include the results in a CSV file where at least one header (first row) reads "Address", provide one or more web or PDF URLs, and ensure that the filename(s) match those listed above and are placed in the "/REPORTS/your_report_name/SPIDER/" folder. At least one *_html.csv file is required and must be in the appropriate folder.
It is possible when crawling and scanning sites to encounter various security risks. Please be sure to have a virus scanner enabled to protect against JavaScript and other attacks or disable JavaScript in the configuration.