A utility to preflight spreadsheets. Handles most types of spreadsheet formats (txt, csv, xls, xslx, ods, etc...) and will preflight multiple sheets if the input is a workbook.
npm install data-preflight -g
data-preflight --version
Use 'data-preflight' if you installed globally or 'node path/to/repo/src/js/cmd/interface.js' if you didn't.
The 'preflight' action produces a preflight report. If the '--output' parameter is left blank, it will create the preflight in the same directory as the input file with "preflight.html" appended. Choose between html or xml with the --format option.
$ data-preflight preflight --path=/Users/dominickpeluso/Desktop/Sample_spreadsheet.csv --output=/Users/dominickpeluso/Desktop/my_preflight.txt --format=xml
$ data-preflight preflight --path=/Users/dominickpeluso/Desktop/AnotherSheet.csv --debug
You may also add the optional flag '--debug' to see debug messags while the preflight runs. This is helpful if preflights are taking longer than expected to complete. The console output with the debug mode will show the time each step took to process:
You may also recursively preflight an entire directory. Doing so will scan the directory and all sub-folders for spreadsheet files and preflight them all. You may also specify a '--debug' flag here.
$ data-preflight preflight-dir --path=/Users/dominickpeluso/Desktop/MySpreadsheets --format=html
$ data-preflight preflight-dir --path=/Users/dominickpeluso/Desktop/MySpreadsheets --debug
Preflights are standalone HTML report documents. They show details about the structure and data within a spreadsheet. The goal of the preflight is to be used by an operator who can easily check assumptions and catch data errors within the spreadsheet.
The spreadsheet overview shows the name of the spreadsheet and some basic statistics. It also has a collapsable panel for each sheet within the workbook.
This section shows the column headers as well as actual data from the first, middle, and last rows of the database. Additionally, it shows the row positions for the first/middle/last, to make data checks easier to reference.
On the right side of this table is a hash image for each column. The hash image is computed from the data within the column (except the header row). In the example below, each column has an identical hash image because the data within this example spreadsheet is identical for each column.
This section shows additional details for each column. It shows the minimum value length, maximum value length, number of unique values, and number of unique characters found. Clicking on the unique values or unique characters will show panels which display the values and characters found.
Column warnings show potential problems in the column. As of now, it checks for completely static column values, blank values, and non-ascii characters.
The preflight will also guess at the types of characters which appeared in each column. Currently, it checks for alphabetic (lowercase), alphabetic (uppercaser), punctuation, non-ascii, and numeric.
See the Examples directory to see an example spreadsheet with a rendered preflight.