Piql Insight is a fast and flexible archival package / information package (IP) inspection and dissemination tool.
The application does not validate the IPs, apart from checking that attachments exist. The application is data-driven, and a design goal is that it should not contain any references to file formats, XML tags or keywords. Instead, the views are controlled by configuration files and optional regular expressions that transform the XML into user-interface-friendly keywords. This means that, in principle, the application can support any XML-based IP format by adding new config files.
For efficient dissemination the application supports batch mode exports with predefined transformations. It also supports creating searchable PDFs, where the PDF contains the digitized pages and embedded OCR metadata.
Insight ships with these formats:
Piql Insight was originally developed for the Kommunenes Digitale Ressursjsentral (KDRS) in Norway and released under the name KDRS Innsyn. Its job was to support dissemination of archival packages in the NOARK-5 format, which is used under Norwegian legislation. It has later been extended to support multiple IP formats.
Released applications are tested on Windows 10 64-bit.
The text indexer tool Sphinx must be in the path.
Install:
osx$ brew install sphinx
centos-redhat$ sudo yum install sphinx
Check that the tool is available:
computer$ indexer
Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
The search engine Sphinx is used for efficient full-text searches in the attachments referenced by the archival package. First all attachments are converted to text, then Sphinx builds an index to facilitate efficient look-ups. The command line tool pdftotext is used to convert the PDF files to text. This tool must be available in the path.
With Insight running, and after an archival package has been loaded and indexed, it is possible to run SQL queries directly on the index using an installed MySQL client (version 5.6):
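The prerequisite check can be scripted before an import is started. The following is a minimal Python sketch and not part of Insight: the helper name `missing_tools` is hypothetical, and it only assumes that the Sphinx `indexer` and `pdftotext` executables mentioned above must be found on the path.

```python
import shutil


def missing_tools(tools=("indexer", "pdftotext"), which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH.

    Hypothetical helper, not part of Insight. `which` is injectable so
    the check can be exercised without the real tools installed.
    """
    return [name for name in tools if which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required tools are available.")
```

An empty result means both tools were located; it says nothing about their versions.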
mysql -h0 -P9306
MySQL [(none)]> show tables;
+------------+-------+
| Index | Type |
+------------+-------+
| INDEX_NAME | local |
+------------+-------+
1 row in set (0.00 sec)
MySQL [(none)]> select id from INDEX_NAME where match('Drammen');
See the Sphinx user manual for more information.
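The same kind of query can be issued from a script by driving the MySQL client non-interactively. This is a sketch under stated assumptions: the helper name is hypothetical, host and port mirror the `mysql -h0 -P9306` example, `id` is the document id column that Sphinx provides, and quoting of search terms is deliberately not handled.

```python
def sphinx_query_command(index_name, term, host="0", port=9306):
    """Build the mysql client command line for a full-text match query.

    Hypothetical helper. Escaping rules for the query language are not
    implemented, so terms containing quotes or backslashes are rejected
    instead of escaped.
    """
    if any(ch in term for ch in "'\"\\"):
        raise ValueError("unsupported character in search term")
    query = f"select id from {index_name} where match('{term}');"
    # -e makes the client execute one statement and exit.
    return ["mysql", f"-h{host}", f"-P{port}", "-e", query]


print(sphinx_query_command("INDEX_NAME", "Drammen"))
```

The returned list can be passed to `subprocess.run` while Insight is running with an indexed package loaded.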
After import, a PDF report is generated in the report folder configured by the REPORTS_DIR key in insight.conf. Reports are stored in a folder named REPORTS_DIR\yyyy\MM\DD\TTMMSS\. The Sphinx index and similar data attached to the IP are stored in the same folder.
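To locate a report from a script, the folder name can be derived from the import time. The sketch below is an illustration only: it assumes that the pattern means zero-padded year, month and day followed by hours, minutes and seconds, and both the helper name `report_dir` and the example path are hypothetical.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def report_dir(reports_dir, when):
    """Return the report folder for an import performed at `when`.

    Assumption: each date part is zero-padded and the last folder
    level holds hours, minutes and seconds.
    """
    return (PureWindowsPath(reports_dir)
            / when.strftime("%Y")
            / when.strftime("%m")
            / when.strftime("%d")
            / when.strftime("%H%M%S"))


# Hypothetical REPORTS_DIR value and import time.
print(report_dir(r"C:\insight\reports", datetime(2023, 5, 4, 13, 7, 9)))
# prints C:\insight\reports\2023\05\04\130709
```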
For some XML-based formats there can be a one-to-many relation between a node in the XML and files in the archival package. An example of this is the Norwegian Health Archive package, where the avlxml.xml file can reference multiple digitized pages and the corresponding OCR metadata. This relationship can be configured using the key INFO_VIEW_JOURNAL_TYPE_REGEXP in the import format file. Nodes matching this key get a Journal button at the bottom of the node view. The Journal view allows users to select the pages that should be exported.
The journal mode supports display and export of journals as searchable PDFs, where each page consists of the digitized page (for example in JPEG format) and the recognized text (OCR) as an invisible layer.
Supported OCR formats are ALTO and HOCR. For more information on how this mode works, study the script pdf\create-pdf.cmd. To create PDFs, several tools have to be installed and available in the system path:
The config file is named insight.conf. It is intended to be self-documenting, so inspect it for further details. If the config file is changed, the application has to be restarted. Each import format has its own config file stored under formats. All files ending with .conf in this folder are loaded at startup and displayed in the file open dialog.
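To see which format definitions would be picked up at startup, the folder can be listed with a few lines of Python. This is a sketch only: the helper name is hypothetical, and how Insight itself discovers and orders the files is not documented here.

```python
from pathlib import Path


def format_files(formats_dir="formats"):
    """Return the .conf files in the formats folder, sorted by name."""
    folder = Path(formats_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix == ".conf")


for conf in format_files():
    print(conf.name)
```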
insight -platform windows:dpiawareness=0
to get the correct window size. Other parameters that influence the user interface are documented here: https://doc.qt.io/qt-5/qguiapplication.html#supported-command-line-options
To support automated dissemination workflows, Insight supports batch mode using the command line parameters --file, --file-format and --auto-export. The auto-export feature is useful if the format is configured to auto-select nodes at import using the TREEVIEW_AUTO_SELECT_REGEXP key.
insight --file nha-sip.tar --file-format "Norsk Helsearkiv SIP" \
--auto-export out.pdf
This command opens the AIP file nha-sip.tar as a Norsk Helsearkiv SIP, as defined by the <./formats/nha-sip.conf> format file, exports it to out.pdf and then exits the application. Full example here.
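For a folder of packages, the batch command can be wrapped in a small driver script. The following Python sketch uses only the three parameters described above. The helper names, the executable name `insight` on the path, and the choice of writing each export next to its package are assumptions. Whether Insight signals failure through its exit code is not stated, so `check=True` is a precaution only.

```python
import subprocess


def batch_export_command(package, file_format, output_pdf, exe="insight"):
    """Build the Insight command line for one unattended export."""
    return [exe,
            "--file", str(package),
            "--file-format", file_format,
            "--auto-export", str(output_pdf)]


def export_all(packages, file_format, run=subprocess.run):
    """Export each package to <package>.pdf, one Insight run per file.

    `run` is injectable so the loop can be tested without Insight.
    """
    for package in packages:
        cmd = batch_export_command(package, file_format, f"{package}.pdf")
        run(cmd, check=True)
```

Calling `export_all(["nha-sip.tar"], "Norsk Helsearkiv SIP")` would run the same command as the example, with `nha-sip.tar.pdf` as the output name.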
The import process is controlled by various keys in the format file:
; File patterns supported by the format, separated by '@'
IMPORT_FORMAT_PATTERNS=*.7z@*.tar
; Extraction tools, order must match pattern list above
EXTRACT_TOOL="^.*\.7z$@7z x -y \
-o%DESTINATION% %FILENAME%@^.*\.tar$@tar \
-C %DESTINATION% -xf %FILENAME%"
; Auto load nodes will be loaded into tree view when importing XML
INFOVIEW_AUTO_IMPORT_REGEXP_EN=filename[^.*avlxml\\.xml$]
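The EXTRACT_TOOL value can be read as alternating regular expressions and command templates. The Python sketch below shows how such a value might be resolved for one file. It assumes that the entries are '@'-separated pairs and that %DESTINATION% and %FILENAME% are plain text substitutions. The helper name is hypothetical and this is not Insight's own implementation.

```python
import re


def extract_command(extract_tool, filename, destination):
    """Pick and fill in the extraction command for `filename`.

    Assumes a flat '@'-separated list of alternating regexp and
    command entries; the first matching regexp wins.
    """
    parts = extract_tool.split("@")
    for pattern, command in zip(parts[0::2], parts[1::2]):
        if re.match(pattern, filename):
            return (command
                    .replace("%DESTINATION%", destination)
                    .replace("%FILENAME%", filename))
    return None


EXTRACT_TOOL = (r"^.*\.7z$@7z x -y -o%DESTINATION% %FILENAME%"
                r"@^.*\.tar$@tar -C %DESTINATION% -xf %FILENAME%")

print(extract_command(EXTRACT_TOOL, "nha-sip.tar", "out"))
# prints tar -C out -xf nha-sip.tar
```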
Auto load nodes use the first format matching the filename. For the INFOVIEW_AUTO_IMPORT_REGEXP_EN example above this is <./formats/epjark.conf>. This format definition file uses the TREEVIEW_AUTO_SELECT_REGEXP key to auto-select nodes at import. Only selected nodes are included in the export.
; Auto select nodes will be selected when importing XML
TREEVIEW_AUTO_SELECT_REGEXP=pasientjournal@diagnose
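The effect of the auto-select key can be illustrated with a short Python sketch. It assumes that '@' separates alternative regular expressions, as it separates entries in the other format keys, and that a pattern may match anywhere in a node name. The helper name and the node name `adresse` are hypothetical.

```python
import re


def auto_selected(node_names, key_value):
    """Return the node names matched by any pattern in `key_value`."""
    patterns = [re.compile(p) for p in key_value.split("@") if p]
    return [name for name in node_names
            if any(p.search(name) for p in patterns)]


nodes = ["pasientjournal", "diagnose", "adresse"]
print(auto_selected(nodes, "pasientjournal@diagnose"))
# prints ['pasientjournal', 'diagnose']
```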
Please create a GitHub issue with as detailed a description as possible of what happened. Attach log files and insight.dmp if it exists. Do not post sensitive material!
Upgraded to Qt 6.5
Replaced PDF library poppler with QPdfDocument
Replaced custom-built pdftotext tool with precompiled tool
OS-X: Support dark mode
OS-X: Fix view relative folder
Upgraded to latest sphinx indexer
Random format: Bugfixes.
Optimize: Made reading & parsing of XMLs run in parallel.
Journal: Fixed crash if journal had no pages.
Bugs:
Pre release.
Pre release.
First beta test release.
The application is created using the Qt framework. Once Qt is installed, the application can be built using:
# Linux / OS-X
(cd src/thirdparty ; \
unzip quazip-1.4.zip ; \
cd quazip-1.4 ; \
cmake -S . -B ./out -D QUAZIP_QT_MAJOR_VERSION=6 ; \
cmake --build ./out)
./update-translations.sh
qmake
make
# Windows
cd src/thirdparty
unzip quazip-1.4.zip
cd quazip-1.4
cmake -S . -B ./out -D QUAZIP_QT_MAJOR_VERSION=6 -D CMAKE_LIBRARY_PATH=c:\dev\Piql\zlib-win64\Release
cmake --build ./out --config release
cmake --build ./out --config debug
./update-translations.sh
qmake
nmake
On some systems the MySQL driver has to be built and copied to distribution dir: https://doc.qt.io/qt-5/sql-driver.html#qmysql-for-mysql-5-and-higher
```
mkdir mysql
cd mysql
qt-cmake c:\Qt\6.5.0\Src\qtbase\src\plugins\sqldrivers \
  -DCMAKE_INSTALL_PREFIX=c:\Qt\6.5.0\msvc2019_64 \
  -DMySQL_INCLUDE_DIR="c:\Program Files\MySQL\MySQL Server 8.0\include" \
  -DMySQL_LIBRARY="c:\Program Files\MySQL\MySQL Server 8.0\lib\libmysql.lib"
```
To create release packages use:
create-release-osx.sh
create-release.cmd
The tool pdftotext comes from the Xpdf command line tools package, downloaded from https://www.xpdfreader.com/download.html. The pdftotext.exe executable and the config file xpdfrc must be in the path.
Install with your favourite package manager and ensure the tool is available in the path.
sudo apt install libpoppler*
sudo apt install libboost-all-dev
sudo apt install libquazip5-dev
sudo apt install qttools5-dev-tools
setup-win64.cmd
qmake -tp vc