Create (first version of) progress bar
Task: create a progress bar for Attachment Converter.
Background
So out of the box, our initial version of Attachment Converter isn't exactly the fastest-performing application in the world. Some of our initial tests took about a minute to convert all the emails in an mbox containing a total of 10-15 attachments.
Our major bottleneck is that Attachment Converter calls out to external applications to perform its conversions. We haven't done much benchmarking, but a reasonable starting assumption is that LibreOffice, which we currently rely on heavily to convert to PDF/A, takes a while for each conversion.
There are a couple of "low hanging fruit" tactics we can use to reduce the application's runtime on mbox-es of a realistic size:
automatically parallelizing the conversions (this is theoretically possible because none of the conversions logically depend on any of the others)
finding faster utilities to perform specific conversions
We will continue to explore those options as we work on the project. That said, no matter how many of these speed-up approaches we end up adopting, it seems pretty clear that, at least for large mbox-es, we need to be prepared for a full round of attachment conversions to take a while.
Given the apparent inevitability of some amount of slowness, Attachment Converter will need to send some indication to the user of what is happening.
Progress Bar
Getting the progress bar to show useful information to the user while the application also writes its actual output data to standard out takes a little finessing of UNIX terminals and file handles.
Before we get into that, let's outline what information should be in the progress bar.
Layout
For this initial version, what we're calling a "progress bar" will just be a printed line of information with something like the following format:
converting <ATTACHMENT-FILENAME> to <ATTACHMENT-BASENAME>.<TARGET-EXTENSION> ...
It should print that line just before each conversion begins, so that if the conversion takes five seconds, the user sees that line at the bottom of the screen for five seconds.
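A minimal OCaml sketch of the layout and timing above. The names `progress_line` and `announce` are hypothetical (not from the Attachment Converter codebase), and it assumes `<ATTACHMENT-BASENAME>` is the attachment filename with its last extension stripped via `Filename.remove_extension`. Flushing right after printing matters: without it, the line could sit in the channel buffer until after the slow conversion has finished.

```ocaml
(* Hypothetical helper: build the one-line progress message, e.g.
   "converting report.docx to report.pdf ..." *)
let progress_line ~filename ~target_ext =
  (* Assumption: the basename is the filename minus its last extension. *)
  let basename = Filename.remove_extension filename in
  Printf.sprintf "converting %s to %s.%s ..." filename basename target_ext

(* Hypothetical wrapper: print the progress line to [oc] just before
   running [convert], so it stays on screen for as long as the
   conversion takes. Returns whatever [convert] returns. *)
let announce oc ~filename ~target_ext convert =
  output_string oc (progress_line ~filename ~target_ext);
  output_char oc '\n';
  (* Flush now; otherwise the message could stay buffered until after
     the (slow) conversion returns. *)
  flush oc;
  convert ()
```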
How to do it
Broadly, we want to:
open the current tty as a file handle and print the progress bar text to that
print the software's actual output to standard out
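A minimal sketch of the first step in OCaml, assuming a UNIX-like system where `/dev/tty` names the process's controlling terminal (it does so even when standard out is redirected). `open_tty` and `print_progress` are hypothetical names. It opens the device with `open_out_gen [Open_wronly]` rather than `open_out`, because the create/truncate flags that `open_out` adds make no sense for a device, and it falls back to printing no progress when there is no controlling terminal (for example under cron).

```ocaml
(* Hypothetical helper: open the controlling terminal for writing.
   Returns None if there is no terminal to write to. *)
let open_tty () : out_channel option =
  try Some (open_out_gen [ Open_wronly ] 0 "/dev/tty")
  with Sys_error _ -> None

(* Hypothetical helper: write one progress message to the terminal,
   if we have one, and flush so it appears immediately. *)
let print_progress (tty : out_channel option) (msg : string) : unit =
  match tty with
  | Some oc ->
      output_string oc msg;
      output_char oc '\n';
      flush oc
  | None -> ()
```

On its own, this sends progress to the terminal no matter where standard out points, so when standard out is also the terminal, the two streams interleave.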
On its own, that approach will display the output data and the progress bar messages intermixed in the terminal whenever standard out isn't redirected. What we really want is an either/or situation:
print the progress bar if standard out is being redirected (to a file or a pipe)
if standard out is not being redirected, print only the actual output, with no progress bar
The following UNIX hijinks should give us that result:
make a system call that asks of standard out, "is this a tty?"
if the answer is yes, the output is not being redirected, so we print the output to standard out with no progress bar
if the answer is no, that means the output is being redirected (which means it won't be printed to the console), and we print the progress bar to the tty
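The Unix module (part of the unix library that ships with OCaml) provides a pretty comprehensive interface to UNIX system calls. You can use Unix.isatty to check whether a file descriptor refers to a tty. Since it takes a Unix file descriptor (Unix.file_descr) as input rather than an output channel, pass it Unix.stdout rather than Pervasives.stdout (Stdlib.stdout in OCaml 4.08 and later, where Pervasives is deprecated).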
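Putting the pieces together, a minimal OCaml sketch of the either/or logic. It must be linked against the unix library (for example `(libraries unix)` in a dune file). `progress_channel` is a hypothetical name, the file names in the demo loop are placeholders, and `print_endline` stands in for the real converted output.

```ocaml
(* Hypothetical helper: decide where progress messages should go.
   - stdout is a tty: the user already sees the real output, so no
     progress bar (None).
   - stdout is not a tty (redirected to a file or pipe): open the
     controlling terminal and send progress there. *)
let progress_channel () : out_channel option =
  (* Unix.isatty takes a Unix.file_descr, hence Unix.stdout rather
     than the out_channel stdout. *)
  if Unix.isatty Unix.stdout then None
  else
    try Some (open_out_gen [ Open_wronly ] 0 "/dev/tty")
    with Sys_error _ -> None (* no controlling terminal, e.g. cron *)

let () =
  let progress = progress_channel () in
  List.iter
    (fun name ->
      (match progress with
       (* "%!" flushes so the line shows before the work starts *)
       | Some oc -> Printf.fprintf oc "converting %s ...\n%!" name
       | None -> ());
      print_endline name (* stand-in for the real converted output *))
    [ "a.doc"; "b.xls" ];
  match progress with Some oc -> close_out oc | None -> ()
```

Run with standard out redirected (`> out.txt`), the progress lines should appear on the terminal and only the output lands in the file; run without redirection, only the output should appear.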