uchicago-library / attachment-converter

Attachment Converter: tool for batch converting attachments in an email mailbox
GNU General Public License v2.0

Research into unit testing that the destination file format is as intended #78

Open bufordrat opened 1 year ago

We would eventually like to be able to run a test suite that will make sure the data Attachment Converter outputs matches the MIME type specified for the relevant conversion in the config file.

For example, if the config file says that Attachment Converter should convert a Word .doc to plaintext, we would like to run a file format detection utility on the output to determine that it is indeed plaintext rather than some other format, such as .pdf.
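
As a rough illustration of the kind of check described above, here is a minimal sketch in shell. It assumes the standard `file` utility as a stand-in detector (this issue does not settle on a detection utility), and the helper name `check_mime` and the file `converted.txt` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the MIME type detected for a file
# equals the MIME type we expected the conversion to produce.
# Assumes file(1) is installed; --brief omits the filename from the output
# and --mime-type prints only the type, e.g. "text/plain".
check_mime() {
    expected=$1
    path=$2
    actual=$(file --brief --mime-type "$path")
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    printf 'expected %s but detected %s: %s\n' "$expected" "$actual" "$path" >&2
    return 1
}

# A plain-text file standing in for the output of a .doc-to-plaintext conversion.
printf 'hello from a converted attachment\n' > converted.txt
check_mime text/plain converted.txt && echo "ok: converted.txt is text/plain"
```

A real test would take the expected MIME type from the config file entry for the conversion rather than hard-coding it.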

Utilities for identifying file formats

Matt's current top pick for a file format identification utility is:

This seems promising, insofar as it exists not only as a shell utility but as an OCaml library. It is also, coincidentally, authored by the same developer who gave us Mr. Mime, which is one of our two email parsing backends.

Some more mainstream utilities for identifying file types include:

For this issue, there are two goals.

Goal 1: brainstorm a set of unit tests

First, write up some ideas for unit tests (one or two, possibly more) that we could run to catch bugs like the file format mismatch described above. Those ideas can go into a markdown (or org) file in the doc/ directory of this repository.

Goal 2: implement a simple cram test

We haven't tried running Cram tests yet, but apparently dune has the ability to do that. Before we start implementing actual Cram tests, let's try writing a trivial one: say, a test that checks that running attc --help prints the following help message:

> attc --help
ATTC(1)                           Attc Manual                          ATTC(1)

NAME
       attc - Converts email attachments.

SYNOPSIS
       attc [OPTION]… [ARG]

OPTIONS
       --config=PATH
           Sets the absolute path PATH to be checked for a configuration file.

       -r, --report
           Provides a list of all attachment types in a given mailbox.

       --report-params
           Prints a list of all MIME types in the input along with all header
           and field parameters that go with it.

       --single-email
           Converts email attachments assuming the input is a single plain
           text email.

COMMON OPTIONS
       --help[=FMT] (default=auto)
           Show this help in format FMT. The value FMT must be one of auto,
           pager, groff or plain. With auto, the format is pager or plain
           whenever the TERM env var is dumb or undefined.

EXIT STATUS
       attc exits with:

       0   on success.

       123 on indiscriminate errors reported on standard error.

       124 on command line parsing errors.

       125 on unexpected internal errors (bugs).

Attc                                                                   ATTC(1)

Here is a guide to writing a Cram test using dune: https://dune.readthedocs.io/en/latest/tests.html#cram-tests
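
For orientation, a trivial Cram test along these lines might be scaffolded as follows. This is a sketch under assumptions: the directory `test/cram/` and the file name `help.t` are hypothetical, and `--help=plain` is chosen here (from the FMT values listed in the help message above) so that the output does not depend on a pager.

```shell
#!/bin/sh
# Sketch only: test/cram/ and help.t are hypothetical names, not existing
# paths in this repository.
set -eu
mkdir -p test/cram

# In a cram file, unindented lines are commentary. A command is a line
# indented by two spaces and prefixed with "$ "; its expected output goes on
# the two-space-indented lines that follow. The expected output is left
# empty here so that dune can fill it in.
cat > test/cram/help.t <<'EOF'
Check that the help message is printed.

  $ attc --help=plain
EOF

# Then, from the project root:
#   dune test      # shows a diff between the expected and actual output
#   dune promote   # accepts the actual output as the new expected output
```

Two things to check when trying this: with a dune language version below 3.0, Cram tests have to be switched on with `(cram enable)` in dune-project; and the attc executable has to be visible to the test, for instance by declaring it as a dependency in a `cram` stanza, assuming attc is built as a public executable.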

Once we have the world's simplest Cram test working, we can flesh our test suite out with a wider range of tests.