petermr / ami3

Integration of cephis and normami code into a single base. Tests will be slimmed down

Apache License 2.0

Reduce shared `ami` options #23

Closed remkop closed 4 years ago

remkop commented 4 years ago

Problem Description

(Raised by @petermr in this comment on #13)

All current ami tools share about 20 common options. Additionally some commands have some command-specific options. This is too many: it makes the usage help message difficult to grasp. Not all common options apply to all commands.

Analysis

I believe there are two aspects to this (@petermr, correct me if I'm wrong):

Some options are "shared" but do not really apply to all commands. This may be because all commands inherit from AbstractAMITool, where these options are defined.
Some commands actually do use all these options, but casual users only use a few of them.

Solutions

Idea 1: move real shared options to the `ami` top-level command

This requires some analysis on which options are really applicable to/used by all (or the vast majority of) commands. These options could then be moved to the ami top-level command.

Invoking ami --help would show these shared options, while ami <cmd> --help would only show the command-specific options.

Command invocations would then look like this:

ami --shared-option1 --shared-option2 <cmd> --cmd-specific-option1 ...

This would hook nicely into our idea of creating workflows, because shared options would only need to be specified once on the top-level command instead of on each command:

ami --shared-option1 --shared-option2 \
  <cmd1> --cmd1-specific-option \
  <cmd2> --cmd2-specific-option \
  <cmd3> --cmd3-specific-option1 --cmd3-specific-option2
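As an illustration only, the chained invocation above can be pictured as one list of shared options followed by one option list per command. The sketch below is not how picocli parses subcommands; `WorkflowSplitter` and `Split` are hypothetical names, and the caller is assumed to supply the set of known command names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: splits a chained invocation into shared options and per-command options. */
final class WorkflowSplitter {

    /** Options given before the first command, then one option list per command, in order. */
    static final class Split {
        final List<String> sharedOptions = new ArrayList<>();
        final Map<String, List<String>> commandOptions = new LinkedHashMap<>();
    }

    /**
     * Walks the arguments left to right. Everything before the first known
     * command name is treated as shared; each command name starts a new segment.
     */
    static Split split(List<String> args, Set<String> commandNames) {
        Split result = new Split();
        List<String> current = result.sharedOptions;
        for (String arg : args) {
            if (commandNames.contains(arg)) {
                current = new ArrayList<>();
                result.commandOptions.put(arg, current);
            } else {
                current.add(arg);
            }
        }
        return result;
    }
}
```

Two simplifications to keep in mind: an option value that happens to equal a command name would be misread as a command, and a command that appears twice keeps only its last segment.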

Idea 2: use mixins instead of inheritance

Picocli offers mixins as an alternative reuse mechanism to Java inheritance. If we find there are some groups of options that apply to several, but not all, commands, then these options could be split off into a separate class, and "mixed in" to the commands where they are actually used with the @Mixin annotation.

Again, this requires some analysis on which options are applicable to/used by each command.

Idea 3: custom help

If there are some commands that still have too many options, we can give these commands a custom usage help message, where ami <cmd> --help would only show the "often used" options, and ami <cmd> --help-details would show the full list of all options.

This would require some guidance from experienced users on which options fall into the "often used" category, and which are "rare use case" options.

remkop commented 4 years ago

Note: I can go ahead with Idea 1 and 2. After that, we can take a look if Idea 3 is still necessary and if so, I will need some guidance. If no objection I will start working on the first 2 ideas.

remkop commented 4 years ago

Here is my analysis on which options are used by which tools, with proposed plan of action. Feedback welcome!

1. Truly Common Options

The following command line options are applicable to all or most tools.

If no objections, I will move all these common options out of the subcommands and into the ami top-level command for now.

Later, we can also allow these common options to be specified on all ami subcommands (in addition to the ami top-level command), but that would mean that these options do show up on the usage help message for all ami subcommands. Unsure at this stage whether that is desirable.

Base, Tree and Input-related options

cProjectDirectory - AMICleanTool, AMISummaryTool, AMITableTool, AbstractAMITool
cTreeDirectory - AMIMakeProject, AbstractAMITool
excludeBase - AbstractAMITool
includeBase - AbstractAMITool
excludeTrees - AbstractAMITool
includeTrees - AbstractAMITool
maxTreeCount - AbstractAMITool
input - AMIRegexTool, AMIDictionaryTool, AMITransformTool, AbstractAMITool
inputBasename - AMIAssertTool, AMIDisplayTool, AMIForestPlotTool, AMIImageTool, AMIOCRTool, AMIPixelTool, AbstractAMITool, ImageDirProcessor, GOCRConverter
inputBasenameList - AMIForestPlotTool, AMIImageTool, AMIOCRTool, AMIPixelTool (all via ImageDirProcessor)
forceMake - AMIGrobidTool, AMIPdfTool, AMISectionTool, AMIOCRTool (via GOCRConverter, HOCRConverter)

Logging

log4j - AMIDictionaryTool, AbstractAMITool
verbosity - AMIGraphicsTool, AMIImageTool, AMIPixelTool, AMISearchTool, AbstractAMITool

2. Options used in only 1 or 2 tools

I propose to move these options out of the shared options and into the tools where they are used.

[x] rawFileFormats - AMIDownloadTool, AMIMakeProject
[x] subdirectoryType - AMIAssertTool
[x] testString - AMIDictionaryTool
[x] output - AMIRegexTool, AMIDownloadTool
[x] logfile - AMIMakeProject, AMISVGTool (this is separate from the log4j logging options)
[x] oldstyle - AMIRegexTool, AbstractAMISearchTool

3. Options that are never used

I propose to remove these options. They can be re-introduced when necessary.

outputBasename - never used
dryrun - never used

petermr commented 4 years ago

Remko, so sorry for not having picked this up earlier...

On Mon, Apr 6, 2020 at 3:27 AM Remko Popma notifications@github.com wrote:

Problem Description

(Raised by @petermr https://github.com/petermr in this comment https://github.com/petermr/ami3/issues/13#issuecomment-609479338 on #13 https://github.com/petermr/ami3/issues/13)

All current ami tools share about 20 common options. Additionally some commands have some command-specific options. This is too many: it makes the usage help message difficult to grasp. Not all common options apply to all commands.

Analysis

I believe there are two aspects to this (@petermr https://github.com/petermr, correct me if I'm wrong):

Some options are "shared" but do not really apply to all commands. This may be because all commands inherit from AbstractAMITool, where these options are defined.

Some commands actually do use all these options, but casual users only use a few of them.

Perfect analysis. Add minor comments

Some options were created in an ad hoc fashion (possibly during the learning process)

Some options may overlap with others (this may be true for input/output files/streams)

Solutions

Idea 1: move real shared options to the ami top-level command

This requires some analysis on which options are really applicable to/used by all (or the vast majority of) commands. These options could then be moved to the ami top-level command.

Invoking ami --help would show these shared options, while ami <cmd> --help would only show the command-specific options.

Command invocations would then look like this:

ami --shared-option1 --shared-option2 <cmd> --cmd-specific-option1 ...

This would hook nicely into our idea of creating workflows, because shared options would only need to be specified once on the top-level command instead of on each command:

ami --shared-option1 --shared-option2 \
  <cmd1> --cmd1-specific-option \
  <cmd2> --cmd2-specific-option \
  <cmd3> --cmd3-specific-option1 --cmd3-specific-option2

Good idea. Not sure how reusable the options are. For example, inputs for one cmd become irrelevant, and outputs are often inputs lower down the workflow.

Idea 2: use mixins instead of inheritance

Picocli offers mixins as an alternative reuse mechanism to Java inheritance. If we find there are some groups of options that apply to several, but not all, commands, then these options could be split off into a separate class, and "mixed in" to the commands where they are actually used with the @Mixin annotation.

Again, this requires some analysis on which options are applicable to/used by each command.

Again, good idea. I doubt there are clusters of cmds which create higher-level mixin groups, but we should use them when there are.

Idea 3: custom help

If there are some commands that still have too many options, we can give these commands a custom usage help message, where ami <cmd> --help would only show the "often used" options, and ami <cmd> --help-details would show the full list of all options.

This would require some guidance from experienced users on which options fall into the "often used" options, and which are "rare use case" options.

Agreed. I think we should probably use all three.

One more radical approach is to have an external declarative approach where we have configuration files for cmds and can edit these. It might also make it easier to compute logic over them (combinations of allowed and disallowed options)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/petermr/ami3/issues/23, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFTCS7HW7G5A5LHNR2KYXLRLE4ZRANCNFSM4MBZODSA .

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

petermr commented 4 years ago

On Fri, Apr 10, 2020 at 3:49 AM Remko Popma notifications@github.com wrote:

Here is my analysis on which options are used by which tools, with proposed plan of action. Feedback welcome!

Excellent analysis.

Truly Common Options

The following command line options are applicable to all or most tools.

If no objections, I will move all these common options out of the subcommands and into the ami top-level command for now.

Later, we can also allow these common options to be specified on all ami subcommands (in addition to the ami top-level command), but that would mean that these options do show up on the usage help message for all ami subcommands.

Base, Tree and Input-related options

cProjectDirectory - AMICleanTool, AMISummaryTool, AMITableTool, AbstractAMITool

The most common command involves iterating over a CProject, or a subset of CTrees. However, there are some, like AMIDictionary, which are really conceptually separate. Indeed, managing dictionaries is something we have to get to grips with. Maybe we can refer to the iterating commands as CProjectIterators.

excludeBase - AbstractAMITool

includeBase - AbstractAMITool

excludeTrees - AbstractAMITool

includeTrees - AbstractAMITool

maxTreeCount - AbstractAMITool

I'd expect these to be useful in any CProjectIterators

input - AMIRegexTool, AMIDictionaryTool, AMITransformTool, AbstractAMITool

Regex and Transform are essential parts but haven't been used much recently. They are really CProjectIterators. Regex used to work and will come back as soon as we start doing complex analyses. It might even become a subcommand of AMISearch. Transform is probably used by higher-level commands.

inputBasename - AMIAssertTool, AMIDisplayTool, AMIForestPlotTool, AMIImageTool, AMIOCRTool, AMIPixelTool, AbstractAMITool, ImageDirProcessor, GOCRConverter

The input and base concepts probably need reviewing. There's a tension between filenames, streams, and generic input names

forceMake - AMIGrobidTool, AMIPdfTool, AMISectionTool, AMIOCRTool (via GOCRConverter, HOCRConverter)

This is probably valuable on every CProject tool

Logging

log4j - AMIDictionaryTool, AbstractAMITool

verbosity - AMIGraphicsTool, AMIImageTool, AMIPixelTool, AMISearchTool, AbstractAMITool

I have never got on top of logging properly! I don't like the use of log4j.properties. It's very crude, only settable at compile time, and opaque to users. It would be very nice to have -v, -vv settable so that it changed the level of output. I started a hack to try that at one stage. And it could be on a per-Tool basis.

Options used in only 1 or 2 tools

I propose to move these options out of the shared options and into the tools where they are used.

cTreeDirectory - AMIMakeProject

rawFileFormats - AMIDownloadTool, AMIMakeProject

subdirectoryType - AMIAssertTool

testString - AMIDictionaryTool

inputBasenameList - ImageDirProcessor

output - AMIRegexTool, AMIDownloadTool

logfile - AMIMakeProject, AMISVGTool (this is separate from the log4j logging options)

I'm not very good at distinguishing between logging for debugging and logging for record. I did the latter at one stage (for PhyloTree I think) where we needed the scientific output in processable form. This is partly overcome by the CProject tree, where we can record in multiple subdirectories, e.g.

oldstyle - AMIRegexTool, AbstractAMISearchTool

This is a horrible kludge. It's my oldstyle pre-picocli command line which is used in the depths of AMISearch. It is why the output of search is messy. I started trying to clean it. It's probably 2-3 days of hacking. It will make searching much more reusable, probably regenerate regex, etc.

Options that are never used

I propose to remove these options. They can be re-introduced when necessary.

These are YAGNIs that were part of a grander design.

outputBasename - never used

dryrun - never used

We'll ditch them.

BTW, in general any changes will only affect a small number of people. People who've been involved in workshops, etc. will be using a fixed version of AMI. I don't think anyone will complain if the message "outputBasename has been deprecated / superseded by ..." is emitted.


remkop commented 4 years ago

Thanks for the detailed feedback! Overall no issues with my proposed changes so I’ll go ahead with these over the weekend.

Understood that further improvements can be made with regards to streamlining input-related options and logging. I recently did some work on configuring Log4j from a verbosity option so I can probably reuse that here.
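A minimal sketch of the counting idea, assuming the mapping quoted in the `ami` help text later in this thread (ERROR or WARN -> 0, INFO -> 1, DEBUG -> 2). This is not the implementation referred to above and it does not touch Log4j; `Verbosity` and its methods are hypothetical names, and the level is returned as a plain string.

```java
/** Hypothetical helper: turns repeated -v flags into a log level name. */
final class Verbosity {

    /** Counts verbosity flags, accepting separate (-v -v), clustered (-vv) and long (--verbose) forms. */
    static int count(String... args) {
        int n = 0;
        for (String arg : args) {
            if (arg.matches("-v+")) {
                n += arg.length() - 1; // "-vvv" contributes 3
            } else if (arg.equals("--verbose")) {
                n++;
            }
        }
        return n;
    }

    /** Maps the count to a level name: 0 -> WARN, 1 -> INFO, 2 or more -> DEBUG. */
    static String levelName(int count) {
        if (count <= 0) {
            return "WARN";
        }
        return count == 1 ? "INFO" : "DEBUG";
    }
}
```

With this shape, `levelName(count(args))` could be passed to whatever logging backend is in use; applying it per tool rather than globally would be a separate design decision.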

remkop commented 4 years ago

Quick question (Q1): several option descriptions start with (A). Is this to indicate these are shared (AMI?) options? Can this description prefix be removed for shared options that are moved to the top-level ami command?

remkop commented 4 years ago

Another quick question (Q2): the -t, --ctree[=CTree] option has an optional parameter ([CTree] is optional because the option is defined with arity = "0..1"). Is this intentional?

Currently, if a user specifies ami --ctree -v ... (without a CTree parameter), then picocli assigns an empty String "" to the cTreeDirectory field, but I cannot see any logic that expects an empty string...

Should this not be -t, --ctree=CTree (changing to arity = "1", so that if --ctree is specified then users must supply a CTree value)?

petermr commented 4 years ago

This was a crude attempt to signal the shared options. One of the potential benefits of knowing where something is defined (in principle, although not well implemented) is that the AbstractAMITest could (should) contain tests for that level and help explain what the parameters do. It was a crude signal that some of these were options due to general inheritance rather than being useful, i.e. "if you don't understand these, don't worry".

I'm coming to the conclusion that the bulk inheritance is contaminating the help. We probably need a bespoke set of options for the commands at each level. I'm keen on mixins - I only adopted them rather late. If we had a global table of Options by Command then we could see if there were places where a bundle was used. This is similar to attributes in XML languages where bundles are created that apply to a number of elements.
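One way to picture the "global table of Options by Command" is a map from option name to the set of tools that use it. The sketch below groups options that share exactly the same set of tools; any group with two or more options is a candidate bundle for a mixin. `OptionBundles` is a hypothetical name, and exact-set matching is a deliberate simplification (real bundles may overlap only partially).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: finds options that are always used by the same set of tools. */
final class OptionBundles {

    /**
     * @param toolsByOption option name -> names of the tools that use it
     * @return tool set -> options used by exactly that tool set, keeping only
     *         groups of two or more options (the mixin candidates)
     */
    static Map<Set<String>, Set<String>> bundleCandidates(Map<String, Set<String>> toolsByOption) {
        Map<Set<String>, Set<String>> optionsByToolSet = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : toolsByOption.entrySet()) {
            // Copy the key so later changes to the input cannot corrupt the map.
            optionsByToolSet
                .computeIfAbsent(new TreeSet<>(entry.getValue()), key -> new TreeSet<>())
                .add(entry.getKey());
        }
        // A single option is not a bundle.
        optionsByToolSet.values().removeIf(options -> options.size() < 2);
        return optionsByToolSet;
    }
}
```

Fed with the analysis earlier in this thread, includeTrees and excludeTrees (both listed only under AbstractAMITool) would land in the same group, while testString and rawFileFormats would be dropped as singletons.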

For interest, is there a way of generating Option values on the fly? For example, the default of (say) an AMIImage option might depend on the properties of the CProject. More generally, I'm ending up with some messy logic (especially for inputs) such as if (input == null && terms != null) ... where input and terms are both Options.

but maybe the code needs yet another refactor! It's impossible to design this system up front because you don't have even a crude spec for the input.

On Sat, Apr 11, 2020 at 7:12 AM Remko Popma notifications@github.com wrote:

Quick question: several option descriptions start with (A). Is this to indicate these are shared (AMI?) options? Can this description prefix be removed for shared options that are moved to the top-level ami command?


remkop commented 4 years ago

Question Q3: are --cproject and --ctree mutually exclusive options? Should we use picocli's support for exclusive options to let the user know that input is invalid when both --cproject and --ctree are specified?

Also, should we make --excludetree and --includetree conditional on --cproject? (So picocli shows an error if the user specifies these options without specifying --cproject?)

Similarly, should we make --excludebase and --includebase conditional on --ctree?

remkop commented 4 years ago

To expand on Q3, that would make the synopsis look something like the below. Is that what you have in mind?

ami [-p [--excludetree | --includetree]] | [-t [--excludebase | --includebase]] COMMAND

petermr commented 4 years ago

On Sat, Apr 11, 2020 at 9:22 AM Remko Popma notifications@github.com wrote:

Another quick question (Q2): the -t, --ctree[=CTree] option has an optional parameter ([CTree] parameter is optional because the option is defined with arity = "0..1"). Is this intentional?

I think not.

At one stage I thought of allowing a list so that we can input a set of files (and the current tasks could benefit from it). The current approach (which I had forgotten!) is to use --includetree and --excludetree. But they require a CProject. So: -p foo --includetree a z makes sense. BUT we have to have a project (normally created by a scraper or already on disk as a directory). But now we are getting pages which point to files and URLs that are not part of a project. There's a half-baked construct in AMIDictionary that reads an input from any of:

URL
absolute file name
relative filename
java resource

and I think I need to make a class for this, something like:
```
--inputstream https://biorxiv.org/foo.xml /Users/pm286/bar.txt
../plugh.html /org.contentmine.data/y2.xml
```
These cannot be in a project (yet). Are there generic tools already for this sort of thing?

I am about to face this today as I have more or less cracked the cascade on biorxiv and can read most of the files. The bug I mentioned earlier was not actually important - it is another (early termination) bug.

Currently, if a user specifies ami --ctree -v ... (without a CTree parameter), then picocli assigns an empty String "" to the cTreeDirectory field, but I cannot see any logic that expects an empty string...

Should this not be -t, --ctree=CTree (changing to arity = "1", so that if --ctree is specified then users must supply a CTree value)?


remkop commented 4 years ago

But now we are getting pages which point to files and URLs that are not part of a project. There's a half-baked construct in AMIDictionary that reads an input from any of:

URL

absolute file name

relative filename

java resource

and I think I need to make a class for this, something like:
--inputstream https://biorxiv.org/foo.xml /Users/pm286/bar.txt
../plugh.html /org.contentmine.data/y2.xml
These cannot be in a project (yet). Are there generic tools already for this sort of thing?

Yes, I think you can simply use a list or array of java.net.URI values for the --inputstream option. Something like this:

@Option(names = "--inputstream")
List<java.net.URI> uri;

You will then need to write some logic to find out what kind of URI(s) the user supplied.

If the value has a scheme (e.g., it starts with http:), then the URI can be converted to a URL, and you can call URL::openStream on it. Otherwise, it may be a relative path, an absolute path, or a resource path.
You can test if it is an existing resource by calling URL url = AMIDictionary.class.getResource(uri.toString()) and checking if the result is null. null means the resource was not found in the classpath.
Finally, if it is not a resource path, you can treat it as a file path.
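The three steps above can be sketched with the standard library alone. `InputSourceResolver` and `Kind` are hypothetical names, and the resource lookup goes through the helper's own class rather than AMIDictionary, so this is an illustration under those assumptions and not the ami3 code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: decides how a user-supplied URI should be opened. */
final class InputSourceResolver {

    enum Kind { URL, RESOURCE, FILE }

    /** Scheme present -> URL; found on the classpath -> RESOURCE; anything else -> FILE. */
    static Kind classify(URI uri) {
        if (uri.isAbsolute()) { // "absolute" for a URI means it has a scheme, e.g. https:
            return Kind.URL;
        }
        if (InputSourceResolver.class.getResource(uri.toString()) != null) {
            return Kind.RESOURCE;
        }
        return Kind.FILE;
    }

    /** Opens the input according to its kind; the caller is responsible for closing the stream. */
    static InputStream open(URI uri) throws IOException {
        switch (classify(uri)) {
            case URL:
                return uri.toURL().openStream();
            case RESOURCE:
                return InputSourceResolver.class.getResourceAsStream(uri.toString());
            default:
                return Files.newInputStream(Paths.get(uri.toString()));
        }
    }
}
```

One caveat of the check order: a path such as /org.contentmine.data/y2.xml is treated as a resource only if it is actually on the classpath, so the same argument can resolve differently depending on how the application is packaged.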

petermr commented 4 years ago

Question Q3: are --cproject and --ctree mutually exclusive options?

I'm pretty sure they are from the logic point of view. However a CTree needs to know what project it's in and there's a method org.contentmine.cproject.files.CTree.getOrCreateProject()

/**
 * Create a project for the tree.
 * If null, locates parent directory and creates project from that.
 * No current check as to whether more than one project, so use with care
 * (ideally should check all siblings),
 * i.e. if re-used carelessly might create different Project with different resources.
 * Will set this project.
 *
 * @return the CProject, or null if none could be located
 */
public CProject getOrCreateProject() {
    if (this.cProject == null) {
        File file = getDirectory();
        if (file != null) {
            File parentFile = file.getParentFile();
            if (parentFile != null) {
                this.cProject = new CProject(parentFile);
            }
        }
    }
    return cProject;
}

So there's a very clear indication that CTrees only exist in a CProject

Should we use picocli's support for exclusive options https://picocli.info/#_mutually_exclusive_options to let the user know that input is invalid when both --cproject and --ctree are specified?

I'm tempted to say yes. At worst it will flush out places which demand otherwise.

Also, should we make --excludetree and --includetree conditional on --cproject? (So picocli shows an error if the user specifies these options without specifying --cproject?)

Yes. This is very helpful - it's easy to think of a CTree as an input stream whereas it's a mutable storage.

Similarly, should we make --excludebase and --includebase conditional on --ctree?

Yes. We have a recursion of thinking in much of this:

CProject
.. CTree
.... pdfimages
...... image1.png
...... image2.png

and where image1 should be included and image2 excluded. (This is what will happen with Matthew's battery project)
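A minimal sketch of this include/exclude selection over the child files of a CTree. It assumes the "trailing percent for truncated glob" wording in the `ami` help text later in this thread means a prefix match, which is an interpretation and not confirmed behaviour; `BaseFilter` is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: selects child file names by include or exclude patterns. */
final class BaseFilter {

    /** A pattern ending in '%' is treated as a prefix (assumed meaning); otherwise it must match exactly. */
    static boolean matches(String pattern, String name) {
        if (pattern.endsWith("%")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return name.equals(pattern);
    }

    /**
     * Keeps names that match an include pattern (or all names if there are none)
     * and that match no exclude pattern. Callers are expected to pass only one
     * non-empty list, since the two options are mutually exclusive.
     */
    static List<String> select(List<String> names, List<String> include, List<String> exclude) {
        return names.stream()
            .filter(name -> include.isEmpty() || include.stream().anyMatch(p -> matches(p, name)))
            .filter(name -> exclude.stream().noneMatch(p -> matches(p, name)))
            .collect(Collectors.toList());
    }
}
```

The same shape would apply one level up, selecting CTrees within a CProject, which is the recursion described above.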


petermr commented 4 years ago

On Sat, Apr 11, 2020 at 10:16 AM Remko Popma notifications@github.com wrote:

Yes, I think you can simply use a list or array of java.net.URI https://docs.oracle.com/javase/7/docs/api/java/net/URI.html values for the --inputstream option. Something like this:

@Option(names = "--inputstream")
List<java.net.URI> uri;

You will then need to write some logic to find out what kind of URI(s) the user supplied.

If the value has a scheme (e.g., it starts with http:), then the URI can be converted to a URL, and you can call URL::openStream on it. Otherwise, it may be a relative path, an absolute path, or a resource path.

You can test if it is an existing resource by calling URL url = AMIDictionary.class.getResource(uri.toString()) and checking if the result is null. null means the resource was not found in the classpath.

Finally, if it is not a resource path, you can treat it as a file path.

Brilliant. I already use this sort of logic (somewhere!). I'll make it a separate AMIInputStream or similar so it's easy to locate and gradually introduce it. (Unfortunately my brain is full so it's difficult to remember what I have already written and where!)


remkop commented 4 years ago

Update: I have a local version with the above changes. I have not pushed yet because this will probably break many tests. I will try to look at the tests, unsure how much time I have for that today.

Usage Help

Many options are no longer in every tool, but have been moved to the top-level ami tool. Some options were only used in a few tools, and were moved to the tools where they were used.

The updated help for ami will look like this:

Usage: ami [OPTIONS] COMMAND

`ami` is a command suite for managing (scholarly) documents: download, aggregate, transform, search, filter, index,
annotate, re-use and republish.
It caters for a wide range of inputs (including some awful ones), and creates de facto semantics and an ontology (based
on Wikidata).
`ami` is the basis for high-level science/tech applications including chemistry (molecules, spectra, reaction), Forest
plots (metaanalyses of trials), phylogenetic trees (useful for virus mutations), geographic maps, and basic plots (x/y,
scatter, etc.).

Parameters:
===========
      [@<filename>...]       One or more argument files containing options.
Options:
========
  -h, --help                 Show this help message and exit.
  -V, --version              Print version information and exit.
CProject Options:
  -p, --cproject=DIR         The CProject (directory) to process. This can be (a) a child directory of cwd (current
                               working directory (b) cwd itself (use -p .) or (c) an absolute filename. No defaults.
                               The cProject name is the basename of the file.
  -r, --includetree=DIR...   Include only the CTrees in the list. (only works with --cproject). Currently must be
                               explicit but we'll add globbing later.
  -R, --excludetree=DIR...   Exclude the CTrees in the list. (only works with --cproject). Currently must be explicit
                               but we'll add globbing later.
CTree Options:
  -t, --ctree=DIR            The CTree (directory) to process. This can be (a) a child directory of cwd (current
                               working directory, usually cProject) (b) cwd itself, usually cTree (use -t .) or (c) an
                               absolute filename. No defaults. The cTree name is the basename of the file.
  -b, --includebase=PATH...  Include child files of cTree (only works with --ctree). Currently must be explicit or with
                               trailing percent for truncated glob.
  -B, --excludebase=PATH...  Exclude child files of cTree (only works with --ctree). Currently must be explicit or with
                               trailing percent for truncated glob.
General Options:
  -i, --input=FILE           Input filename (no defaults)
  -n, --inputname=PATH       User's basename for inputfiles (e.g. foo/bar/<basename>.png) or directories. By default
                               this is often computed by AMI. However some files will have variable names (e.g. output
                               of AMIImage) or from foreign sources or applications
  -L, --inputnamelist=PATH...
                             List of inputnames; will iterate over them, essentially compressing multiple commands into
                               one. Experimental.
  -f, --forcemake            Force 'make' regardless of file existence and dates.
  -N, --maxTrees=COUNT       Quit after given number of trees; null means infinite.
Logging Options:
  -v, --verbose              Specify multiple -v options to increase verbosity. For example, `-v -v -v` or `-vvv`. We
                               map ERROR or WARN -> 0 (i.e. always print), INFO -> 1(-v), DEBUG->2 (-vv)
      --log4j=(CLASS LEVEL)...
                             Customize logging configuration. Format: <classname> <level>; sets logging level of class,
                               e.g.
                              org.contentmine.ami.lookups.WikipediaDictionary INFO
Commands:
=========
  assert               Makes assertions about objects created by AMI.
  clean                Cleans specific files or directories in project.
  dictionary           Manages AMI dictionaries.
  display              Displays files in CTree.
  download             Downloads content from remote site.
  dummy                Minimal AMI Tool for editing into more powerful classes.
  filter               FILTERs images (initally from PDFimages), but does not transform the contents.
  forest               Analyzes ForestPlot images.
  getpapers            Runs getpapers in java environment.
  graphics             Transforms graphics contents (often from PDF/SVG).
  grobid               Runs grobid.
  image-filter         FILTERs images (initally from PDFimages), but does not transform the contents.
  image                Transforms image contents but only provides basic filtering (see ami-filter).
  makeproject          Processes a directory (CProject) containing files (e.g.*.pdf, *.html, *.xml) to be made into
                         CTrees.
  metadata             Manages metadata for both CProject and CTrees.
  ocr                  Extracts text from OCR and (NYI) postprocesses HOCR output to create HTML.
  pdf                  Convert PDFs to SVG-Text, SVG-graphics and Images.
  pixel                Analyzes bitmaps - generally binary, but may be oligochrome.
  regex                Searches with regex.
  search               Searches text (and maybe SVG).
  section              Splits XML files into sections using XPath.
  summary              Summarizes the specified dictionaries, genes, species and words.
  svg                  Takes raw SVG from PDF2SVG and converts into structured HTML and higher graphics primitives.
  table                Writes cProject or cTree to summary table.
  transform            Runs XSLT transformation on XML (NYFI).
  words                Analyzes word frequencies.
  help                 Displays help information about the specified command
  generate-completion  Generate bash/zsh completion script for ami.

Validation

The latest version makes certain option combinations invalid:

ami -p project/dir -t tree/dir [...]

Error: [[-p=DIR] [-r=DIR... [-r=DIR...]... | -R=DIR... [-R=DIR...]...]] and [[-t=DIR] [-b=PATH... [-b=PATH...]... | -B=PATH... [-B=PATH...]...]] are mutually exclusive (specify only one)
Usage: ami [OPTIONS] COMMAND
Try 'ami --help' for more information.

ami -p=project/path --includetree=tree/1 --excludetree=tree/2

Error: --excludetree=DIR, --includetree=DIR are mutually exclusive (specify only one)
Usage: ami [OPTIONS] COMMAND
Try 'ami --help' for more information.

ami -t=tree/path --includebase=base/1 --excludebase=base/2

Error: --excludebase=PATH, --includebase=PATH are mutually exclusive (specify only one)
Usage: ami [OPTIONS] COMMAND
Try 'ami --help' for more information.

Unfortunately, if you specify --cproject with an option that only applies to --ctree, the error message is not super clear...

ami -p project/dir --includebase=tree/base [...]

Error: [[-p=DIR] [-r=DIR... [-r=DIR...]... | -R=DIR... [-R=DIR...]...]] and [[-t=DIR] [-b=PATH... [-b=PATH...]... | -B=PATH... [-B=PATH...]...]] are mutually exclusive (specify only one)
Usage: ami [OPTIONS] COMMAND
Try 'ami --help' for more information.
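For the less clear case above, a hand-written check could report which option is missing its scope. The sketch below is an illustration of that idea, not picocli's validation and not what was pushed; `ScopeValidator` is a hypothetical name and the booleans stand for "this option was specified on the command line".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: checks the --cproject/--ctree option rules and explains each violation. */
final class ScopeValidator {

    /** Returns one message per violated rule; an empty list means the combination is valid. */
    static List<String> validate(boolean cproject, boolean ctree,
                                 boolean includeTree, boolean excludeTree,
                                 boolean includeBase, boolean excludeBase) {
        List<String> errors = new ArrayList<>();
        if (cproject && ctree) {
            errors.add("--cproject and --ctree are mutually exclusive (specify only one)");
        }
        if (includeTree && excludeTree) {
            errors.add("--includetree and --excludetree are mutually exclusive (specify only one)");
        }
        if (includeBase && excludeBase) {
            errors.add("--includebase and --excludebase are mutually exclusive (specify only one)");
        }
        if ((includeTree || excludeTree) && !cproject) {
            errors.add("--includetree and --excludetree only work with --cproject");
        }
        if ((includeBase || excludeBase) && !ctree) {
            errors.add("--includebase and --excludebase only work with --ctree");
        }
        return errors;
    }
}
```

For `ami -p project/dir --includebase=tree/base` this would yield the single message that --includebase only works with --ctree, in place of the long group synopsis.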

remkop commented 4 years ago

I also gave many ami options a short mnemonic option name. For common options this allows expert users to do their work with less typing. We should look into doing that for other commands as well (I also did grobid).

Note: a common convention on unix is to give most options a short mnemonic option name, but deliberately give only a long name (and no short mnemonic name) to options that are "dangerous" or you don't want to encourage casual users to use. (Not sure how applicable that is to ami commands, but something to bear in mind.)

I updated my previous comment with the new ami help usage message; please view it on GitHub - the email message you received only shows the (now out-of-date) original comment and you won't get notifications for subsequent updates to GitHub comments.

remkop commented 4 years ago

Pushed to master.

remkop commented 4 years ago

Closing; I don't think there is any work remaining here.
