Input/Output Organization Issues

mlesnick commented 6 years ago

RIVET has improved a lot in recent months, and I believe that it is now in good shape for practical use. With that said, I believe RIVET might be more accessible to new users if the conventions for how data is input to/output from RIVET were reworked. While these conventions may have been cohesive to start, they have evolved in a somewhat ad hoc fashion as we have introduced new features to RIVET over the past couple of years. I think it would be good to take a careful look at them again.

(This is related to some existing issues, namely issue #134, issue #130, issue #120, issue #121.)

The issues I have in mind with how RIVET handles output are already summarized nicely in the other issues linked above. Below are some places where the input conventions could perhaps use further attention.

Input: 1) Right now, for input in the form of a point cloud or finite metric space, the kind of bifiltration you associate to the data is hard-coded in the input file. In practice, one often wants to try several different bifiltrations (e.g., function-Rips with density function, degree-Rips, coeccentricity-Rips) with a single data set. Trying these different options ought to be as easy as feeding a .csv file to RIVET and changing one command line option for different runs. But right now, it's harder than this: you need to run a script (not currently included with RIVET) to construct a separate RIVET input file from the .csv for each of the three bifiltrations. Moreover, the script has to compute the density functions or codensity functions on the vertices, as these are not computed natively by RIVET.

This creates an unnecessary barrier to exploring data with RIVET. The pyrivet API addresses this, which definitely helps, but I believe that the solution should be part of better input handling in RIVET itself.
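To make the cost of the external script concrete, here is a minimal Python sketch of the kind of vertex-function computation it has to do before any RIVET input file can be written. Everything in it is an assumption for illustration: the function names, the option names "ball-density" and "knn-distance", and the choice of estimators are hypothetical and are not taken from RIVET or pyrivet.

```python
import csv
import io
import math


def read_points(csv_text):
    """Parse headerless CSV text of coordinates into a list of float tuples."""
    rows = csv.reader(io.StringIO(csv_text))
    return [tuple(float(x) for x in row) for row in rows if row]


def ball_density(points, radius):
    """Density-style function: number of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def knn_distance(points, k):
    """Codensity-style function: distance from each point to its k-th nearest neighbor.

    Requires 1 <= k <= len(points) - 1.
    """
    values = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        values.append(dists[k - 1])
    return values


def vertex_function(points, kind, parameter):
    """Dispatch on a single (hypothetical) option value, as one command line flag might."""
    if kind == "ball-density":
        return ball_density(points, parameter)
    if kind == "knn-distance":
        return knn_distance(points, int(parameter))
    raise ValueError(f"unknown vertex function: {kind}")
```

With something like this built in, switching between function-Rips variants would come down to changing the value of `kind` between runs rather than regenerating input files.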

2) The behavior of RIVET can be controlled in four different ways:

Command line options to rivet_console,
Specification of parameters in input files (e.g., axis labels, max distance parameter in a Rips bifiltration),
The file input dialogue in the GUI (this provides functionality equivalent to some but not all of the command line parameters for rivet_console),
The preferences window in the GUI (for certain visualization parameters).

For the most part, on a case-by-case basis, the decisions about where to put a particular RIVET option seem reasonable, but I would feel better about the design if there were some clear overarching principles guiding these choices. I believe that people getting to know the software for the first time may find it difficult to wrap their head around the different options and where they are located, especially since the way they are documented is (correspondingly) a bit scattered.

Here are some more concrete thoughts along these lines:

It's not clear to me that it is best to distinguish command line options from the parameters given in an input file. After all, some options (like max scale parameter, or choice of descending/ascending filtration for function-Rips bifiltration) would make equal sense in a file or from the command line (see issue #121). Axis labels, which are currently part of the required data in an input file, may be unnecessary for some computations that avoid use of the visualization altogether.

Is there some principled way to unify the two types of input? Perhaps everything could be a command line argument, and the user could have the option of putting command-line arguments in a file? If we could defer enough of the arguments of a RIVET input file to the command line that RIVET could comfortably handle an unadorned .csv file containing a point cloud or distance matrix, that would be great. If this is not what we want to do, is there any way to make our conventions more transparent / accessible?
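As a rough illustration of the "arguments in a file" idea, here is a minimal Python sketch using the standard library's argparse, whose `fromfile_prefix_chars` setting lets any argument of the form `@file` be replaced by the arguments listed in that file, one per line. RIVET itself is not written in Python, and the program name and the flags `--bifiltration`, `--max-dist`, and `--homology` are hypothetical names chosen for this sketch, not actual rivet_console options.

```python
import argparse


def build_parser():
    """Parser for a hypothetical front end; all option names are illustrative."""
    parser = argparse.ArgumentParser(
        prog="rivet_console_sketch",
        # Any argument starting with "@" names a file of arguments, one per line.
        fromfile_prefix_chars="@",
    )
    parser.add_argument("input", help="plain .csv point cloud or distance matrix")
    parser.add_argument(
        "--bifiltration",
        choices=["function-rips", "degree-rips"],
        default="degree-rips",
    )
    parser.add_argument("--max-dist", type=float, default=None)
    parser.add_argument("--homology", type=int, default=0)
    return parser
```

A file `opts.txt` containing the lines `--max-dist=2.5` and `--homology=0` could then be used as `rivet_console_sketch points.csv @opts.txt --homology 1`. Because argparse keeps the last occurrence of a repeated option, the explicit `--homology 1` takes precedence over the value in the file.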

3) Along similar lines, it would be nice if there were a way to yoke the options for the file input dialogue more tightly to the command-line options for RIVET. In my experience, the file input dialogue makes using RIVET's visualization much easier; it is a valuable feature. But from the development end, it is a bit cumbersome, because whenever a new command-line feature is implemented, there is the additional burden of having to decide whether to add it to the input dialogue. For example, should I be able to control the number of processors used in parallel minimization of the presentation from the file input dialogue? The problem will be further compounded when we introduce options to build several bifiltrations from a single data set, as suggested in 1). Do we want to introduce special GUI elements for this in the file input dialogue? This makes me wonder whether the GUI options in the input dialogue for choosing homology dimension and number of bins should be replaced with a text window to enter arbitrary command line options for rivet_console, together with some way to display the -h file which explains how to use these options.

4) Taking this idea one step further, perhaps the following structure would be cleaner:

The file input dialogue becomes a simple standalone wrapper to control rivet_console, and has exactly the same functionality as rivet_console, just in a GUI.
rivet_console gets a command line option to start the visualization, for suitable input. (Perhaps this would be the default.)
The visualization program rivet_GUI is stripped of the file input dialogue; it just opens module invariant files, and perhaps takes a signal from rivet_console telling it when the augmented arrangement has been added to the module invariant file.
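
A minimal Python sketch of the wrapper idea, assuming the dialogue collects an input path, an output path, and one free-form text field of options. The argument order (input file, then output file, then options) and the example flags in the usage note are assumptions made for illustration and should be checked against the rivet_console help text.

```python
import shlex


def build_command(input_path, output_path, extra_options, executable="rivet_console"):
    """Turn dialogue fields into an argv list for the console program.

    `extra_options` is the raw text from the free-form options field; shlex.split
    tokenizes it the way a POSIX shell would, so quoted values stay together.
    """
    return [executable, input_path, output_path, *shlex.split(extra_options)]


def help_command(executable="rivet_console"):
    """argv list a GUI could run to fetch the help text it displays next to the field."""
    return [executable, "-h"]
```

For example, `build_command("points.csv", "out.rivet", "--homology 1 --xbins 10")` yields a list that can be passed directly to `subprocess.run`. The wrapper never needs to know which options exist, so new command-line features require no change to the dialogue.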

delooper commented 6 years ago

I've always liked the Persistence of Vision model. http://www.povray.org/

They allow the input script file to configure pretty much every option. You can then override those settings on the command line. I'm not certain if you want to go that far, as the command line parser might become complicated. But it's a reliable and pleasant way to do things for the end-user.
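
A minimal Python sketch of that layering, assuming options are held as dictionaries and that a command-line value of `None` means "not given". The function name and the option keys in the example are hypothetical and are not taken from POV-Ray or RIVET.

```python
def resolve_options(defaults, file_options, cli_options):
    """Merge option layers so that later layers override earlier ones.

    Precedence, lowest to highest: built-in defaults, values from the input
    file, values given on the command line. CLI entries that are None are
    treated as "not specified" and do not override anything.
    """
    resolved = dict(defaults)
    resolved.update(file_options)
    resolved.update({k: v for k, v in cli_options.items() if v is not None})
    return resolved
```

For example, with defaults `{"homology": 0, "xbins": 0}`, file options `{"xbins": 20}`, and command-line options `{"homology": 1, "xbins": None}`, the result is `{"homology": 1, "xbins": 20}`.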

mlesnick commented 3 years ago

In v1.1, we've addressed the issues with input to our satisfaction. The issues with output are mentioned in various other issues, so I will close this.

rivetTDA / rivet

Input/Output Organization Issues #138