sandialabs / slycat

Web-based data science analysis and visualization platform.

http://slycat.readthedocs.org


Generalize the Timeseries Wizard HPC Parameters page and other wizard improvements #720

Open alexsielicki opened 7 years ago

alexsielicki commented 7 years ago

Change field labels on the HPC Parameters page to make them more general.

From @wlhunt, here's an initial set of changes:

- the WCID field should be called “Account ID”
- the Partition field should be called “Partition/Queue”

wlhunt commented 7 years ago

In the Select Table File page of the wizard, the path field is not labeled, and it is not obvious that it is a data entry field where the path to the target file should be entered. Could this be made clearer with a label, or with a different presentation or styling?

Also on this page of the wizard, the phrase "Please select your table file" could be improved, since many people do not know what a "table file" is. I suggest this replacement: "Please select the file which contains the metadata for the timeseries".

wlhunt commented 7 years ago

In the Select Timeseries File page of the wizard, the path entered on the Select Table File page is not carried forward to the path field. Users would appreciate having that path copied forward.

wlhunt commented 7 years ago

In the HPC Parameters page of the wizard, a brief help or instruction message would be useful. Something like:

"Slycat will process these timeseries files on this cluster and transfer the model data to the server. Please enter the HPC system & scheduler info below."

The Number Of Nodes field is limited to 1 at some deployments. Consider letting the server configuration dictate the maximum value of this field.

I don't think the user should be asked for a number of cores. Asking implies that the user understands the compute and memory requirements of the timeseries processing code well enough to make that determination. The field should be removed, and the value should either be fixed for each deployment site or set dynamically by Slycat at job launch time.
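The two suggestions above could work together as in the following TypeScript sketch. It assumes a hypothetical per-deployment configuration object: `SchedulerConfig`, `maxNodes`, `coresPerJob` and `buildJobRequest` are illustrative names, not Slycat's actual server configuration or code. The user's node count is clamped to the site maximum, and the core count comes from the deployment instead of the form.

```typescript
// Hypothetical per-deployment scheduler settings; these names are
// illustrative and not part of Slycat's real server configuration.
interface SchedulerConfig {
  maxNodes: number;    // upper bound for the "Number Of Nodes" field
  coresPerJob: number; // fixed by the site, so the user is never asked
}

// Values the wizard would submit when launching the job.
interface JobRequest {
  nodes: number;
  cores: number;
}

// Clamp the user's node count into [1, config.maxNodes] and take the
// core count from the deployment instead of from user input.
function buildJobRequest(requestedNodes: number, config: SchedulerConfig): JobRequest {
  if (!Number.isInteger(requestedNodes) || requestedNodes < 1) {
    throw new RangeError(`Number of nodes must be a positive integer, got ${requestedNodes}`);
  }
  return {
    nodes: Math.min(requestedNodes, config.maxNodes),
    cores: config.coresPerJob,
  };
}

// Example: a deployment that only allows single-node jobs.
const singleNodeSite: SchedulerConfig = { maxNodes: 1, coresPerJob: 16 };
console.log(buildJobRequest(4, singleNodeSite)); // { nodes: 1, cores: 16 }
```

Clamping silently is only one design choice. A form could instead use the same configured value as the input's maximum and reject larger entries with a message.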

As mentioned above, "Account" is a better generic name for the WCID field, and "Partition/Queue" is better for the Partition field.

The Maximum Time field should be renamed "Requested Job Time".

The Working Directory field label should be more descriptive, for example: "Working Directory for Temporary Slycat Files"
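One way to keep these renames consistent is to define every label of the HPC Parameters page in a single mapping from an internal field key to its user-facing text. The keys below (`wcid`, `partition`, `nodes`, `timeLimit`, `workingDirectory`) and the `labelFor` helper are hypothetical and not taken from the Slycat code base. The labels are the ones proposed in this thread.

```typescript
// Hypothetical internal field keys for the HPC Parameters page,
// mapped to the labels proposed in this issue.
const hpcFieldLabels = {
  wcid: "Account",
  partition: "Partition/Queue",
  nodes: "Number Of Nodes",
  timeLimit: "Requested Job Time",
  workingDirectory: "Working Directory for Temporary Slycat Files",
} as const;

type HpcField = keyof typeof hpcFieldLabels;

// Look up the label to render next to a given input field.
function labelFor(field: HpcField): string {
  return hpcFieldLabels[field];
}

// There is intentionally no "cores" entry: as proposed in this thread,
// the core count would be fixed by the deployment, not entered by the user.
for (const field of Object.keys(hpcFieldLabels) as HpcField[]) {
  console.log(`${field}: ${labelFor(field)}`);
}
```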

pjcross commented 7 years ago

Let's talk about this at the meeting tomorrow.

pjcross commented 7 years ago

Also, we need to prevent users from logging into a machine that is not an HPC system, and we need to tell them that the CSV must point to a cluster machine for pulling in the timeseries files. (I don't think it will work if the cluster machine they are running on differs from the machine the CSV file points to.)

pjcross commented 7 years ago

I just tested the mismatch theory. It seemed to start OK, with a running job session, but then it quickly switched to an error saying that this was not a session.
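A check like the following could catch this mismatch before a job is launched, by comparing the host of the current remote session with the host that each timeseries file pointer in the CSV refers to. This is a sketch under a loudly hypothetical assumption: pointers are written as `host:/path`, and a pointer with no host prefix is treated as living on the session host. Slycat's actual table format for these pointers may differ, and `checkPointers` is an illustrative name.

```typescript
// Result of checking one file pointer from the table file.
interface HostCheck {
  pointer: string;
  host: string;
  matches: boolean;
}

// Hypothetical pointer format "host:/absolute/path"; a pointer with no
// "host:" prefix is assumed to live on the session host.
function pointerHost(pointer: string, sessionHost: string): string {
  const colon = pointer.indexOf(":");
  const slash = pointer.indexOf("/");
  // Treat the prefix as a host only if the colon comes before the first slash.
  if (colon > 0 && (slash === -1 || colon < slash)) {
    return pointer.slice(0, colon);
  }
  return sessionHost;
}

// Flag every pointer whose host differs from the cluster the user
// logged into; host names are compared case-insensitively.
function checkPointers(pointers: string[], sessionHost: string): HostCheck[] {
  return pointers.map((pointer) => {
    const host = pointerHost(pointer, sessionHost);
    return { pointer, host, matches: host.toLowerCase() === sessionHost.toLowerCase() };
  });
}

const results = checkPointers(
  ["clusterA:/scratch/run1.csv", "/scratch/run2.csv", "clusterB:/scratch/run3.csv"],
  "clusterA",
);
const mismatched = results.filter((r) => !r.matches).map((r) => r.pointer);
console.log(mismatched); // [ 'clusterB:/scratch/run3.csv' ]
```

The wizard could refuse to continue, or at least warn, whenever any pointer is flagged.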

pjcross commented 7 years ago

Text for table definition:

The central input common to all Slycat™ models is a scalar data table. In this table, each column holds the values of a single input or output variable across all runs, and each row holds all of the variable values for a single simulation. Slycat™ accepts two file formats for table data: Comma Separated Value (CSV) files or, for Dakota users, Dakota tabular files. If your data is not currently in one of these two formats, Excel can be used to create CSV files from most common table formats. Note that if output metrics have been created separately in a post-processing step, they must be merged with the inputs into a single file ahead of time. A CSV file should contain only a single header row, consisting of the column names.
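To make the expected layout concrete, here is a minimal sketch that reads a CSV table with exactly one header row of column names and exposes each column as the values of one variable across all runs, with one data row per simulation. It deliberately handles only simple CSV with no quoted fields or embedded commas. The `readTable` name is illustrative, and real input should go through a full CSV parser.

```typescript
// One variable (column) and its values across all runs.
interface Column {
  name: string;
  values: string[];
}

// Minimal reader for simple CSV: the first non-empty line is the single
// header row of column names; each following non-empty line is one run.
// Quoted fields and embedded commas are NOT supported.
function readTable(text: string): Column[] {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) {
    throw new Error("Table is empty: expected a header row of column names");
  }
  const names = lines[0].split(",").map((s) => s.trim());
  const columns: Column[] = names.map((name) => ({ name, values: [] }));
  lines.slice(1).forEach((line, i) => {
    const cells = line.split(",").map((s) => s.trim());
    if (cells.length !== names.length) {
      throw new Error(`Data row ${i + 1} has ${cells.length} values, expected ${names.length}`);
    }
    cells.forEach((cell, j) => columns[j].values.push(cell));
  });
  return columns;
}

const table = readTable("x,y,out\n1,2,3.5\n4,5,6.0\n");
console.log(table.map((c) => `${c.name}: ${c.values.join(" ")}`));
// [ 'x: 1 4', 'y: 2 5', 'out: 3.5 6.0' ]
```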

pjcross commented 7 years ago

Text for working directory help (turns out I didn't have this prewritten, but here goes):

When creating a time series model, Slycat™ needs a location where temporary files can be created and stored. Because these files may be quite large, Slycat™ relies on users to choose a suitable scratch disk location for this directory. The directory is not deleted automatically. Its location is recorded in the .slycatrc file in your home directory and is offered as the default location in later runs of the time series wizard. If you decide to keep the HDF5 files created by the wizard, they can be read directly into the wizard as a third input format. This is useful for experimenting with the effects of different sampling or dendrogram parameter choices for a particular ensemble.

alexsielicki commented 7 years ago

We still need a way to default the path on "Select Timeseries File" to the one selected on "Select Table File" when there is no preset in localStorage, but so far this is proving very difficult.
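The intended fallback order can be sketched as follows: use the preset stored in localStorage if one exists, otherwise start in the directory of the file chosen on the Select Table File page, otherwise leave the field empty. The storage key `slycat-timeseries-path` and the function names are hypothetical. The store parameter only needs a Web Storage-style `getItem`, so `window.localStorage` can be passed in the browser and a stub elsewhere.

```typescript
// Anything with a Web Storage-style getItem, e.g. window.localStorage.
interface KeyValueStore {
  getItem(key: string): string | null;
}

// Hypothetical key under which the last-used timeseries path is stored.
const TIMESERIES_PATH_KEY = "slycat-timeseries-path";

// Directory portion of a POSIX-style path: "/a/b/table.csv" -> "/a/b".
function parentDirectory(path: string): string {
  const idx = path.lastIndexOf("/");
  if (idx < 0) return ""; // no directory component
  return idx === 0 ? "/" : path.slice(0, idx);
}

// Prefer a stored preset; otherwise fall back to the directory of the
// table file selected on the previous page; otherwise return "".
function defaultTimeseriesPath(store: KeyValueStore, tableFilePath: string | null): string {
  const preset = store.getItem(TIMESERIES_PATH_KEY);
  if (preset !== null && preset !== "") return preset;
  if (tableFilePath) return parentDirectory(tableFilePath);
  return "";
}

// Usage with an in-memory stand-in for an empty localStorage.
const emptyStore: KeyValueStore = { getItem: () => null };
console.log(defaultTimeseriesPath(emptyStore, "/scratch/ensemble/metadata.csv")); // /scratch/ensemble
```

Defaulting to the table file's directory rather than to the full file path is an assumption. It opens the file browser where the user last was, without preselecting the table file itself.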