sandialabs / slycat

Web-based data science analysis and visualization platform.
http://slycat.readthedocs.org

Add New Rows and Columns to Existing Tables #781

Status: Open. pjcross opened this issue 7 years ago.

pjcross commented 7 years ago

Two customers have asked for the ability to update tables, either by adding columns (say, an additional image column) or by adding rows (additional runs or samples that are generated over a period of time). The issue is the overhead of re-ingesting the data from scratch each time: in particular, re-selecting the designations (inputs/outputs/neither/categorical/etc.) for every variable and then regenerating the models for each iteration. I showed one of them how to use templates, which does speed up model recreation, but templates don't carry over the designations or avoid having to recreate each model separately. I think this mostly applies to Parameter Space, especially for extra columns, but I can imagine that regenerating CCA models would be desired when rows are added.

pjcross commented 7 years ago

After talking to Matt a bit, I think this should be handled much like how we create the "Modified CCA Model": although that actually generates a new CCA model, it requires little additional effort. We can add a "Modified Parameter Space Model" selection to the create button, enabled only from within an existing Parameter Space model. It would trigger a revised Parameter Space ingestion wizard that prompts the user to read in a new CSV table (remotely or locally), then uses the metadata from the existing model to assign inputs/outputs/neither/categorical/editable to the new table. Any added columns would default to 'neither', with 'categorical' and 'editable' set to false. These default assignments would be shown in the wizard, and the user could change them before advancing to the next frame. The model name would default to the previous name plus some indication that it is a revision.
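A minimal sketch of the designation carry-over described above, written in Python. The `ColumnDesignation` record, its field names, the helper functions, and the " (revised)" name suffix are all hypothetical, chosen for illustration rather than taken from Slycat's model schema. Columns are matched by header name, and any header without a match gets the proposed defaults.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical per-column metadata record; field names are assumptions
# for illustration, not Slycat's actual model schema.
@dataclass(frozen=True)
class ColumnDesignation:
    role: str = "neither"  # one of "input", "output", "neither"
    categorical: bool = False
    editable: bool = False


def carry_over_designations(
    old: Dict[str, ColumnDesignation], new_headers: List[str]
) -> Dict[str, ColumnDesignation]:
    """Reuse the existing model's designation for every header that still
    appears in the new table; new headers get the proposed defaults
    ('neither', not categorical, not editable). Order follows new_headers."""
    return {name: old.get(name, ColumnDesignation()) for name in new_headers}


def default_model_name(old_name: str) -> str:
    # The issue only asks for "some indication that it is a revision";
    # the " (revised)" suffix is an assumed choice.
    return f"{old_name} (revised)"
```

The returned mapping would only be the wizard's initial state; whatever the user changes in the wizard would override it before the next frame.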

After the new table is ingested, the template (bookmark values) from the Parameter Space model being "modified" would be used to put the new model into a state that approximates the original, replicating axes, filters, color-coding, and selections. The assumption is that the new table is mostly the same as the old one, with just some columns or rows added.
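One way the template could be carried across, sketched under assumptions: the bookmark layout here (axis and color-coding variables stored as column indices, filters keyed by column index, selections stored as row indices) and the key names are hypothetical, not Slycat's actual bookmark format. Because an added column can shift positions, each stored column index is translated through its header name. Row indices are kept as-is, which is only valid if new rows are appended after the existing ones.

```python
from typing import Any, Dict, List, Optional

# Hypothetical bookmark keys that hold a single column index.
COLUMN_KEYS = ("x_column", "y_column", "color_column")


def remap_column(index: int, old_headers: List[str], new_headers: List[str]) -> Optional[int]:
    """Find the old column's position in the new table by header name,
    or None if that header is no longer present."""
    name = old_headers[index]
    return new_headers.index(name) if name in new_headers else None


def remap_bookmark(bookmark: Dict[str, Any], old_headers: List[str],
                   new_headers: List[str], new_row_count: int) -> Dict[str, Any]:
    """Translate a saved state from the old table onto the new one."""
    state: Dict[str, Any] = {}
    # Axis / color variables: None means the variable is gone from the table.
    for key in COLUMN_KEYS:
        if key in bookmark:
            state[key] = remap_column(bookmark[key], old_headers, new_headers)
    # Filters: drop any filter whose column no longer exists.
    state["filters"] = {}
    for col, bounds in bookmark.get("filters", {}).items():
        new_col = remap_column(col, old_headers, new_headers)
        if new_col is not None:
            state["filters"][new_col] = bounds
    # Selections: assumes appended rows, so old indices still refer to the
    # same runs; drop any index that falls outside the new table.
    state["selected_rows"] = [r for r in bookmark.get("selected_rows", []) if r < new_row_count]
    return state
```

For example, if an "image" column is inserted at the front, a bookmark with `x_column` 0 ("temp") would come back with `x_column` 1.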

Although some columns and rows could probably be removed without difficulty, that would be very data-dependent, so perhaps we should return an error if the table has lost rows or if its column headers no longer include the original set (I'm still unsure how to handle this). Alternatively, we could let users complete the ingestion using the matching metadata but not apply a template to the model, since that's where I expect removed data to cause problems: selected variables or points might be missing.
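A sketch of a compatibility check covering the two options above. The names `check_revision` and `IngestAction` are hypothetical; `strict=True` corresponds to returning an error, and the default corresponds to ingesting with the matching metadata but skipping the template. Comparing row counts only catches a net loss of rows: a table that dropped some runs and gained as many new ones would pass, so detecting that case would need a per-row identifier.

```python
from enum import Enum
from typing import List, Set


class IngestAction(Enum):
    APPLY_TEMPLATE = "apply_template"  # only rows/columns were added
    SKIP_TEMPLATE = "skip_template"    # data was lost; reuse metadata only
    REJECT = "reject"                  # strict policy: return an error


def check_revision(old_headers: List[str], old_row_count: int,
                   new_headers: List[str], new_row_count: int,
                   strict: bool = False) -> IngestAction:
    """Compare the new table against the original model's table and decide
    whether the saved template can safely be applied."""
    missing_columns: Set[str] = set(old_headers) - set(new_headers)
    lost_rows = new_row_count < old_row_count  # net loss only
    if not missing_columns and not lost_rows:
        return IngestAction.APPLY_TEMPLATE
    return IngestAction.REJECT if strict else IngestAction.SKIP_TEMPLATE
```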