Multi-component analysis file output requirements for plotting

madscatt commented 2 months ago

We need to alter the code to write JSON-formatted fields for data that is to be plotted. Currently, the match point analysis writes data formatted like this:

cat ../results/users/joseph/no_project_specified/run_0/multi_component_analysis/match_point/sans_data.dat
input variables: {'app': 'multi_component_analysis', 'run_name': 'run_0', 'output_file_name': 'sans_data.dat', 'number_of_contrast_points': 4, 'stoichiometry_flag': False, 'match_point_flag': True, 'stuhrmann_parallel_axis_flag': False, 'decomposition_flag': False, 'fraction_d2o': [0.0, 0.2, 0.85, 1.0], 'izero': [0.85, 0.534, 0.013, 0.095], 'izero_error': [0.01, 0.044, 0.003, 0.002], 'concentration': [7.7, 7.7, 7.7, 7.7], 'concentration_error': [0.4, 0.4, 0.4, 0.4], 'initial_match_point_guess_flag': False}
--------------------------------
Final Results
number of points fit: 4
initial match point guess from polynomial fit: 0.7515
match point: 0.7544 +/- 0.0221
reduced chi-squared for linear fit: 1.0042
fraction_d2o  sqrt[I(0)/c]  sqrt[I(0)/c]_error  sqrt[I(0)/c]_calc  sqrt[I(0)/c]-sqrt[I(0)/c]_calc
   0.0000      0.3322      0.0088      0.3387         -0.0064
   0.2000      0.2633      0.0128      0.2489          0.0145
   0.8500     -0.0411      0.0049     -0.0429          0.0018
   1.0000     -0.1111      0.0031     -0.1103         -0.0008

It would be helpful to convert this to JSON.

Given a data set to plot:

data = """
fraction_d2o\tsqrt[I(0)/c]\tsqrt[I(0)/c]_error\tsqrt[I(0)/c]_calc\tsqrt[I(0)/c]-sqrt[I(0)/c]_calc
0.0000\t0.3322\t0.0088\t0.3387\t-0.0064
0.2000\t0.2633\t0.0128\t0.2489\t0.0145
0.8500\t-0.0411\t0.0049\t-0.0429\t0.0018
1.0000\t-0.1111\t0.0031\t-0.1103\t-0.0008
"""

Perhaps with code like this:

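import json

# The first line holds the column names; the remaining lines are tab-separated values.
lines = data.strip().split('\n')
columns = lines[0].split('\t')
# One dict per row, keyed by column name.
records = [{col: float(val) for col, val in zip(columns, line.split('\t'))} for line in lines[1:]]

json_data = json.dumps(records)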
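The sans_data.dat output shown above is whitespace-aligned rather than tab-separated, and its header lines are not commented. A minimal sketch that parses that text directly, assuming the table always starts at the line beginning with the `fraction_d2o` column name and runs to the end of the file (`parse_match_point_table` is a hypothetical name, not an existing zazzie function):

```python
import json


def parse_match_point_table(text, first_column="fraction_d2o"):
    """Return the numeric table of a match point output file as a list of dicts.

    Assumes the column-header line begins with `first_column` and that every
    non-blank line after it is a row of whitespace-separated numbers.
    """
    lines = text.splitlines()
    # Locate the column-header line; everything before it is free-form header text.
    start = next((i for i, line in enumerate(lines)
                  if line.strip().startswith(first_column)), None)
    if start is None:
        raise ValueError(f"no line starts with {first_column!r}")
    columns = lines[start].split()
    records = []
    for line in lines[start + 1:]:
        if not line.strip():
            continue  # skip blank lines
        values = [float(v) for v in line.split()]
        records.append(dict(zip(columns, values)))
    return records


sample = """Final Results
match point: 0.7544 +/- 0.0221
fraction_d2o  sqrt[I(0)/c]  sqrt[I(0)/c]_error  sqrt[I(0)/c]_calc  sqrt[I(0)/c]-sqrt[I(0)/c]_calc
   0.0000      0.3322      0.0088      0.3387         -0.0064
   0.2000      0.2633      0.0128      0.2489          0.0145
"""

records = parse_match_point_table(sample)
json_data = json.dumps(records)
```

Keying on the first column name avoids depending on how many header lines precede the table, but it would break if a header line ever started with the same word.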

So, @skrueger111, we should discuss how to handle data output statements in zazzie.

madscatt commented 2 months ago

One solution is to add a # at the beginning of all lines not required for plotting. Then, immediately before the lines to be plotted, add a line with a keyword, such as

# PLOT DATA X

where X is an integer identifying each plotting data set.

Each section can be bounded by

# END PLOT DATA X

Thus, a single parser can be written to handle plotting.

# input variables: {'app': 'multi_component_analysis', 'run_name': 'run_0', 'output_file_name': 'sans_data.dat',
# 'number_of_contrast_points': 4, 'stoichiometry_flag': False, 'match_point_flag': True,
# 'stuhrmann_parallel_axis_flag': False, 'decomposition_flag': False, 'fraction_d2o': [0.0, 0.2, 0.85, 1.0],
# 'izero': [0.85, 0.534, 0.013, 0.095], 'izero_error': [0.01, 0.044, 0.003, 0.002],
# 'concentration': [7.7, 7.7, 7.7, 7.7], 'concentration_error': [0.4, 0.4, 0.4, 0.4],
# 'initial_match_point_guess_flag': False}
# --------------------------------
# Final Results
# number of points fit: 4
# initial match point guess from polynomial fit: 0.7515
# match point: 0.7544 +/- 0.0221
# reduced chi-squared for linear fit: 1.0042
# PLOT DATA 1
fraction_d2o  sqrt[I(0)/c]  sqrt[I(0)/c]_error  sqrt[I(0)/c]_calc  sqrt[I(0)/c]-sqrt[I(0)/c]_calc
   0.0000      0.3322      0.0088      0.3387         -0.0064
   0.2000      0.2633      0.0128      0.2489          0.0145
   0.8500     -0.0411      0.0049     -0.0429          0.0018
   1.0000     -0.1111      0.0031     -0.1103         -0.0008
# END PLOT DATA 1
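A minimal sketch of the single parser this marker convention allows, assuming exactly the marker spelling above and that the first uncommented line inside each block holds the column names (`parse_plot_blocks` is a hypothetical name):

```python
import re

BEGIN = re.compile(r"^#\s*PLOT DATA\s+(\d+)\s*$")
END = re.compile(r"^#\s*END PLOT DATA\s+(\d+)\s*$")


def parse_plot_blocks(text):
    """Collect every '# PLOT DATA X' ... '# END PLOT DATA X' block in `text`.

    Returns a dict mapping the integer X to a list of row dicts. Lines outside
    a block are ignored, as are comment lines inside a block.
    """
    blocks = {}
    current = None  # integer id of the block being read, or None
    columns = None  # column names of the current block
    for raw in text.splitlines():
        line = raw.strip()
        match = BEGIN.match(line)
        if match:
            current, columns = int(match.group(1)), None
            blocks[current] = []
            continue
        match = END.match(line)
        if match:
            if current is None or int(match.group(1)) != current:
                raise ValueError(f"unmatched end marker: {line!r}")
            current = None
            continue
        if current is None or not line or line.startswith("#"):
            continue
        if columns is None:
            columns = line.split()  # first data line of a block holds the column names
        else:
            blocks[current].append(dict(zip(columns, map(float, line.split()))))
    if current is not None:
        raise ValueError(f"PLOT DATA {current} is missing its end marker")
    return blocks


sample = """# reduced chi-squared for linear fit: 1.0042
# PLOT DATA 1
fraction_d2o  sqrt[I(0)/c]  sqrt[I(0)/c]_error
   0.0000      0.3322      0.0088
   0.2000      0.2633      0.0128
# END PLOT DATA 1
"""

blocks = parse_plot_blocks(sample)
```

Because the end marker repeats X, the parser can detect a missing or mismatched terminator instead of silently merging two data sets.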

madscatt commented 2 months ago

Alternatively, you can leave your files alone, and I can add code to write the data required for the plots to a JSON-formatted file whose name is the module's original output file name with _plot.json appended. That way, we can have a single _plot.json file for each run.

import json

data = """
# PLOT DATA 1
fraction_d2o  sqrt[I(0)/c]  sqrt[I(0)/c]_error  sqrt[I(0)/c]_calc  sqrt[I(0)/c]-sqrt[I(0)/c]_calc
   0.0000      0.3322      0.0088      0.3387         -0.0064
   0.2000      0.2633      0.0128      0.2489          0.0145
   0.8500     -0.0411      0.0049     -0.0429          0.0018
   1.0000     -0.1111      0.0031     -0.1103         -0.0008
# END PLOT DATA 1
"""

lines = [line for line in data.strip().split('\n') if not line.startswith('#')]
columns = lines[0].split()
records = [{col: float(val) for col, val in zip(columns, line.split())} for line in lines[1:]]

json_data = json.dumps(records)

# Write the JSON data to a file
with open('test.json', 'w') as f:
    f.write(json_data)

Then, in the Genapp bin driver, I can simply load the data with

# Read the JSON data from the file (json.load parses the open file object directly)
with open('test.json', 'r') as f:
    json_object = json.load(f)
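One possible layout for the single per-run _plot.json file is one JSON object keyed by data set name. The sketch below assumes the name is built by replacing the output file's extension with `_plot.json` (so `sans_data.dat` becomes `sans_data_plot.json`); `plot_json_path`, `write_plot_json`, `read_plot_json`, and the `"match_point"` key are hypothetical, not existing zazzie or Genapp names.

```python
import json
import os
import tempfile


def plot_json_path(output_file_name):
    """Hypothetical naming rule: 'sans_data.dat' -> 'sans_data_plot.json'."""
    root, _ = os.path.splitext(output_file_name)
    return root + "_plot.json"


def write_plot_json(output_file_name, datasets):
    """Write every data set for one run into a single JSON object keyed by name."""
    path = plot_json_path(output_file_name)
    with open(path, "w") as f:
        json.dump(datasets, f, indent=2)
    return path


def read_plot_json(path):
    """What a bin driver might do: load the whole object in one call."""
    with open(path) as f:
        return json.load(f)


# Usage: round-trip one data set through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    datasets = {"match_point": [{"fraction_d2o": 0.0, "sqrt[I(0)/c]": 0.3322}]}
    path = write_plot_json(os.path.join(tmp, "sans_data.dat"), datasets)
    file_name = os.path.basename(path)
    loaded = read_plot_json(path)
```

Keying by data set name lets one file hold several plots per run, which is the point of having a single _plot.json.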

skrueger111 commented 2 months ago

@madscatt: It would be easy to implement your first suggestion. But a single file containing all the data to be plotted would be very large for modules that have a lot of data to plot. I’m thinking of decomposition in particular. If we can come up with a clean solution for that case, I think we’d be covered for the other modules. Right now, I’m writing multiple output files for decomposition. I wanted to write separate files so users can download them and make their own plots offline for papers, etc. Apart from the general output file, most of these files contain only a header followed by data to be plotted.

I prefer your second solution. The file would always have an obvious name, and there would be only one per run.

Whichever solution we choose, I think we need to add "#" to lines that aren't required for plotting. I meant to do that, and I may have already added a # at the beginning of the header lines in the output files written by decomposition. But I think it is still a “TODO” for some of the modules (including match point, it seems). If you agree, I will check which modules need "#" in their output files and open a ticket.
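A minimal sketch of the "#" convention on the writing side, assuming a hypothetical helper `write_output_with_commented_header` that takes free-form header lines, column names, and numeric rows: every header line gets a "# " prefix, so a reader that drops lines starting with "#" is left with only the table.

```python
import io


def write_output_with_commented_header(f, header_lines, columns, rows, fmt="{:10.4f}"):
    """Write header lines prefixed with '# ', then the column names and rows.

    Hypothetical helper: only the plot table is written without a '#' prefix.
    """
    for line in header_lines:
        f.write("# " + line + "\n")
    f.write("  ".join(columns) + "\n")
    for row in rows:
        f.write("".join(fmt.format(value) for value in row) + "\n")


buffer = io.StringIO()
write_output_with_commented_header(
    buffer,
    ["Final Results", "match point: 0.7544 +/- 0.0221"],
    ["fraction_d2o", "sqrt[I(0)/c]"],
    [[0.0, 0.3322], [0.2, 0.2633]],
)
# Reader side: discard commented lines, keep the table.
plot_lines = [line for line in buffer.getvalue().splitlines() if not line.startswith("#")]
```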

madscatt commented 1 month ago

Each sub-module has a method near the top of its file that saves the data to be plotted to disk in JSON format.

For example, match_point.py, stuhrmann_parallel_axes.py, and stoichiometry.py have:

# match_point.py
def save_data_to_plot_as_json(other_self, square_root_izero, square_root_izero_error, square_root_izero_calculated, diff):

# stuhrmann_parallel_axes.py
def save_data_to_plot_as_json(other_self, delta_rho_inverse, rg_squared, rg_squared_error, rg_squared_calculated, diff):

# stoichiometry.py
def save_data_to_plot_as_json(other_self, izero, izero_error, izero_calc, diff):
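Only the signatures are listed above. As an illustration only, a body for the match point version might pair the arrays element-wise with the D2O fractions and dump one record per contrast point. Here `json_path` and `fraction_d2o` are explicit parameters standing in for whatever `other_self` actually carries, and `save_match_point_plot_json` is a hypothetical name, not the real method.

```python
import json
import os
import tempfile


def save_match_point_plot_json(json_path, fraction_d2o, square_root_izero,
                               square_root_izero_error, square_root_izero_calculated, diff):
    """Illustrative stand-in for match_point.py's save_data_to_plot_as_json.

    Writes one record per contrast point, using the sans_data.dat column names
    as keys. float() converts NumPy scalars so json.dump accepts them.
    """
    columns = ["fraction_d2o", "sqrt[I(0)/c]", "sqrt[I(0)/c]_error",
               "sqrt[I(0)/c]_calc", "sqrt[I(0)/c]-sqrt[I(0)/c]_calc"]
    records = [
        dict(zip(columns, map(float, row)))
        for row in zip(fraction_d2o, square_root_izero, square_root_izero_error,
                       square_root_izero_calculated, diff)
    ]
    with open(json_path, "w") as f:
        json.dump(records, f)
    return records


# Usage: write two contrast points from the table above and read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "match_point_plot.json")
    written = save_match_point_plot_json(
        path, [0.0, 0.2], [0.3322, 0.2633], [0.0088, 0.0128],
        [0.3387, 0.2489], [-0.0064, 0.0145])
    with open(path) as f:
        reloaded = json.load(f)
```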

Once these files are saved, the bin driver used by the web application can read the data. This comment describes the plotting strategy, so I will close this issue.

madscatt / zazzie, issue #171