Phase fractions of each phase

creuzige commented 2 years ago

I was able to revise the code to:

use the histogram data to set the range for theoretical intensities
use the theoretical intensities to mark locations in the histogram data to fit
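The two revisions above can be sketched as follows. This is a hypothetical helper, not the project's code: the column names `two_theta` and `hkl`, the function name, and the ±0.5° window are all assumptions, and the peak positions in the usage example are made-up placeholders.

```python
import pandas as pd


def select_peaks_to_fit(theoretical: pd.DataFrame, histogram: pd.DataFrame,
                        half_width: float = 0.5) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: keep theoretical reflections inside the measured
    2-theta range, then mark histogram points near each kept reflection."""
    lo = histogram["two_theta"].min()
    hi = histogram["two_theta"].max()

    # Step 1: the measured histogram sets the range of theoretical peaks we keep.
    in_range = theoretical["two_theta"].between(lo, hi)
    peaks = theoretical.loc[in_range].reset_index(drop=True)

    # Step 2: each kept peak marks a window (+/- half_width degrees) to fit.
    to_fit = pd.Series(False, index=histogram.index)
    for center in peaks["two_theta"]:
        to_fit |= (histogram["two_theta"] - center).abs() <= half_width
    return peaks, to_fit
```

With a histogram spanning 40° to 60°, a reflection placed at 74.6° is dropped, and only histogram points within half_width of the remaining reflections are flagged for fitting.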

This works pretty well, but the changes to how CIF files get imported have introduced a new issue. The calculate_phase_fraction function tacitly assumed it would return a DataFrame with results for the austenite phase only. Returning results for all phases is probably a good change, but I wasn't sure how best to do it.

Currently I put a DataFrame for each phase in a dictionary, but I'm not returning the dictionary until we sort this out. The code uses the last DataFrame created, so it runs, but the output is messed up.
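The "last DataFrame wins" symptom typically comes from reassigning one variable inside a loop. A minimal sketch, assuming a hypothetical `results_by_phase` input (phase name mapped to row records; the real structure inside calculate_phase_fraction may differ, and the values are placeholders):

```python
import pandas as pd

# Hypothetical per-phase results; placeholder values, not real fits.
results_by_phase = {
    "Austenite": [{"hkl": "111", "fit_intensity": 1.0}],
    "Ferrite": [{"hkl": "110", "fit_intensity": 2.0}],
}


def last_table_only(results: dict) -> pd.DataFrame:
    # `df` is overwritten on every pass, so only the last phase survives.
    for phase, rows in results.items():
        df = pd.DataFrame(rows)
    return df


def all_tables(results: dict) -> dict:
    # Keep every phase by storing each DataFrame under its phase name.
    return {phase: pd.DataFrame(rows) for phase, rows in results.items()}
```

Returning the dictionary from all_tables keeps every phase, and its size is simply `len(...)` when a caller needs it.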

I'd be interested to hear your thoughts. I'd like to have a table for each phase, but I'm not sure how Dash would handle a dictionary of 'n' DataFrames; it seems to prefer a fixed number of tables.

maxgarman commented 2 years ago

This could work, since dicts technically don't need a predefined size, though we may have to check the size of the dict before we use it, which isn't very difficult. It could just come down to writing a somewhat complex loop to get all of the data out in a reasonable form.

dnewton600 commented 2 years ago

If I understand correctly, I think the easiest option is to combine everything into one DataFrame, with 'Phase' as one of the columns (and we could even add blank rows for formatting).
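Combining per-phase tables into one DataFrame with a 'Phase' column is a single `pd.concat` call when the tables are held in a dict. The function name is hypothetical, and the example columns are placeholders:

```python
import pandas as pd


def combine_phase_tables(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack per-phase DataFrames into one, recording each row's phase."""
    # Dict keys become the outer index level, named "Phase".
    combined = pd.concat(tables, names=["Phase", "row"])
    # Move "Phase" into a regular column and drop the per-phase row index.
    return combined.reset_index(level="Phase").reset_index(drop=True)
```

For example, two austenite rows and one ferrite row come out as three rows with columns `Phase`, `hkl`, `fit_intensity`.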

If we do want multiple tables (where the number depends on the number of phases), we can do that too, either by managing the layout dynamically or by 'cheating': having, say, 5 blank tables, filling in the first 'n' as needed, and leaving the rest empty.
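The fixed-slot option can be sketched without touching Dash itself: convert each phase table to the list-of-dicts form that a Dash DataTable's `data` property takes, and pad the unused slots with empty lists. The slot count of 5 and the function name are assumptions:

```python
import pandas as pd

MAX_PHASE_TABLES = 5  # hypothetical fixed number of table slots in the layout


def fill_table_slots(tables: dict[str, pd.DataFrame],
                     n_slots: int = MAX_PHASE_TABLES) -> list[list[dict]]:
    """Return one row-record list per slot; unused slots get no rows."""
    if len(tables) > n_slots:
        raise ValueError(f"{len(tables)} phases but only {n_slots} table slots")
    slots = [df.to_dict("records") for df in tables.values()]
    # Pad so the layout always receives exactly n_slots entries.
    slots += [[] for _ in range(n_slots - len(slots))]
    return slots
```

The trade-off is that the layout stays static, but a sample with more phases than slots needs an explicit error (as here) rather than silently dropping a phase.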

Let me know if there are any preferences, and I'll go ahead and get this implemented.

creuzige commented 2 years ago

I'm a little concerned about putting 'everything' in one dataframe, but I do like consolidating things as much as we can. Currently we have dataframes for:

Fit intensity and theoretical intensity. I kind of like keeping this one a bit separate, since it makes merging easier and each row maps to an hkl plane. If we do repeated fits, I'm not sure whether we'd keep them all or just the final one.
Phase fraction. This is where I got a bit lost, since my initial mindset was to just return information about the austenite phase. But I think it makes sense to report on each phase row-wise, and maybe each column becomes a part of that phase data.
Uncertainty values, flags, feedback. I'd say this is the least thought-out data structure. I wanted a place to start adding comments, flags, and suggestions, but I hadn't thought much about how to integrate this data. Initially I thought each row would express an uncertainty value, but I don't see how to organize that cleanly by phase. Having a separate dataframe for each phase in a dictionary is certainly one approach, but it feels a little clunky, and there would be lots of repetitive annotations.
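Keeping the fit and theoretical intensities keyed one row per (phase, hkl) makes the merge a single call. A sketch with hypothetical column names; `validate="one_to_one"` would raise if repeated fits left more than one row per plane, which fits the "keep just the final one" option:

```python
import pandas as pd


def merge_hkl_tables(fit: pd.DataFrame, theory: pd.DataFrame) -> pd.DataFrame:
    """Join fit and theoretical intensities on (Phase, hkl)."""
    # how="outer" keeps planes that were predicted but not fit (NaN fit values);
    # validate="one_to_one" raises MergeError if either side repeats a key.
    return fit.merge(theory, on=["Phase", "hkl"], how="outer",
                     validate="one_to_one")
```

Keeping every repeated fit instead would mean adding a fit-iteration column to the key, or dropping the validate check.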

So I don't know which of these would be good to combine, or how best to do it, and wanted to discuss.

dnewton600 commented 2 years ago

Agreed, sorry I wasn't clear lol. By 'everything' I meant combining the phase fraction data into a single dataframe instead of having a dictionary of n dataframes. If this sounds OK, I'll go ahead and implement it -- we can always split the DF into multiple tables for display, but it might be nice to have all of the phase fraction data in a single dataframe (with a column variable for 'phase').
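Splitting a combined phase fraction dataframe back into one table per phase for display is a `groupby` on the 'Phase' column. The function name is hypothetical:

```python
import pandas as pd


def split_for_display(phase_df: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Split a combined table with a 'Phase' column into one table per phase."""
    # sort=False keeps phases in the order they first appear.
    return {
        phase: group.drop(columns="Phase").reset_index(drop=True)
        for phase, group in phase_df.groupby("Phase", sort=False)
    }
```

This keeps a single source of truth for the calculation while still allowing per-phase tables in the app.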

So I agree with your first two points, and we can keep discussing bullet 3 as we dig deeper into the peak fitting procedure.

creuzige commented 2 years ago

I'm having a hard time imagining what the data structure would look like; maybe a quick mock-up would help? I'm just trying to avoid asking you all to spend a bunch of time developing something that may not match what's in my brain but that I'm unable to articulate...

creuzige commented 2 years ago

After chatting with David, I think we've reached a consensus to organize things into the following dataframes:

Row-wise HKL dataframe (currently holds fit and normalized intensities). Add columns for the uncertainties of various terms. For example, the fit intensity column would get additional columns for the fit uncertainty and the uncertainty from counting statistics. The normalized intensities would get uncertainty values as well. Different portions of this dataframe would be shown in the Dash app as separate tables.
Phase fraction dataframe. Row-wise for each phase, with columns for values, combined uncertainties, and separate columns for each source of uncertainty (likely obtained by turning each source off and on in the calculation).
Flags and Feedback dataframe. Passed along through the whole process and appended to as needed. For example, if the peak fit fails for a particular peak: Flag - Errors in austenite (200) peak, two_theta mismatch. Feedback - Check if Austenite (200) peak at ## degrees is present in the data.
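A rough mock-up of the last two dataframes might look like the sketch below. All column names and numbers are placeholders, and combining the separate sources in quadrature (root-sum-square) is an assumption that only holds if the sources are independent; the flag text is copied from the example above.

```python
import pandas as pd

# Placeholder phase fraction table: one row per phase, one column per source.
phase_fractions = pd.DataFrame({
    "Phase": ["Austenite", "Ferrite"],
    "fraction": [0.25, 0.75],
    "u_counting": [0.01, 0.01],
    "u_fit": [0.02, 0.02],
})


def add_combined_uncertainty(df: pd.DataFrame, source_cols: list[str]) -> pd.DataFrame:
    out = df.copy()
    # Assumes independent sources, so they add in quadrature.
    out["u_combined"] = (out[source_cols] ** 2).sum(axis=1) ** 0.5
    return out


def add_flag(flags: pd.DataFrame, flag: str, feedback: str) -> pd.DataFrame:
    """Append one Flag/Feedback row; intended to be passed along the pipeline."""
    new_row = pd.DataFrame([{"Flag": flag, "Feedback": feedback}])
    if flags.empty:
        return new_row  # avoids concatenating with an empty frame
    return pd.concat([flags, new_row], ignore_index=True)


flags = pd.DataFrame(columns=["Flag", "Feedback"])
flags = add_flag(flags, "Errors in austenite (200) peak, two_theta mismatch",
                 "Check if Austenite (200) peak at ## degrees is present in the data.")
```

Because add_flag returns a new frame rather than mutating its input, each stage of the calculation would reassign `flags` as it goes.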

In addition to the absolute uncertainty value, I've been considering adding a relative uncertainty as well. To me, relative uncertainties can help flag problematic areas. But they tacitly assume that each term is equally weighted in the phase fraction fit.
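A relative-uncertainty column and a simple flag could be derived from the absolute values as sketched here. The 10% threshold and the column-naming scheme are hypothetical; note that the ratio grows without bound as a phase fraction approaches zero, so a minor phase will trip the flag easily.

```python
import pandas as pd


def add_relative_uncertainty(df: pd.DataFrame, value_col: str,
                             unc_cols: list[str], threshold: float = 0.1) -> pd.DataFrame:
    """Add <col>_rel = |uncertainty / value| columns and flag rows above threshold."""
    out = df.copy()
    rel_cols = []
    for col in unc_cols:
        rel = f"{col}_rel"
        out[rel] = out[col] / out[value_col].abs()
        rel_cols.append(rel)
    # Hypothetical rule: flag a phase if any relative uncertainty exceeds threshold.
    out["flag_high_rel"] = (out[rel_cols] > threshold).any(axis=1)
    return out
```

For a placeholder phase with fraction 0.25 and uncertainty 0.05, the relative uncertainty is 0.2 and the row is flagged; 0.03 on 0.75 gives 0.04 and is not.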

usnistgov / AusteniteCalculator

Phase fractions of each phase #13