nocliper / ilt

:microscope: Numerical routines to invert the Laplace transform for semiconductor Deep-Level Transient Spectroscopy. To cite: Vasilev, A. (2024). Numerical Inverse Laplace Transform for Deep-Level Transient Spectroscopy. https://doi.org/10.5281/zenodo.10462383

About the "process(C, proc)" function in "read_file" #3

Closed chrlein closed 5 months ago

chrlein commented 6 months ago

Hi Anton,

I again want to express my sincere gratitude for sharing your code. I have now spent a significant amount of time delving into it and have been able to understand most of its functionality. I have also successfully implemented some very useful tools, such as a peak-finding algorithm for the heatmap, allowing me to create a scatter plot of emission rates over temperature (Arrhenius plot). Additionally, the code now performs linear regression on the Arrhenius plot to identify defect characteristics, specifically activation energy and capture cross-section.
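For reference, the Arrhenius analysis described above can be sketched as follows. This is a minimal illustration, not the thread's actual implementation: the function name is mine, and the γ prefactor (here a value often quoted for electrons in Si) is an assumption you should replace with the one appropriate for your material.

```python
import numpy as np

K_B = 8.617e-5    # Boltzmann constant, eV/K
GAMMA = 3.25e21   # cm^-2 s^-1 K^-2; assumed prefactor (electrons in Si), adjust for your material

def arrhenius_fit(T, e):
    """Fit ln(e/T^2) vs 1/(kT) to extract the activation energy (eV) and
    apparent capture cross-section (cm^2) from (temperature, emission rate) pairs."""
    T, e = np.asarray(T, float), np.asarray(e, float)
    x = 1.0 / (K_B * T)                 # 1/kT, in 1/eV
    y = np.log(e / T**2)
    slope, intercept = np.polyfit(x, y, 1)
    E_a = -slope                        # activation energy in eV
    sigma = np.exp(intercept) / GAMMA   # apparent capture cross-section in cm^2
    return E_a, sigma

# synthetic check: a trap with E_a = 0.45 eV and sigma = 1e-15 cm^2
T = np.linspace(200, 300, 20)
e = GAMMA * 1e-15 * T**2 * np.exp(-0.45 / (K_B * T))
E_a, sigma = arrhenius_fit(T, e)
```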

However, I am still struggling to fully grasp the purpose and functionality of the "process(C, proc)" function. I would be incredibly grateful if you could explain the rationale behind its implementation. Using your original code with my data consistently results in low-frequency noise. Could you perhaps provide some insight into this issue?

Screenshot 2024-02-29 172447

This noise vanishes when the `F = F + np.average(F)*2` step is not performed:

Screenshot 2024-02-29 172651

Therefore, the "Contin" algorithm appears to perform less effectively in the low-temperature regime. This is likely due to the increased noise in the initial transient data. Unfortunately, the "reSpect" algorithm is incompatible with this specific modification. If you have any experience with the suitability of different transient processing techniques for various data types, I would greatly appreciate your insights.

Thank you very much & best regards, Chris

nocliper commented 6 months ago

Hi again!

proc() function

If proc is True, the transient data is normalized and flipped so that the data points decrease with increasing time (as you know, the sign of the DLTS peak depends on the type of carriers involved, so the transient can be either increasing or decreasing). This is necessary for the proper functioning of the Contin and reSpect algorithms, which only work with positive, decreasing functions.
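For intuition, that kind of preprocessing might look like the following minimal sketch. The function name is mine, and this is not the repository's actual `process()` implementation, just an illustration of "make the transient positive and decreasing":

```python
import numpy as np

def make_positive_decreasing(C):
    """Flip and shift a capacitance transient so it is non-negative and
    decays toward zero, as Contin/reSpect expect.
    Illustrative sketch only, not the repository's process() function."""
    C = np.asarray(C, float)
    if C[-1] > C[0]:     # rising transient (opposite-sign DLTS peak)
        C = -C           # flip so it decreases with time
    C = C - C.min()      # shift so all values are non-negative
    return C

# a rising transient becomes a decaying, non-negative one
t = np.linspace(0, 1, 100)
F = make_positive_decreasing(1.0 - np.exp(-3 * t))
```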

Regarding the issue of noise in the low emission rates region, this problem arises from the inexact determination of the transient baseline. Reducing these false peaks can be challenging because:

1) Finding the baseline with get_Baseline() is not very precise and works well only for single-exponential transients (https://doi.org/10.1063/1.1149581), so the baseline value it returns is usually overestimated, which makes the function look like this (red line):

Screenshot 2024-02-29 at 21 51 27

which sometimes shifts data points to negative values and leads to poor fits. Shifting the transient up with F + av(F)*2 instead gives a reasonably good fit, but a false peak appears at low emission rates. This is why I left F + av(F)*2 in.

Screenshot 2024-02-29 at 21 55 25 Screenshot 2024-02-29 at 21 56 05

2) Finding the baseline with an additional regression slows down the computations.

3) If you keep the temperature sweep rate constant, the value of C(t->inf) drifts, which also makes fitting harder.

Overall, there is no one correct way to use this part of the code. I have left it as it works with my data, but you may need to find your own approach.

P.S. I'm not sure your fit is actually good: the linear time scale on your transient plot makes it difficult to judge the quality of the fit at high emission rates. Switch it to a log scale, since your data is equally spaced on a log scale.

P.P.S. From what I can see, your data is non-monotonic, so I suggest trying the L1+L2 or FISTA algorithm. If the non-monotonicity comes from a badly cut transient, it would be better to preprocess it before the computations.

nocliper commented 5 months ago

@chrlein Regarding the implementation of the get_Baseline() function: since you have log-spaced time points, this function won't work without properly cutting the transient (removing all data before the exponential decay) and without correctly determining the transient value at the middle of the decay.

Let's say you have a 101-point (l=101) transient. The get_Baseline() function takes c1, c2, and c3 as f[0], f[l//2-1], and f[l-1]. For linearly spaced time, all three c's are correct. For log-spaced time, however, the c2 computed inside get_Baseline() is wrong, because you need `c2 = f(t = half of the total decay time)` (see https://doi.org/10.1063/1.1149581 for details).
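A sketch of that three-point baseline estimate, with c2 taken by interpolation at t = T/2 so that it also works for log-spaced time points. The function name is mine and this is not the repository's get_Baseline(); it assumes a single-exponential transient f(t) = A·exp(-t/τ) + B, as in the cited paper:

```python
import numpy as np

def baseline_three_point(t, f):
    """Estimate the baseline B of a single-exponential transient
    f(t) = A*exp(-t/tau) + B from its values at t = 0, T/2, and T
    (three-point method, doi:10.1063/1.1149581). Interpolating c2 at
    t = T/2 in TIME (not at the middle array index) keeps the formula
    valid for log-spaced time points."""
    t, f = np.asarray(t, float), np.asarray(f, float)
    c1, c3 = f[0], f[-1]
    c2 = np.interp(t[0] + (t[-1] - t[0]) / 2.0, t, f)  # value at mid decay time
    # (c2 - B)^2 = (c1 - B)(c3 - B) for a single exponential, solve for B:
    return (c1 * c3 - c2**2) / (c1 + c3 - 2.0 * c2)

# log-spaced sampling of f(t) = 2*exp(-t) + 0.5 over [0, 10]
t = np.concatenate(([0.0], np.logspace(-3, 1, 100)))
f = 2.0 * np.exp(-t) + 0.5
B = baseline_three_point(t, f)   # close to the true baseline 0.5
```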

chrlein commented 5 months ago

Hi Anton!

Thanks for your input. The visualization with the transient shifts was very helpful, and I have now exposed all the parameters from process() in the UI to find the best fits for differently shaped transients. You are totally right about the get_Baseline() function. For this reason I have completely removed it for now with log-spaced time values, and it still works just fine.

Another thing I was wondering about is the L-curve: when computing it, is the resulting λ in general the best choice for the ILT? Also, is this value fairly constant over temperature in your experience? And sometimes (within the same measurement data) I get `RuntimeError: too many iterations`, which forces me to decrease the number N (which I don't really want to do). Do you have another approach to prevent this?

Many thanks! Best, Chris

nocliper commented 5 months ago

Well, there is a good explanation of how to solve such problems and determine the optimal solution here.

In my experience, the l-curve criterion prevents me from overfitting (getting false peaks). Sometimes, it's better to compute the l-curve at one specific temperature (the one that is interesting for you). Then, you can try to compute the same l-curve with nearby temperature transients, pick the "mean" λ value, and keep it constant for the entire temperature range.
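Averaging the per-temperature λ values might look like the tiny sketch below. The numbers are made up; the geometric mean is my suggestion here, since λ is naturally a log-distributed parameter:

```python
import numpy as np

# hypothetical lambda values picked by the l-curve at a few nearby temperatures
lams = np.array([3e-3, 5e-3, 4e-3, 6e-3])

# geometric mean as the "mean" lambda to keep fixed over the whole sweep
lam_fixed = np.exp(np.mean(np.log(lams)))
```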

To prevent the runtime errors, you can also pick a starting λ value an order of magnitude higher. This shifts the bounds of the l-curve computation away from the too-small λ values that trigger the error.
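That retry strategy could be wrapped like this. A hypothetical sketch: `solve_with_retry` and `toy_solver` are my names, and `solver` stands in for whichever ILT routine (Contin/reSpect/FISTA) raises the `RuntimeError`:

```python
def solve_with_retry(solver, F, t, lam0, max_tries=3):
    """Call an ILT solver; on a 'too many iterations' RuntimeError,
    retry with lambda raised by an order of magnitude each time.
    Hypothetical wrapper, not part of the repository."""
    lam = lam0
    for _ in range(max_tries):
        try:
            return solver(F, t, lam), lam   # solution and the lambda that worked
        except RuntimeError:
            lam *= 10.0                     # shift away from the unstable low-lambda bound
    raise RuntimeError("no convergence up to lambda = %g" % lam)

# toy solver that fails below lambda = 1, mimicking the error you are seeing
def toy_solver(F, t, lam):
    if lam < 1.0:
        raise RuntimeError("too many iterations")
    return "solution"

result, lam_used = solve_with_retry(toy_solver, None, None, 0.5)
```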