Status: closed (rjs3273 closed this issue 2 years ago).
What I was trying to explain in the original post was probably clear enough, but I have found another way of showing the same thing. This may clarify (or confuse) matters further.
After running the fit, the returned arrays contain more values in `residual` (25) than in `matched_atlas` (22). In other words, certain lines in the atlas list (`matched_atlas`) have been used twice.
```python
pprint(spec.residual)
pprint(np.shape(spec.residual))
```

```
array([-0.46671741, 0.8945916 , -1.64546253, 1.99579596, -0.14289408,
       -0.62377778, -0.19898066, 0.46534403, -1.44493276, 0.3770039 ,
       1.90681117, -0.54187621, 0.82314817, 0.79452983, -0.02143074,
       -3.617493 , 0.33016304, 0.95724878, -0.37969412, -0.16223187,
       1.13499737, -0.74640436, 1.18832576, -0.19932584, -0.67673827])
(25,)
```
```python
pprint(spec.matched_atlas)
pprint(np.shape(spec.matched_atlas))
```

```
array([4501. , 4734.152, 4807.02 , 4919.831, 5823.89 , 5893.29 ,
       5934.17 , 6182.42 , 6318.06 , 6469.7 , 6668.92 , 6728.01 ,
       6827.32 , 6882.16 , 6925.53 , 6976.18 , 7119.6 , 7285.3 ,
       7584.68 , 7642.02 , 7802.65 , 7887.4 ])
(22,)
```
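Because the printed `matched_atlas` has fewer entries than `residual`, the reused lines do not show up as repeats in it, so the mismatch in lengths is the only signal those two arrays give. If you have one atlas wavelength per matched peak, the reused lines can be counted directly. The sketch below is not part of rascal or ASPIRED: `atlas_per_peak` and `reused_atlas_lines` are hypothetical names, and the values are illustrative toy data rather than output from the fit above.

```python
import numpy as np


def reused_atlas_lines(atlas_per_peak):
    """Return (wavelengths, counts) for atlas lines matched to more than one peak.

    atlas_per_peak is a hypothetical array holding the atlas wavelength
    assigned to each matched peak (one entry per peak, i.e. the same length
    as the residual array).
    """
    lines, counts = np.unique(np.asarray(atlas_per_peak, dtype=float),
                              return_counts=True)
    reused = counts > 1
    return lines[reused], counts[reused]


# Illustrative toy values, not taken from the fit above.
atlas_per_peak = [4501.0, 4734.152, 4734.152, 5823.89, 6182.42, 6182.42]
lines, counts = reused_atlas_lines(atlas_per_peak)
print(lines, counts)
```

Any line with a count above 1 is one that has contributed more than one point to the fit.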
Hi @rjs3273, taking a look at this ASAP. Do you have the spectrum or a test case for this? I can't readily reproduce it with the example files we have in rascal, but I can see where the issue could arise (I'm just not sure at which step). In theory we have a bijective match function that should prevent this from happening, but clearly it's not working!
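For reference, the guarantee a bijective (one-to-one) match is meant to give can be sketched as a greedy assignment: collect every peak/line pair within tolerance, accept pairs in order of increasing separation, and skip any pair whose peak or line is already taken. This is an illustrative sketch, not rascal's actual matching code; `match_one_to_one` is a hypothetical name, and it assumes the peaks have already been mapped to wavelengths by the current solution. Greedy assignment is simple but not guaranteed to be globally optimal when several candidates overlap; an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the optimal alternative.

```python
import numpy as np


def match_one_to_one(peaks, atlas, tolerance):
    """Greedily pair peaks with atlas lines so each is used at most once.

    peaks and atlas are positions in the same units (e.g. wavelengths
    predicted by the current solution). Returns a sorted list of
    (peak_index, atlas_index) pairs.
    """
    peaks = np.asarray(peaks, dtype=float)
    atlas = np.asarray(atlas, dtype=float)
    # Pairwise separations: rows are peaks, columns are atlas lines.
    sep = np.abs(peaks[:, None] - atlas[None, :])
    cand_p, cand_a = np.nonzero(sep < tolerance)
    # Visit candidate pairs from closest to farthest.
    order = np.argsort(sep[cand_p, cand_a], kind="stable")

    used_peaks, used_lines, pairs = set(), set(), []
    for k in order:
        p, a = int(cand_p[k]), int(cand_a[k])
        if p in used_peaks or a in used_lines:
            continue  # this peak or line already has a better partner
        used_peaks.add(p)
        used_lines.add(a)
        pairs.append((p, a))
    return sorted(pairs)


# Toy example: peaks 1 and 2 both fall within tolerance of 4734.152.
pairs = match_one_to_one([4500.8, 4733.9, 4737.5, 5824.1],
                         [4501.0, 4734.152, 5823.89], tolerance=5.0)
print(pairs)  # peak 2 (4737.5) is left unmatched because 4734.152 is taken
```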
Hi Josh. I was actually working on exactly that yesterday, trying to make an example data set as a test case. Unfortunately I have just broken my deployment so I need to fix that first and will get back to you.
@jveitchmichaelis Is this fixed by #48?
@rjs3273 ping
@rjs3273 @jveitchmichaelis I don't see this happening anymore, so I'll assume it's fixed and close the issue for now.
I found a behaviour in the arc fitting that I had not expected. It is a little counter-intuitive, but it may be a natural consequence of the way rascal works. I should stress that I was using RASCAL via ASPIRED (https://github.com/cylammarco/ASPIRED), so perhaps I should file this query on that repository instead? Let me know if you want me to move it. Looking at the ASPIRED code, the wrapper around `Calibrator.fit()` seems fairly thin, so my guess is that the issue more likely lies here.
In ASPIRED I use:

- `initialise_calibrator()`
- `set_hough_properties()`
- `do_hough_transform()`
- `fit()`
The attached plots are output from `aspired.OneDSpec.fit()`, which I think calls `rascal.Calibrator.fit()`.
A single line in the arc line list can be used twice in the final fit, matched to two different peaks in the arc spectrum. It appears that if two peaks fall within the specified tolerances, both are used. Naively, I would expect a fit to match the arc line to the peak that fits best and ignore the other, so that any one line from the line list is only ever used once. What it does in practice is create two points in the final fit: one with a residual very close to zero (the correct match) and one with a large residual (an erroneous match to a nearby arc feature).
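The behaviour described above (keep the best-fitting peak for each line, drop the other) can be approximated as a post-fit filter. This is a minimal sketch under assumptions, not an option that rascal or ASPIRED provides: it assumes you can obtain three equal-length per-match arrays (peak position, atlas wavelength, residual), and `keep_best_match_per_line` and the toy values are hypothetical. The solution would then need to be refitted with the filtered matches, which is not shown.

```python
import numpy as np


def keep_best_match_per_line(peaks, atlas_per_peak, residuals):
    """Drop duplicate uses of an atlas line, keeping the peak with the
    smallest absolute residual for each line.

    All three inputs are hypothetical per-match arrays of equal length.
    Returns the filtered arrays in their original order.
    """
    peaks = np.asarray(peaks, dtype=float)
    atlas_per_peak = np.asarray(atlas_per_peak, dtype=float)
    residuals = np.asarray(residuals, dtype=float)

    # Visit matches from best to worst fit, so the first occurrence of each
    # line in this ordering is its best-fitting peak.
    order = np.argsort(np.abs(residuals), kind="stable")
    _, first = np.unique(atlas_per_peak[order], return_index=True)
    keep = np.sort(order[first])  # map back to original indices and order
    return peaks[keep], atlas_per_peak[keep], residuals[keep]


# Toy example: 4734.152 is matched twice, once well and once badly.
p, a, r = keep_best_match_per_line(
    peaks=[1000.0, 1010.0, 1050.0, 1200.0],
    atlas_per_peak=[4734.152, 4734.152, 5823.89, 6182.42],
    residuals=[-0.02, 2.9, 0.1, -0.3],
)
print(p, a, r)
```

Compared with removing the line from the line list, this keeps the line in the fit and discards only the poorer of its two matches.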
In the attached examples:
I have not yet done much to fine-tune the input parameters to prevent this. I am primarily asking whether this is expected behaviour and whether there is an option to prevent it. I could simply drop the problematic line from my line lists, but that loses a point from the fit, and since one of the two matches does appear to be correct, it would be good to keep it.