Using the same line from a line list multiple times

rjs3273 commented 3 years ago

I found a behaviour in the arc fitting that I had not expected. It is a little counter-intuitive, but it may be a natural consequence of the way that RASCAL works. I should stress that I was using RASCAL via ASPIRED (https://github.com/cylammarco/ASPIRED), so perhaps I should file this query on that repository instead? Let me know if you want me to move it. Looking at the ASPIRED code, its wrapper around Calibrator.fit() seems fairly thin, so my guess is that the issue more likely originates here.

In ASPIRED I call initialise_calibrator(), set_hough_properties(), do_hough_transform() and fit(), in that order.

The attached plots are output from aspired.OneDSpec.fit(), which I believe calls rascal.Calibrator.fit().

A single line in the arc line list can be used twice in the final fit, matched to two different peaks in the arc spectrum. It appears that if two peaks both fall within the specified tolerance of the same line, both are used. Naively, I would expect the fit to match each arc line to the peak that fits best and ignore the other, so that any one line from the line list is only ever used once. What happens in practice is that two points appear in the final fit: one with a residual very close to zero (the correct match) and one with a large residual (an erroneously matched, nearby arc feature).

In the attached examples:

There are arc peaks detected at 4520 and 4495. Both have been matched to the 4521 Å feature in my supplied line list.
There are arc peaks detected at 7280 and 7260. Both have been matched to the 7284 Å feature in my supplied line list.
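To make the difference between the observed and the expected behaviour concrete, here is a minimal sketch in plain Python. It is not RASCAL's matching code: the function names, the `tolerance` argument and the assumption that peak positions are already expressed in Å are all hypothetical. The first function mimics the reported behaviour (every peak within tolerance is kept); the second keeps only the closest peak for each atlas line, using the peak positions quoted above as illustrative values.

```python
def all_within_tolerance(peaks, lines, tolerance):
    """Hypothetical naive matcher: keep every peak within tolerance of a line,
    so one atlas line can collect several peaks (the behaviour reported here)."""
    return [(peak, line) for line in lines for peak in peaks
            if abs(peak - line) <= tolerance]


def best_peak_per_line(peaks, lines, tolerance):
    """Hypothetical matcher: keep only the closest peak within tolerance
    for each atlas line, so no line is used more than once."""
    matches = []
    for line in lines:
        candidates = [peak for peak in peaks if abs(peak - line) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda peak: abs(peak - line))
            matches.append((best, line))
    return matches


# Illustrative values taken from the peaks quoted above (assumed to be in Å).
peaks = [4495.0, 4520.0, 7260.0, 7280.0]
lines = [4521.0, 7284.0]
print(all_within_tolerance(peaks, lines, tolerance=30.0))
# [(4495.0, 4521.0), (4520.0, 4521.0), (7260.0, 7284.0), (7280.0, 7284.0)]
print(best_peak_per_line(peaks, lines, tolerance=30.0))
# [(4520.0, 4521.0), (7280.0, 7284.0)]
```

Note that `best_peak_per_line` only stops a line from being reused; a single peak could still be assigned to two different lines if both lie within tolerance of it.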

I have not yet done much to fine-tune the input parameters to prevent this. I am primarily asking whether this is expected behaviour and whether there is an option to prevent it. I could just drop the problematic line from my line lists, but that loses a point from the fit, and since one of the two matches does appear to be correct, it would be good to keep it.

rjs3273 commented 3 years ago

What I was trying to explain in the original post was probably clear enough, but I found another way of showing the same thing. This may clarify (or confuse) further.

After running the fit, if I look at the returned arrays, I can see that there are more values in residual (25) than in matched_atlas (22), i.e. certain lines in the atlas list have been used more than once.

from pprint import pprint
import numpy as np

pprint(spec.residual)
pprint(np.shape(spec.residual))

array([-0.46671741,  0.8945916 , -1.64546253,  1.99579596, -0.14289408,
       -0.62377778, -0.19898066,  0.46534403, -1.44493276,  0.3770039 ,
        1.90681117, -0.54187621,  0.82314817,  0.79452983, -0.02143074,
       -3.617493  ,  0.33016304,  0.95724878, -0.37969412, -0.16223187,
        1.13499737, -0.74640436,  1.18832576, -0.19932584, -0.67673827])
(25,)

pprint(spec.matched_atlas)
pprint(np.shape(spec.matched_atlas))

array([4501.   , 4734.152, 4807.02 , 4919.831, 5823.89 , 5893.29 ,
       5934.17 , 6182.42 , 6318.06 , 6469.7  , 6668.92 , 6728.01 ,
       6827.32 , 6882.16 , 6925.53 , 6976.18 , 7119.6  , 7285.3  ,
       7584.68 , 7642.02 , 7802.65 , 7887.4  ])
(22,)
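A quick way to catch this automatically is to compare the two arrays after every fit. The sketch below is a hypothetical helper, not part of RASCAL or ASPIRED: it flags a length mismatch between the residuals and the matched atlas lines, and any atlas wavelength listed more than once. In the output above the 22 matched_atlas values appear to be distinct, so the length mismatch would be the only symptom this check picks up.

```python
from collections import Counter


def find_reused_lines(matched_atlas):
    """Return the atlas wavelengths that appear more than once."""
    counts = Counter(matched_atlas)
    return sorted(line for line, n in counts.items() if n > 1)


def check_fit_arrays(residual, matched_atlas):
    """Hypothetical sanity check: there should be exactly one residual per
    matched atlas line, and no atlas line should be listed twice.
    Returns a list of human-readable problems (empty if none found)."""
    problems = []
    if len(residual) != len(matched_atlas):
        problems.append(
            f"{len(residual)} residuals but {len(matched_atlas)} matched atlas lines"
        )
    reused = find_reused_lines(matched_atlas)
    if reused:
        problems.append(f"atlas lines used more than once: {reused}")
    return problems


print(check_fit_arrays([0.1, -0.2, 0.3], [4501.0, 4734.152]))
# ['3 residuals but 2 matched atlas lines']
```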

jveitchmichaelis commented 3 years ago

Hi @rjs3273 - taking a look at this ASAP. Do you have the spectrum/test case for this? I can't immediately reproduce it with the example files we have in rascal, but I can see where the issue could arise (I'm just not sure at which step). In theory we have a bijective match function that should prevent this from happening, but clearly it's not working!
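For illustration, a one-to-one (bijective on the matched subset) constraint can be enforced with a simple greedy pass. This is a hypothetical sketch, not rascal's match function: all peak/line pairs within a tolerance are ranked by absolute residual, and a pair is accepted only if neither its peak nor its line has been used yet. Greedy matching is not guaranteed to minimise the total residual; an optimal assignment could instead be computed with the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`.

```python
def one_to_one_match(peaks, lines, tolerance):
    """Hypothetical greedy one-to-one matcher between detected peaks and
    atlas lines: each peak and each line is used at most once."""
    # Every candidate pair within tolerance, ranked by absolute residual.
    candidates = sorted(
        (abs(peak - line), i, j)
        for i, peak in enumerate(peaks)
        for j, line in enumerate(lines)
        if abs(peak - line) <= tolerance
    )
    used_peaks, used_lines = set(), set()
    matches = []
    for _, i, j in candidates:
        # Skip any pair whose peak or line has already been claimed.
        if i in used_peaks or j in used_lines:
            continue
        used_peaks.add(i)
        used_lines.add(j)
        matches.append((peaks[i], lines[j]))
    # Report matches in order of atlas wavelength.
    return sorted(matches, key=lambda match: match[1])


print(one_to_one_match([4495.0, 4520.0, 7260.0, 7280.0], [4521.0, 7284.0], 30.0))
# [(4520.0, 4521.0), (7280.0, 7284.0)]
print(one_to_one_match([4510.0], [4501.0, 4521.0], 30.0))
# [(4510.0, 4501.0)]
```

The second call shows the other half of the constraint: a single peak lying between two atlas lines is assigned to only the closer one rather than to both.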

rjs3273 commented 3 years ago

Hi Josh. I was actually working on exactly that yesterday, trying to make an example data set as a test case. Unfortunately I have just broken my deployment so I need to fix that first and will get back to you.

cylammarco commented 2 years ago

@jveitchmichaelis Is this fixed by #48?

cylammarco commented 2 years ago

@rjs3273 ping

cylammarco commented 2 years ago

@rjs3273 @jveitchmichaelis I don't see this happening anymore, so I'll assume it's fixed and close it for now.

jveitchmichaelis / rascal

Using the same line from a line list multiple times #47