jveitchmichaelis / rascal

RAnsac Assisted Spectral CALibration
BSD 3-Clause "New" or "Revised" License

Using the same line from a line list multiple times #47

Closed rjs3273 closed 2 years ago

rjs3273 commented 3 years ago

I found a behaviour in the arc fitting that I had not expected. It is a little counterintuitive but may be a natural consequence of the way that rascal works. I should stress that I was using RASCAL via ASPIRED (https://github.com/cylammarco/ASPIRED), so it is possible I should file this query on that repository instead. Let me know if you want me to move it. Looking at the ASPIRED code, the wrapper around Calibrator.fit() seems fairly thin, so my guess is that the issue more likely belongs here.

In ASPIRED I call, in this order: initialise_calibrator(), set_hough_properties(), do_hough_transform(), fit().

The attached plots are output from aspired.OneDSpec.fit() which I think is calling rascal.Calibrator.fit().

A single line in the arc line list can get used twice in the final fit for two different peaks in the arc spectrum. It appears that if two peaks fall within the specified tolerances it uses both. Naively I would normally expect a fit to match the arc line to the peak that fits best and ignore the other so that any one line from the line list only ever gets used once. What it does in practice is create two points in the final output fit. One has a residual very close to zero (the correct match) and the other has a large residual (an erroneously matched, nearby arc feature).

In the attached examples:

I have not done a lot yet to try to fine-tune the input parameters to prevent this. I am primarily asking whether this is expected behaviour and whether there is an option to prevent it. I could just drop the problematic line from my line lists, but that loses a point from the fit, and since one of the two matches does appear to be correct, it would be good to keep it.
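The expectation described above, that each atlas line is kept only for the peak that fits it best, can be sketched as a post-hoc filter on the fit output. This is only an illustration and not part of rascal or ASPIRED: the function name `keep_best_match_per_line` and the per-peak array `atlas_per_peak` (one atlas wavelength per matched peak, with repeats) are hypothetical, and the numbers are made up.

```python
import numpy as np

def keep_best_match_per_line(atlas_per_peak, residuals):
    """Return indices of the matches to keep: one per distinct atlas line,
    choosing the match with the smallest absolute residual.

    Hypothetical helper; assumes both inputs have one entry per matched peak.
    """
    atlas_per_peak = np.asarray(atlas_per_peak, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    keep = []
    for line in np.unique(atlas_per_peak):
        # All peaks that were matched to this atlas line
        candidates = np.flatnonzero(atlas_per_peak == line)
        # Keep only the candidate whose residual is closest to zero
        best = candidates[np.argmin(np.abs(residuals[candidates]))]
        keep.append(best)
    return np.sort(np.array(keep))

# Made-up illustrative values, not taken from the fit in this issue:
atlas_per_peak = [5000.0, 6000.0, 6000.0, 7000.0]
residuals = [0.2, -0.02, -3.6, 0.5]

kept = keep_best_match_per_line(atlas_per_peak, residuals)
print(kept)  # [0 1 3]
```

Here the second match to the 6000.0 line (residual -3.6) is dropped while the near-zero match is kept, so the line still contributes one point to the fit.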

[Attached images: Screenshot 2021-08-17 at 3 37 44 PM, Screenshot 2021-08-17 at 3 36 09 PM]

rjs3273 commented 3 years ago

What I was trying to explain in the original post was probably clear enough, but I found another way of showing the same thing. This may clarify (or confuse) further.

After running the fit, if I look at the returned arrays, I can see that there are more values in residual than there are in matched_atlas. I.e., certain lines in the atlas list (matched_atlas) have been used twice.

pprint(spec.residual)
pprint(np.shape(spec.residual))

array([-0.46671741,  0.8945916 , -1.64546253,  1.99579596, -0.14289408,
       -0.62377778, -0.19898066,  0.46534403, -1.44493276,  0.3770039 ,
        1.90681117, -0.54187621,  0.82314817,  0.79452983, -0.02143074,
       -3.617493  ,  0.33016304,  0.95724878, -0.37969412, -0.16223187,
        1.13499737, -0.74640436,  1.18832576, -0.19932584, -0.67673827])
(25,)
pprint(spec.matched_atlas)
pprint(np.shape(spec.matched_atlas))

array([4501.   , 4734.152, 4807.02 , 4919.831, 5823.89 , 5893.29 ,
       5934.17 , 6182.42 , 6318.06 , 6469.7  , 6668.92 , 6728.01 ,
       6827.32 , 6882.16 , 6925.53 , 6976.18 , 7119.6  , 7285.3  ,
       7584.68 , 7642.02 , 7802.65 , 7887.4  ])
(22,)
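The 25 residuals against 22 atlas entries above imply three surplus matches. A quick way to see which lines are repeated might look like the sketch below. It assumes access to a per-peak array of atlas wavelengths (one entry per residual, with repeats), which the printed arrays do not provide directly, so `atlas_per_peak` and `repeated_lines` are hypothetical names and the values are made up.

```python
import numpy as np

def repeated_lines(atlas_per_peak):
    """Map each atlas wavelength used more than once to its usage count."""
    values, counts = np.unique(
        np.asarray(atlas_per_peak, dtype=float), return_counts=True
    )
    return {float(v): int(c) for v, c in zip(values, counts) if c > 1}

# Hypothetical per-peak matches; 6000.0 has been assigned to two peaks
atlas_per_peak = np.array([5000.0, 6000.0, 6000.0, 7000.0])

dupes = repeated_lines(atlas_per_peak)
# Number of matches beyond one per distinct atlas line
surplus = len(atlas_per_peak) - len(np.unique(atlas_per_peak))

print(dupes)    # {6000.0: 2}
print(surplus)  # 1
```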
jveitchmichaelis commented 3 years ago

Hi @rjs3273 - taking a look at this ASAP. Do you have the spectrum/test case for this? I can't obviously repeat it with the example files we have in rascal, but I can see where the issue would arise (just not sure at which step it is). In theory we have a bijective match function which should prevent this from happening, but clearly it's not working!
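For anyone following along, one simple way to get a one-to-one (bijective) match is to accept a peak/line pair only when each is the other's nearest neighbour and the pair is within tolerance. The sketch below illustrates that idea only. It is not rascal's actual match function; the name `mutual_nearest_matches`, its parameters, and the numbers are assumptions, and `peak_wavelengths` stands for the wavelength predicted for each peak under some trial solution.

```python
import numpy as np

def mutual_nearest_matches(peak_wavelengths, atlas_lines, tolerance):
    """Return (peak_index, atlas_index) pairs that are mutual nearest
    neighbours and lie within `tolerance` of each other.

    Each peak and each atlas line appears in at most one pair.
    """
    peaks = np.asarray(peak_wavelengths, dtype=float)
    atlas = np.asarray(atlas_lines, dtype=float)
    # distance[i, j] = |peak i - atlas line j|
    distance = np.abs(peaks[:, None] - atlas[None, :])
    nearest_atlas = distance.argmin(axis=1)  # closest atlas line per peak
    nearest_peak = distance.argmin(axis=0)   # closest peak per atlas line
    pairs = []
    for i, j in enumerate(nearest_atlas):
        # Accept only if the atlas line also "chooses" this peak
        if nearest_peak[j] == i and distance[i, j] <= tolerance:
            pairs.append((i, int(j)))
    return pairs

# Made-up values: peaks 1 and 2 both fall within tolerance of atlas line 1
peaks = [4999.8, 6000.02, 6003.6, 7000.5]
atlas = [5000.0, 6000.0, 7000.0]

pairs = mutual_nearest_matches(peaks, atlas, tolerance=5.0)
print(pairs)  # [(0, 0), (1, 1), (3, 2)]
```

Peak 2 is within tolerance of the 6000.0 line but loses to peak 1, so the line is used once.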

rjs3273 commented 3 years ago

Hi Josh. I was actually working on exactly that yesterday, trying to make an example data set as a test case. Unfortunately I have just broken my deployment so I need to fix that first and will get back to you.

cylammarco commented 2 years ago

@jveitchmichaelis Is this fixed by #48?

cylammarco commented 2 years ago

@rjs3273 ping

cylammarco commented 2 years ago

@rjs3273 @jveitchmichaelis I don't see that happening anymore, so I assume it's fixed. Closing for now.