publiclab / spectral-workbench

Web-based tools for collecting, analyzing, and sharing data from a DIY spectrometer
http://spectralworkbench.org
GNU General Public License v3.0

Display precision should communicate hardware limitations #344

Open jywarren opened 8 years ago

jywarren commented 8 years ago

Users should not be mistaken about the effective resolution of their instruments. They should be able to distinguish the stored resolution (which we keep high to avoid rounding error, and for ~archival purposes) from the displayed resolution (which we should reduce to something close to the effective, hardware-limited resolution), and there should be clear information about these different precisions.
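For concreteness, a minimal sketch of that stored-vs-displayed split (hypothetical helper, not the existing SWB code):

```js
// Minimal sketch (hypothetical helper, not the current SWB code): keep full
// precision in storage to avoid rounding error, and reduce precision only
// when rendering a value for the user.
var DISPLAY_STEP_NM = 1; // assumed effective hardware-limited resolution

function displayWavelength(storedNm) {
  return (Math.round(storedNm / DISPLAY_STEP_NM) * DISPLAY_STEP_NM).toFixed(0);
}

displayWavelength(532.4182); // "532" -- the stored value stays 532.4182
```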

This relates closely to #335 as well, which deals with stored precision; that precision isn't necessarily what's displayed to the user.

Via @Stoft1:

Whatever the source data, the errors are always controlled by the worst case. If you start with PLab 640-pixel data you get ~1nm/pixel -- but if a wide slit was used, the FWHM resolution is likely a lot worse. OK, so you can collect 1nm data, but you still have the issue of how to tell the user that their FWHM resolution is actually 10nm, not 1nm. Maybe with some analysis tools on their CFL plot, but not with just any spectrum.

OK, so data collection and reporting is still based on the 1nm/pixel resolution. However, SWB still displays with XXX.XXX precision, which is obviously misleading -- you can't do that. The best you can display is just 1nm.

OK, what about the case where spectrum A, with 1nm data, is the reference, and it is then calculated against a data file with 0.1nm data? You can't report (in a file or on the display) at 0.1nm because the reference is only 1nm. The lowest resolution "wins" -- the least common denominator.
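In code terms, that rule amounts to taking the coarser of the two inputs (a minimal sketch, hypothetical helper name):

```js
// Sketch of the "lowest resolution wins" rule: a result computed from two
// spectra can only be reported at the coarser of the two input resolutions.
function combinedResolutionNm(resA, resB) {
  return Math.max(resA, resB); // e.g. combinedResolutionNm(1, 0.1) === 1
}
```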

What about the reverse -- the reference spectrum is from a file with 0.1nm data, and the calculation uses 1nm data? This case is harder because it depends on 1) what you can assume about the 1nm data and 2) how you interpret the results. If you treat the 1nm data as simply LPF'd (low-pass filtered) data, then you can keep 0.1nm results, because it is like removing some baseline average. But if the 1nm data is just low-resolution and you expect 0.1nm-resolution results, you'd better know to add the 1nm error to your 0.1nm data error for those results. How will you make that clear to the average user?

I think what this says is that the resolution of results calculated between data of different resolutions isn't compatible with a single rule under which the user can assume, without any thought on their part, that the resulting resolution accurately and absolutely represents the errors from combining the two sources.

This points to deciding the purpose of the SWB tool -- I'm assuming it is not expected to be a general-purpose spectral processing tool, but is PLab-centric and reflects the PLab hardware ... though it can input and process alternate data. This leads back to the basic limitation -- the lowest resolution is the determining factor.

A possible "dynamic" criteria would be that any spectrum to be processed by SWB MUST have a CFL cal plot associated with it and every CFL plot has been processed to measure the FWHM resolution. Then, IF that FWHM resolution can be shown to be better than 1nm, then it is ok to save calc results with 0.1nm rez .... else, all calc results are saved with 1nm integer resolution.

So far, I've never seen a single PLab plot (not even mine) that warranted anything better than 1nm resolution. This is also the simplest answer for the code handling, because:

1) all input pixel data gets converted to 1nm integer increments,
2) all CSV files have the same 1nm integer wavelengths,
3) all spectra comparisons are on 1nm integer increments, and
4) the wavelength span for all data calculations can be the same, and even the display is made simpler -- just two numbers to set the upper and lower limits (which could be less than or equal to what the data file has).

You won't lose the source data, and a chain of calculations can never get worse resolution than the original 1nm integer conversion for each of the source data streams. The error will be limited to the combination of the initial hardware errors of each device providing a spectrum, plus the pixel-to-nm conversion error for each of the spectra being combined. After that, there is no additional error from the combining process when all spectra have 1nm integer data. (This is not to say there are no other errors, but those other errors are outside the spectra-combining process itself.)
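As a rough illustration of step 1) above, one way to put calibrated pixel data onto a 1nm integer grid (hypothetical helper, not the existing SWB pipeline):

```js
// Resample calibrated pixel data onto a 1nm integer wavelength grid by linear
// interpolation. Assumes `data` is an array of at least two {wavelength,
// intensity} points, sorted by wavelength.
function resampleTo1nm(data) {
  var out = [];
  var lo = Math.ceil(data[0].wavelength);
  var hi = Math.floor(data[data.length - 1].wavelength);
  var i = 0;
  for (var nm = lo; nm <= hi; nm++) {
    // advance to the segment [data[i], data[i+1]] that contains nm
    while (data[i + 1].wavelength < nm) i++;
    var a = data[i], b = data[i + 1];
    var t = (nm - a.wavelength) / (b.wavelength - a.wavelength);
    out.push({ wavelength: nm, intensity: a.intensity + t * (b.intensity - a.intensity) });
  }
  return out;
}
```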

So, if you are still worried ... to "hedge your bets", you could simply keep the 1nm integer data in ABC.DE-resolution storage where the 'E' digit is simply always '0' for now ... and then, if by some wonderful advancement the cameras and the source data improve, you could decide later to use the 'E' digit -- which would give you 10x resolution-improvement headroom, but you'd then have to report in uniform increments -- maybe just ABC.00, ABC.05, ABC.10 at first ... and then ABC.00, ABC.02, ABC.04, etc. Just a thought -- but I'd bet that day is a long way off.
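A minimal sketch of that "headroom" idea, assuming a configurable quantization step (hypothetical helper):

```js
// Store wavelengths in an ABC.DE-style format but quantize to a configurable
// step, so the step could later shrink from 1.0 to e.g. 0.05 without changing
// the stored format.
function quantizeWavelength(nm, stepNm) {
  stepNm = stepNm || 1.0;  // today: 1nm steps, so the trailing digits stay "00"
  return (Math.round(nm / stepNm) * stepNm).toFixed(2);
}

quantizeWavelength(532.4182);       // "532.00"
quantizeWavelength(532.4182, 0.05); // "532.40"
```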

jywarren commented 8 years ago

I'd like to address display precision using d3 labels, so that we have a clear separation between display and storage. Probably related to tickFormat: https://github.com/mbostock/d3/wiki/Quantitative-Scales#linear_tickFormat
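A minimal sketch of what that might look like, assuming the d3 v3 axis API from the link above (`width` is a placeholder):

```js
// Stored values keep full precision; only the axis labels are rounded to
// whole nanometers via tickFormat.
var x = d3.scale.linear()
    .domain([400, 700])            // wavelength range shown, in nm
    .range([0, width]);

var xAxis = d3.svg.axis()
    .scale(x)
    .orient("bottom")
    .tickFormat(d3.format(".0f")); // label ticks as e.g. "532", not "532.418"
```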

jywarren commented 8 years ago

Alternatively, if we can use links for the graph unit markers "nanometers" and "intensity", we could just link to further clarification -- if that's easier than working in d3.js