kebasaa / SCIO-read

Read data from the SCIO spectrometer
GNU General Public License v3.0

Data from the literature on the SCiO spectrometer #6

AndreySamokhin opened this issue 1 year ago

AndreySamokhin commented 1 year ago

I have glanced through all discussions about the raw data. I am surprised that nobody cited the US patent (https://patents.google.com/patent/US9377396B2) describing the main operating principle of the SCiO spectrometer. It is not new; I read it several years ago. According to the patent, the device works as follows:

The SCiO spectrometer has 12 filters and 12 independent regions on the CMOS matrix. So, it is no surprise that 12 · 27 ≈ 331.

Knowing the main operating principle, one can assume that a file with the raw data is an image, not a spectrum. However, the patent mentioned above says nothing about the data structure. I decided to search deeper and came across another US patent (https://patents.google.com/patent/US10330531B2). It confirms my guess. The bad news is that the raw data are both compressed and encrypted: "… the compressed encrypted raw data signal can be transmitted via Bluetooth to the handheld device. Compression of raw data may be necessary since raw intensity data will generally be too large to transmit via Bluetooth in real time. … The data generated by the optical system described herein typically contains symmetries that allow significant compression of the raw data into much more compact data structures". How is compression performed? Is the whole image zipped? Or only some parts? Can some data processing (e.g., averaging) also be involved?

The encrypted data are sent from the SCiO to a smartphone. Then the data are redirected to the server without any transformation: "The encrypted, compressed raw data signal from the spectrometer may be received by the UI of the handheld device … The UI may then transmit the data to the cloud server. … The cloud server can receive compressed, encrypted data and/or metadata from the handheld device. A processor or communication interface of the cloud server can then decrypt the data, and a digital signal processing unit of the cloud server can perform signal processing on the decrypted signal to transform the signal into spectral data". Experimental data obtained by @kebasaa (see the 01_rawdata/log_extracted/ folder) confirm this statement from the patent.

Everything that has been said above applies to the following data: sample, sample_dark, sample_white, and sample_white_dark. One can assume that sample_dark and sample_white_dark are recorded with the LED turned off and are used to estimate the dark current. It is clear (see the 01_rawdata/app_researcher_output/SCIO_scans_from_tech_support.csv file) that the data stored in sample_white are needed to normalize the final spectrum. Unfortunately, I do not currently have any idea about sample_white_gradient and sample_gradient. I disagree with @earwickerh, who said that "sampleGradient … is the raw spectral data from the SCIO's internal white reference". The white reference is stored in sample_white. According to the log files (presented in the 01_rawdata/log_files/ folder), there is a special parameter called isDisableGradientSampling. I can assume that collecting gradient samples is some kind of optional extra step and is not obligatory. But that is only my guess.
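If that reading is right, the usual dark-current correction and white normalization would look roughly like the sketch below. This is only an illustration of the guessed relationship between the four arrays: the function name is made up, and it assumes the four blobs already decode to aligned intensity arrays, which is exactly the step we cannot do yet.

```python
import numpy as np

def normalize_scan(sample, sample_dark, sample_white, sample_white_dark):
    """Hypothetical use of the four raw arrays (names taken from the log files).

    Assumes all inputs are already decoded into aligned numpy arrays of
    intensities, which is precisely what we are still unable to do.
    """
    signal = sample.astype(float) - sample_dark             # remove dark current
    reference = sample_white.astype(float) - sample_white_dark
    return signal / np.clip(reference, 1e-9, None)          # normalize to the white reference
```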

To summarize, decompressing and decoding the raw data is not enough to get a spectrum. A mathematical model should be created to transform an image into a spectrum. Unfortunately, I cannot help with extracting an image from the raw data; it is beyond my qualifications. Is it even possible? Nevertheless, I hope that someday I will participate in creating a mathematical model for this project.

kebasaa commented 1 year ago

Very interesting. I didn't know about these patents. My guess is that the encryption uses a transferred key, like the device ID (in my case "8032AB45611198F1") or something like that. However, I have no idea what kind of encryption and compression this Blackfin BF512 processor is capable of. Knowing that might provide some information on how to get at the resulting image.

I did see some allusions to JPEG at quality 100 in the code (and the Blackfin processor is able to do that), but when I analysed the bytes they didn't match the JPEG magic bytes. The other question, of course, is what compression produces images that always have the same number of bytes. Wouldn't compressed data be variable in length?
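For reference, a check like the one described above amounts to something like this (the file name is just a placeholder for one of the extracted payloads):

```python
# A JPEG stream starts with the SOI marker FF D8 FF and ends with the EOI
# marker FF D9; the SCiO payloads match neither.
def looks_like_jpeg(payload: bytes) -> bool:
    return payload.startswith(b"\xff\xd8\xff") and payload.endswith(b"\xff\xd9")

with open("sample.bin", "rb") as f:   # placeholder path to one raw payload
    print(looks_like_jpeg(f.read()))
```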

kebasaa commented 1 year ago

Furthermore, the app code contains a lot of references to an i2s tag, or image2spec (in my case: "20150812-e:PRODUCTION"). Why such a tag would be present and transferred to the cloud server is also a bit mysterious, but it could answer some questions.

kebasaa commented 1 year ago

@AndreySamokhin Upon re-reading what you found, I'm not entirely sure it's compressed using standard compression methods. The text says: "The data generated by the optical system described herein typically contains symmetries that allow significant compression of the raw data...". My guess here is that because the resulting image consists of symmetric circles in 12 areas, there is no need to transmit the entire image. Maybe they just transmit a cross-section, and that's what they mean by "compression"?

kebasaa commented 1 year ago

I'm guessing the encryption is not AES, which in ECB mode requires the ciphertext to be a multiple of the 16-byte block size. But we have payloads of 1800 and 1656 bytes, neither of which fits, even allowing for some padding (4 bytes at the beginning of the 1800-byte payload). For alignment to 16-byte blocks we would need something like 1792 or 1648 bytes. So I must be missing something here...
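The arithmetic behind that, as a quick sketch (note that a stream-style mode such as CTR would not need block alignment at all, so this only rules out the padded block modes):

```python
# AES in a padded block mode (ECB/CBC) produces ciphertext whose length is a
# multiple of 16 bytes; none of the observed sizes are, with or without the
# suspected 4 leading padding bytes.
for size in (1800, 1656, 1800 - 4, 1656 - 4):
    print(size, size % 16)   # remainders are 8, 8, 4, 4 -- never 0
```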

hbsagen commented 1 year ago

Is there any part of the samples that should remain constant? For example, after a calibration, should the dark measurement be the same for all subsequent scans?

AndreySamokhin commented 1 year ago

@kebasaa I believe that we have too little data to make any assumptions about the compression method. However, I found some additional information in the literature. Taking these new findings into account, I assume that the image pattern (i.e., the circles) obtained for a particular region of the CMOS matrix is not pre-processed by the microcontroller. The reason is that the obtained circles can be asymmetrical because of the tilt of the image sensor.

The US10066990B2 patent (https://patents.google.com/patent/US10066990B2) describes an approach for compensating the tilt of an image sensor. If the sensor is tilted, the intensity of light in different regions of the CMOS matrix is not identical. The main idea of the invention is to compensate for this distortion mathematically: "Step 1015 can comprise, for example, determining the pattern and/or gradient of the variation of light intensity across the area of the spatially variable filter. In step 1020, the detector measurements are adjusted to reduce the spatial variation of light intensity determined in step 1015". I assume that sample_gradient and sample_white_gradient are used for this purpose. It is not currently clear to me which parts of the whole image are used to estimate the pattern of the variation of light intensity, because most of the methods described in the patent obviously cannot be implemented in the case of the SCiO spectrometer (see the following photo: https://cdn.sparkfun.com/assets/learn_tutorials/6/5/7/SCIO_Teardown_Images-15.jpg).
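As a rough illustration of what such a correction could look like, here is a minimal sketch that fits a plane to a decoded gradient frame and divides it out. The plane fit is my own choice, not something the patent specifies, and it assumes sample_gradient / sample_white_gradient decode to 2-D intensity maps of the same regions, which is only a guess.

```python
import numpy as np

def compensate_gradient(region, gradient_region):
    """Sketch of the intensity-gradient compensation idea from US10066990B2."""
    h, w = gradient_region.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Fit a plane a*x + b*y + c to the gradient frame ...
    A = np.column_stack([xx.ravel(), yy.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, gradient_region.astype(float).ravel(), rcond=None)
    plane = (A @ coeffs).reshape(h, w)
    # ... and divide it out so the illumination across the region is flattened.
    return region / np.clip(plane / plane.mean(), 1e-9, None)
```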

The US10203246B2 patent (https://patents.google.com/patent/US10203246B2) describes the calibration process. Several conclusions can be made. The statements from the first post about sample, sample_dark, sample_white, and sample_white_dark seem to be correct: "In many instances as described herein, calibration measurements are obtained with the 'white reference' (herein-after 'WR') material with light or dark signals, and combinations thereof. In some cases, the measurement may comprise a 'WR - dark' measurement when the illuminator is turned off. ... If a 'WR - dark' measurement was taken the 'WR - dark' measurement average can be subtracted from the average WR signal measurement". Each SCiO device is likely calibrated at a production site using a 'golden' calibration reference. The respective spectrum is stored in the cloud and used during data processing. "In step 3820, a cover of a handheld spectrometer may be calibrated at a production site. For example, a reference material provided with the cover may be measured with a reference spectrometer to generate the cover spectrum. A 'golden' calibration reference may be measured with the same reference spectrometer just before or after generating the cover spectrum, wherein the 'golden' calibration reference is located at the production site. The cover spectrum may then be divided by the 'golden' reference spectrum to generate the cover calibration spectra".
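In code form, the two quoted steps amount to something like the sketch below. The array names are mine, and the actual server-side processing is of course unknown; this only restates the patent text.

```python
import numpy as np

def white_reference(wr_light_scans, wr_dark_scans):
    """Average WR measurement with the averaged 'WR - dark' measurement subtracted."""
    return np.mean(wr_light_scans, axis=0) - np.mean(wr_dark_scans, axis=0)

def cover_calibration_spectrum(cover_spectrum, golden_spectrum):
    """Cover spectrum divided by the 'golden' reference spectrum, both measured
    on the same reference spectrometer at the production site."""
    return cover_spectrum / golden_spectrum
```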

AndreySamokhin commented 1 year ago

Sorry! I pushed the wrong button.

kebasaa commented 1 year ago

@AndreySamokhin You are finding so much valuable information! Thank you! So, based on the golden calibration spectrum, we have to assume that each SCiO device would return a spectrum of 1 for its own calibration box, correct? And I further assume that we may not even have an image in the data, or do we? How do you suggest we proceed?

@hbsagen I don't think there is such a part of the scan. Basically, the sensor adjusts its gain and may record noise in low light, but it will record some ambient light in most conditions. However, the calibration box measurements should be nearly identical, and so should the "dark" measurements that I supplied in 01_rawdata/scan_json/, since I completely covered the lamp and sensor with opaque aluminium foil during those scans. I hope no light leaked through.

AndreySamokhin commented 1 year ago

we have to assume that each SCiO device would return a spectrum of 1 for its own calibration box, correct?

@kebasaa I think not. The spectral properties of a material located inside a cover of the SCiO spectrometer can vary from batch to batch. The main purpose of the 'golden reference' is to level out this variation and to make measurements performed with different devices comparable (see the equation presented in column 46 in US10203246B2).

we may not even have an image in the data, or do we?

@kebasaa We know too little.

AndreySamokhin commented 1 year ago

@kebasaa A small addition to the above. I was talking about 'white' spectra returned by the device, not 'white' spectra returned by the server.

hbsagen commented 1 year ago

Could the magic headers have been left out of the data? If so, prepending various compression headers might work?

The same for image headers?
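For the DEFLATE family at least, there is a cheap way to test the missing-header idea without gluing headers on: zlib can be asked to inflate a headerless (raw) stream directly. This is only a sketch; the 4-byte offset comes from the padding suspected earlier, and if the payload is still encrypted every attempt will of course fail.

```python
import zlib

def try_inflate(payload: bytes):
    """Try zlib-wrapped (wbits=15), gzip-wrapped (31) and headerless raw
    DEFLATE (-15) at a couple of plausible offsets."""
    for wbits in (15, 31, -15):
        for offset in (0, 4):
            try:
                return wbits, offset, zlib.decompress(payload[offset:], wbits)
            except zlib.error:
                pass
    return None
```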

kebasaa commented 1 year ago

Consumer Physics confirmed to me that the consumer product has been discontinued. Basically, they will only sell the Researcher license, which they sell for a lot of money. That's a real pity for us Kickstarter users. I wish we could decode the device's data, so let me know if you find out more.

kebasaa commented 1 year ago

It appears that they have more patents here that could be helpful:

kebasaa commented 1 year ago

The CMOS sensor is from ON Semiconductor, and this datasheet describes a sensor similar to the one in the SCiO. Maybe it is even exactly the right one: https://www.onsemi.com/pdf/datasheet/mt9m034-d.pdf

Now, representing 3 rows of 1280 pixels or 4 columns of 960 pixels (which would be enough to fully capture the 12 circles, thanks to their symmetry) gives 3840 values. The data are 12-bit though, resulting in 5760 bytes. But we know that the measurements are 1800 or 1656 bytes, compressed and encrypted. How this compression is achieved is still unknown.
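The arithmetic, for completeness:

```python
values = 3 * 1280             # 3 rows of 1280 px; 4 columns of 960 px gives the same count
packed_bytes = values * 12 // 8
print(values, packed_bytes)   # 3840 values -> 5760 bytes, vs. the 1800/1656 bytes observed
```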

The CMOS can do 3 exposures for HDR though, resulting in a linearized 20-bit value for each pixel's response. This 20-bit value is then optionally compressed back to a 12- or 14-bit value for output. In 14-bit mode the compression is lossless; in 12-bit mode there is minimal data loss. It is of course possible that the SCiO does exactly that.