kebasaa / SCIO-read

Read data from the SCIO spectrometer
GNU General Public License v3.0

sample, sampleDark and sampleGradient #3

Open earwickerh opened 1 year ago

earwickerh commented 1 year ago

Nice to see you're still tinkering with this. In the readme it says: "Every SCIO bluetooth LE message contains 3 parts: sample, sampleDark and sampleGradient (No clue so far what that those mean or how to convert them)." I'm not sure whether that's still up to date, but I hope the information below is helpful.

sample: This is the raw spectral data from the sample. It represents the light that is reflected off the sample and detected by the SCIO.

sampleDark: This is the raw spectral data from the SCIO's internal dark current reference. It represents the background signal that is detected when there is no light present.

sampleGradient: This is the raw spectral data from the SCIO's internal white reference. It represents the signal detected when the SCIO is measuring a known white reference.

To calculate the reflectance values of the sample, you need to subtract the sampleDark data from the sample data and divide the result by the difference between the sampleGradient data and the sampleDark data. This is expressed by the equation R = (S - D) / (G - D), where R is the reflectance value, S is the sample data, D is the sampleDark data, and G is the sampleGradient data.

Let's take the -bark.txt sample:

Packet 1:
  Header: 01 ba 02 90 01 8f 1c 07 34 02 00 02 8e
  Sample: 06 36 7b 2e 4f 3d 1c 0e 06 04 04 06 12 ...
  SampleDark: 05 34 75 26 48 36 1c 0c 05 04 05 05 0d ...
  SampleGradient: 0b 4f 9c 3e 6c 4c 28 11 0a 09 0a 0c ...

Packet 2:
  Header: 02 ba 02 90 01 8f 1c 07 34 02 00 02 8f
  Sample: 06 37 7c 2f 4f 3e 1c 0d 07 04 04 05 12 ...
  SampleDark: 05 34 75 26 48 36 1c 0c 05 04 05 05 0d ...
  SampleGradient: 0b 4f 9c 3e 6c 4c 28 11 0a 09 0a 0c ...

Each packet consists of a header and three data sections: sample, sampleDark, and sampleGradient. The header contains information about the packet, including a packet identifier (01 or 02), the protocol identifier (ba), and the length of the data sections (in this case, 02 90). The sample, sampleDark, and sampleGradient data sections are each 400 bytes long and contain spectral data measured by the SCIO spectrometer.

To calculate reflectance values from the sample, sampleDark, and sampleGradient data, you need to perform the following steps:

1. Subtract the sampleDark data from the sample data to obtain the corrected sample signal.
2. Subtract the sampleDark data from the sampleGradient data to obtain the corrected gradient signal.
3. Divide the corrected sample signal by the corrected gradient signal to obtain the reflectance values.
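The three steps above can be sketched in numpy as a single function. This is a minimal sketch, not a confirmed decoding: it assumes each space-separated hex byte in the first packet of the -bark.txt listing is one spectral value, which the rest of this thread calls into question (the real data may use wider integers). The function name `reflectance` is illustrative.

```python
import numpy as np

def reflectance(sample, dark, gradient):
    """Compute R = (S - D) / (G - D) element-wise; NaN where G == D."""
    # Convert to float so S - D cannot wrap around for unsigned input
    s = np.asarray(sample, dtype=np.float64)
    d = np.asarray(dark, dtype=np.float64)
    g = np.asarray(gradient, dtype=np.float64)
    corrected_sample = s - d      # step 1: remove the dark signal from the sample
    corrected_gradient = g - d    # step 2: remove the dark signal from the reference
    # step 3: divide, leaving NaN wherever the corrected reference is zero
    return np.divide(corrected_sample, corrected_gradient,
                     out=np.full(corrected_sample.shape, np.nan),
                     where=corrected_gradient != 0)

# First five bytes of the first -bark.txt packet listed above,
# treated (as an assumption) as one value per byte
sample = [int(x, 16) for x in "06 36 7b 2e 4f".split()]
dark = [int(x, 16) for x in "05 34 75 26 48".split()]
gradient = [int(x, 16) for x in "0b 4f 9c 3e 6c".split()]

print(np.round(reflectance(sample, dark, gradient), 4))
```

For these five values it returns approximately 0.167, 0.074, 0.154, 0.333 and 0.194 (for example, the first is (0x06 - 0x05) / (0x0b - 0x05) = 1/6).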

The example code below should extract the data from the log, convert it to numpy arrays of integers, and perform the reflectance calculation using numpy array operations.

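import numpy as np

# Load the raw data from the log file. This assumes the log uses the layout
# shown above: labelled lines ("Header: ...", "Sample: ...", "SampleDark: ...",
# "SampleGradient: ...") holding space-separated hex bytes.
sections = {"Header": [], "Sample": [], "SampleDark": [], "SampleGradient": []}
with open("log_20200604-bark.txt", "r") as f:
    for line in f:
        label, sep, values = line.strip().partition(": ")
        if sep and label in sections:
            # One list of hex tokens per packet; skip "..." placeholders
            sections[label].append([x for x in values.split() if x != "..."])

# Extract the sample, sampleDark, and sampleGradient data from the first packet
# and convert the hex strings to numpy arrays of signed 64-bit integers
sample = np.array([int(x, 16) for x in sections["Sample"][0]], dtype=np.int64)
sampleDark = np.array([int(x, 16) for x in sections["SampleDark"][0]], dtype=np.int64)
sampleGradient = np.array([int(x, 16) for x in sections["SampleGradient"][0]], dtype=np.int64)

# Perform the reflectance calculation (the three arrays must have equal length)
correctedSample = sample - sampleDark
correctedGradient = sampleGradient - sampleDark
# Leave NaN wherever the corrected gradient is zero instead of dividing by zero
reflectance = np.divide(correctedSample, correctedGradient,
                        out=np.full(correctedSample.shape, np.nan),
                        where=correctedGradient != 0)

# Print the first 10 reflectance values
print(reflectance[:10])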

The reflectance values will be in units of "counts per second"...

kebasaa commented 1 year ago

Wow, thanks a lot for this. It is still highly relevant.

I've been trying to apply it using Python's struct module (with little-endian encoding, i.e. "<I"), but I ended up with a few questions, and I'm wondering if you can help me:

  1. A 32-bit integer is 4 bytes, so when I convert sample and sampleDark (1800 bytes each) as you suggested, I end up with many more than 400 values, even after removing the header (which is 8 bytes according to your instructions). Basically, I would have an 8-byte header, followed by 400 integers, and then 196 bytes that are unknown. For sampleGradient, I end up with too many values as well (1656 bytes total, i.e. an 8-byte header, 400 integers and 52 bytes left over). But if all the values after the header are integers, then I get 449 integers for sample and sampleDark, and 413 for sampleGradient. That makes the reflectance math difficult. Is the remaining data simply unused? Do you know what's going on?
  2. I assume that the best way to create a reflectance spectrum from the units of "counts per second" is to normalise it to 0-1? But there must be a fixed maximum and minimum for that? The values that I'm currently getting are huge.
  3. Do you have any idea how to apply the calibration? I guess it's based simply on a scan inside the calibration box, but what is the math behind it?
  4. Here is some example data:
    1. sample first few hex bytes: ba0208070000000018c9a124fc3f2e57aa716475154067b118384dd9407d30dda544906a9753...
    2. sample decoded: -70 (protocol), 2 (signal type: scan), 1800 (length in bytes), then the integers 614582552, 1462648828, 1969516970
    3. sampleDark first few integers: 4269192318 1192275574 2032346525 from ba020807000000007eb476fe76ae10479d252379e65579bccb56d70f85bfecfd6729eb3ebd...
    4. sampleGradient first few integers: 405697630 247858264 128659535 from ba0278066e0000005e742e185804c60e4f30ab0756c7b26e52e1672bba13a1898f76f0c0b647922...
    5. Resulting reflectance: 9.45933685e-01 -2.86285788e-01 3.30041416e-02
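The decoding in the example data above can be reproduced with Python's `struct` module. This sketch assumes an 8-byte header made of a signed protocol byte, an unsigned signal-type byte, a little-endian 16-bit length and 4 bytes of unknown meaning, followed by little-endian unsigned 32-bit integers ("<I"); under those assumptions it yields exactly the header values, integers and reflectance quoted above. It only uses the first 20 bytes of each hex string, and the helper name `decode` is hypothetical.

```python
import struct

# First 20 bytes (8-byte header + three 32-bit values) of each message quoted above
SAMPLE_HEX = "ba0208070000000018c9a124fc3f2e57aa716475"
DARK_HEX = "ba020807000000007eb476fe76ae10479d252379"
GRADIENT_HEX = "ba0278066e0000005e742e185804c60e4f30ab07"

def decode(hex_string):
    """Split a message into header fields and little-endian uint32 values.

    Assumed header layout (8 bytes): int8 protocol, uint8 signal type,
    uint16 length, uint32 of unknown meaning.
    """
    raw = bytes.fromhex(hex_string)
    protocol, signal_type, length, unknown = struct.unpack_from("<bBHI", raw, 0)
    payload = raw[8:]
    count = len(payload) // 4  # ignore any trailing partial value
    ints = struct.unpack_from(f"<{count}I", payload, 0)
    return (protocol, signal_type, length, unknown), ints

headers = {}
values = {}
for name, hex_string in [("sample", SAMPLE_HEX), ("dark", DARK_HEX),
                         ("gradient", GRADIENT_HEX)]:
    headers[name], values[name] = decode(hex_string)
    print(name, headers[name], values[name])

# Reflectance R = (S - D) / (G - D) on the decoded values
reflectance = [(s - d) / (g - d)
               for s, d, g in zip(values["sample"], values["dark"], values["gradient"])]
print(reflectance)
```

The length field comes out as 1800 for sample and sampleDark and 1656 for sampleGradient, matching the byte counts mentioned in question 1. The negative second reflectance value, which a physical reflectance should not have, may hint that the payload is not a plain array of uint32 values.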
kebasaa commented 1 year ago

I am still trying to work with the code you provided, but I'm not managing to find the bytes you mention. Can you elaborate a bit on how you obtained these and decoded the data? Thanks

hbsagen commented 1 year ago

Here is my scan.json:

{ "device": { "deviceBleId": "d40c760000f37b98", "bleFWVersion": 125, "serial_number": "CPPCA0031C7PF4616042A6411416A1DF481601QT", "device_name": "CBRN", "i2s_tag_config": "20150812-g:PRODUCTION", "deviceDspId": "82a8b26b2304c2e8", "aptinaId": "0000f320cc82320b803f1cd6d67ca069", "firmwareVersion": 151 }, "scan": { "timestamp": "2023-03-22 08:35:09", "t_cmos_before": 17.584445075219964, "t_chip_before": 23.62, "t_obj_before": 0.0, "t_cmos_after": 18.294067556060156, "t_chip_after": 23.87, "t_obj_after": 0.0, "sample": "AAAAAEPShV3QZQkQrEMaGczSBHmRAx9vvYWQf5CT3uxpVhfFfJVOz5jTJXqC0cvvR8z9ywCFQEOOoVecSgCpfmu2u9sprzJi9N3htTaj3d4fZncKBXdP1qaEpJsrpMgW1atr1cl49I5GqzNpe7TswE1mYpExw9S_La4DBPEclViGqG2tPRGBmmsSINmXIAQuqArHACwktkEwdtYU91tHXvFkSTaqOMK6r7oVR5sJJ0WnhwOauXg1ww0Si1GMmGrmA2gGp-nlC1FEb3Qlv2Fe7mA7H4OHpQU6Mam82o9ERRN5p24wdDD7jdDIRSdYTNWJv842wNjJ4ccQzts2H_exdWJueUUQ5uraXMhIbOnutdiu2KB8hwpjI1CRI8K4SfJnP9faFN7rDqHc_8oNP-cSZdPBT7BBRBkOWIArV8h0kyQJ3xSKp9ix1s8hbVOUzqCFBizipgKprpsfepL_XCZwxKv_89QegFDzaSQzMatCUtWytFXN906IqLj2Lh89omtYzasvxYlSA2FC2pdp3NnoPnVk6Tbofg4qouWDxEUlZcsd_fbUgd3zQZeveToG4MiNcmAIUIM8xpnJXeAIDiHkEliSpSrzLIsjeSCZT_S_8YIsUDYDSHDd6D7UZ-0cMNOqNkWu-nLN8Gn5D7CVRmZkZLlfqASh57O2fw77u_Xm1-xmRFOeyxLxqo-IeVy70Dl5csbv5hBCTMn7uWwTywJzZ9vCUg_cjTqPMwO0wKcJCkeVja2H0SvrP-v0AJDpp7XOxGmGmWyzmfwR95N9gHscLKV2_atWGii7XIws2kO7gfgi0QmiFnMLxupvJztHdqeGCKSDkJ_2OEQ9nb2xdCT9M12swTLyRs2ebiExE3Wg6p7_xXDUbgRcA_nAOeHo_cByeJrRWZGP-LoBPCD47Qc8KnwSWhjmMRUxx-TEzjm_oUyNIT1ADtc_olU0Uo82bX39rOMN0-ASMGnzRtvEBl2tampw7GLqHxL4GPsoHncaSdjU258VapGM_NnbrsHtP4QmS-VkrAFcCG1aBM7wg6IjoGiDIW1ltpAq9dHRLs4qHI_1LeplRhVcK1HWG3d_QJfUM0YVQTJXAOA7SCD4KMrTO_eNDRf5Y5KnYmjsHM944QrGHnkB0-RbqwY_L641a7-wjowsGUTYDOneVDUw08_Tsbee_cdUXY0ZKSZn7Ojn916tFY1k2yE52dp-AvALI4ULazX-M8eSzcCyZhVjTV99meyK5TLTjmRrtIffu6A4mctLLnk-nQ0Qe8vka2E7uf7FAD_EfzxnjzMIViFsdMC3IzfTQPNRFF858dkt8pKftQF2I8aJVGiBiTWXqy9hMXdun9KvAI2uLB03qttJDezkuS0cNup35cdBke4hHrGyaeTmds4oVmVG2zxvHjRIsZ0NW7HuZYaBiGFxiyM4nDnJqmR5vImQczhFzcN9ACXbzqN-ry-KZ50WFfEMt_r-eO7YTlsP
_WTI-wgITFuoNNpXkYkv0m_VcgnyfGMoj77Kd8ZCQgG0bgtkhom8B_7icnzbV38Dc6Pa1gcuLCRVqD18ZwAPw5_FI7OM4H_6-zmaidTlTYbQ4fJIPcfUQF2EyIbhrEdMenXlosoOOd6gktr9wytydkQ4pqIjpqehJMyD0J4QxM1l080X_Meco1TgRZkfW-EakTCFyceqSGjNM9DXxHZJ-rl8GJzFuPyt58TV70nh5wGSgtxY2IDu7nTkykGZullY9l6IcskvdubPRnB9GpNDfgINOP-lqZQFY_xD2jOHDB0SsGh_YigOT5vv3ARN7O7WE9w8H9WJdtt1_sMeoLVgyzQFSz7dfzlpHPN6BgKNX-4B2yUBYZvrtZ-1iBPtWeH5kSDH34RCuslaX6iySbMx5RHqOlNPlx2y1TgIQ-3YTP9R8Z49_kiVn1XTdpvvBgOv2CWBSwXoP4hQuYWjqckDrsPO7jmvFR6FGfzKKNCfCO3F4ak3pr3hAK4pJs2wHgVHJ-Pe5tXWw0jXPWXZIx4l_cZnCbXoGvgc8QBpt8oj40w3r8uqqKcHzz7qr1PE_rnKaNamErxdMWoHf_qqLbHjIX7bmEFiJQooqZVE5INYTskOKpb-ar2ttEIztEPA_ylOW9kcAnHJNPzGIj4yALjQvqTat20_LALeoh6QGQs1okngMuQS8Kefd6lffi79WUt-YJvHjXL1HneXQRZl2fMTWK3T8TyYyL8-K38jzTwfz2L7xq9wcbYa4HkRuPInsteP1mY3KabgQQRRfG6FD0Gs3Ao7y0HbLbKfjY9LU3ZuN4PA_uqZPmVDVURRzTIpagnHJJkDvZLY4IYwnL--H1qWHLrwWqOntTYPTQPPkStDVoL_qiqsfroZYDgTWr-X5ADKpDumCaYHtZFrilDjzb-6vcUI", "sample_dark": "AAAAAJ7cyEOGGXV0z3_HN9l7ab-WQSJX9W7SZZAE5TqrenhtdddRkUfmiJJY4umRnpBnbx6fXSkqrJZORByKq64yEaH14PfedIjKK3753vbTRkTG4edPgG7FP_wbLmEOrmwVqFfh0bQbs36L4c5p3M6BFJuO8kod8mYxzAQY_566vwUlq3EFw5zjwHGqmqBtsj7Qa8k-sJXJtEBZibHRzk5ahzP66kLmYwyBOIXuOWp9FDQSMvO29Bq6-G78wvqZD6sHCvIa_ENqN5-llJWGuC8SCfLDMOKVyjpSVFLHPp5N-VuG1xRjD3pc7bBXgXnbdIfelDnvcNgWWBc7jGj1fI2KiFzoZ3x9e-igG3KdZlZ7OzM8tpp9bDeGZD3Of2Fce3CO3LjqHq7TrLS4ci-cXMGJsQDFTOhdEexwpCLDPMbHIw6tZv2JQgtiY3IsplqsLFbbIp_jGjy2_b52_3jFq19CpZXx1oOIiSFnXGU0fkfI-86cK2Dbz5Qxpq-FElQj8EtvnV5vTofccnz4dLUl7DsB4581e_XNuWun0-FKaVKUVL6-YjJC2hVmLGAD98rS95S36D3MrlAfG6yii_ZUPkAREqKjxnqtWh8q7I77M94Jy078FU00LP-mitSAkrz4Ot0cfx_kdBRb1Y2MtZ6jCffNLFlvRofEo3xFgsdWQnaupUf8CIDjdPBhvd4t4lwBNDsOfGfMn2bKZeHYhc8XnN3BEx6Xkn4_o2VyWOXIiQK3azv82lbcJW817FSdpPnfqb8Tawfz9APU7d1Wyy7Vdf5QK0Xt30hLbhAnJGCtPaihBO4UNh_kjn9pomcP90fPKCRQuR1YxAxYfapyOMfMnkcqlFGkBCdawKwjfXj7YsaH3i0cVWrqeAE7UiS2dbVh4qnx1326sYPVcQ0Yz3xshvuw6SIll0J1Gv8k_z-p8ouUKdDdKSOsGq6AGSb_5Io4FacTH4TVPjbtt6Rd4_WQA8Pmrm-6RV-6R3ACVvQfgBQW9uZamdznugT3ktrOHnUyvM_VnlrzetNzRsORPGwVEVPUfgdCMS6Smk6DQ
j4331ftJM1UEq4oJlWvTRx3PQgLmuijOjAqRU97CazfUOFvphxycomiL6gh8BfpDjS2kikeafEfWwWteQmiRo4doGXtxacD9ntmOIUq3ifoTHvp-Krw2LXoPf4R7cB-5UvNxqrdO0UB7TouaZNur84pDFZ-R_C7Qniyr-XLoWParH2gCoMnDVJwIKPhGKybGqDTBDLzM38ml7KmPTIum7RTPHBJmDz0g-HuJdBIPLqHVZ5iP2YgGqGanXzKj3ULZhgEIAVcZE3PkhgHL7Anvu_7L69Yx4JNyqo583WROoPmmx-Ec9cystlqqkyQyk3t5IF2jIuYRuicdzLgYORRUQAdz3doUD633TdDFVg-fa3E7hVNBIu-Xzwo4kUikf8iyECVIyLWKXhdY-dZG0K6fIBryf89nOvArshO_wgSE73Gk9_QucHx6WuYHX4cokLCb0j50hc7w0wm4g9NVP_5CMLU0anClyDd9tptLwG9iaxi45vUDhxBe2FKLNUUmN_BuLIaH1HioR7SCQAmKXZlJTQos0JdL_8AckObf9ZY5FpbNf1x_BIMydlUhTK9MP7f66r7KSX3eOknVFp_GqM9nHMEtH1AFtpjxoE90-eClFmj2XT_Gb-D2MuL4CST0xVVaE-WPMGasXF7ue4JwSGFD3jRefbHU2zJxL3grUYr6pVaHELFAcsgcHv-VbHg2OmQiUTGgne71wWDI8DgjO-pWJTsQe6MixQp_FaKmgZj2DabZKtfcPVCFaGKFtc0GmwEZ3N8mY6tu-Uw6U8ge4XFIwiuAAS-uw7ib7rl-U3XKcJRXxjic7xtXJFXDFRyQdEv_v3gTWgfVmi89w80OvbMZ67J2oUZi-jfU7lbkxpg5ewwi9p_91u-GDTEMt9PwBHL6G6xH2MOq5gGO5hk2zPvXagD9r_GQ3Mf19I5IeRLwX9lHzl4OqEoiUwQcyaFYCAyuhqEyBTl3mDL77hssc9MmaSpbvsz8Nxcnoy9nBmxkJl3y0m1aY5KM42Fv44bWFyeDsC-8YfIvxWcuMvC94aqyibcXATGtg6tUgtezTz1HefmffL348PMjmpzGpWPdSFllOKNgWALV8lTWeMAJsW3ohNFzaEpTMdwYGKW1t_NhQFxdq-sLQ6F8ejoDTYvQ-I7PlRNxB2_fYZFdQqHItO-yERQeSvb9lbZRcw8cokBKKZIUlOqrzQtl9jFOtGTXH0W0EHEMtISX0CM_E59K5FbaLwDtJiC8ohMzvHL034ukUaTzaKHIA9zo6DXP5NZInDRHbbUys6DP2DHhY5qlyGI9mSAOLivee6wjc9PoXYfWeJO4XgO02znbH", "sample_gradient": 
"AAAAAOp1IDt2sDXjiCAmmKUIf8P-UY3KdjodURTElAeQz-_lyjx2QjNRB3wCMQTWamkIV-SLRWeG78QvxiRuSAH2Ayk8mvdE4081PSSL5D-IF2SGJQnJxO9ZTKFOFUbF_0-_txCL670XhA4nV2fCd9q91He_TWyPmSJgkAMfXsfV1qMa8JYVcdLorUfLMNuO-3Q-Odicl_p65vHLimE7RVdvpoM46o41KZOc_Qnrhrtph-R4bd0iw8ThPYb-cHMJNSIuUqbtxx3n509jGTJaBclTJsFSi_VDPQ700vbZRn2bEmHJUAr3O5KXD3BziwZQw844qlwRR6sixZVy3CSfZ1AcJz0nseZn4VfVQ_CwlJLA_9ly4z8YhOKeWcpdEuTeyut2lAlWba8HLnonzMvJmlEy_LAXnUDZp_pndJlhBh8GQXxbr4Z57idajD-n5PjCzxXFAhYVp2IFhw1Iv8tSpf8iPZ9Hu_u3udMke0lrxo6WojdoZjEI9tX8mYXvxclK7a52RgJ3L0HBBuPy10YWNdWsYgo3H2UYAJ_XrYZxO4VMRH7B3NS_zH_J8hS6w8mW_9hXpgNeTV7zdoHO819FTkkcXIthwwLfcomVatcGQcoBGuVx4FCb5hKa8tjj3f96aAjuvGJ8O-VAELU8RBq7AvSumzvr8qVGb1J5PQVeZRowx3sqwK4YWbqS1imDhfTwp2J-JQgfTrIreaQ234P_Uzeofk1mLB7i32ameWG0J6HsixanjwKjXpO7Deywp0aMUDqHYO-nULwo3Rjw_4W4m91H6Rit_H3-u3EGX89xHssqdwtjgDJCfH_zcxn5xnLM8WqBRzNOdCTXnoO5LwB4vBPItwfvqYkFlZYDLh3gDbRiXEIKYpD5GIO_cZd2lexL0WfeD_wcuq3LpuCTlIxeP_8xyBt_rfrSN4G3mD6YyZJr6asqStehI_3JapQkpy6BnyldUYs50CVcXGdv45Ere57EBTKIiE3ze4KkiN1eAG5wp2LTriQdLC9R4Ia0RIPqohAXF3XFyAXX77ZdaQSnhqcq8RvVv_aUfiBTYaEdVYnmW9MhOYTMeKDfC262oF5ImwZTr81OJPtQi6-JjpFBMXfoe8O0ojY1HZeS4fCj0ruAFkFp2SIr1sVFMIlrD5FPVM_5NLfBoX_0iwLVqspycyzFrcWZp6HZdxnIiqfDVYq0G3CLYhOpG8dqNJdQcdmPXZHMTh6Ba7f6oAPEXFFWxUVbaIaq_gudtjParQCE6q-Ucuwk-gWniicvR0p0H7I2WqgvabxVAG192aA7iaCcSddKOZiL8wPoB_ORHy5s-UG-1l-1GJdiEQemJ6ml0w24XWQIBbcAF9Mb1UgHe3yi9n2sixsyti0Wzu5wlQys-YEVjHMW4ggJ6d-55MG0ff05KYvgo0bGElASmu7krMa6fAOr8IPq9YTWZsfrkvrRGvxE2HuPbGetwTvvPPm0OZOY2P5gkoR2ozg2d161Zh-15OyxCvObcBfskH7a7t3b1OyqY8He8cH0qJhb3BVHsv3odVXKZzydItqX5NzEOZ7bsPJ_56CuRa1e3AW-98MoaIoaHO80OrbSg6COhdA0uBuYak1_5ekestdwpIsNWW2P451bcYSCUrw7HcXKMahPk99GRatZYGCp4LzEWrXSwVsRse4F9u_ezZKpLoxyjn_8mw4QL-75qPKdptemyfwUHrpdmnJjctW0SSzeyztucwTnkeKvcpLHblpXbC1-9eCPYDlXua-WmHkVX1XzL9B_sAqpr1TJJW6fUG1mqDNJ8PwCZWBQJRgx3ICds9itU1LPlkhhdbFwtPdhNkaeIB7CGij3MPTQdlle_eBWUC8c-R9cHDz5KceAe2mLsTAZqp6P1o-x9aX_ayzK" } }

kebasaa commented 1 year ago

@hbsagen What is your question or comment? You just provided a scan file...

hbsagen commented 1 year ago

@kebasaa I had no question really. But I am eagerly following this thread, as I want to extract data from the SCIO as well. I thought more data maybe could help somehow :)

kebasaa commented 1 year ago

@hbsagen I see. Thanks, but please start a new issue next time you want to contribute something unrelated to this issue. Your data shows me something interesting at least: it seems like your SCiO is reporting a different number of bytes in sample_gradient. Mine has 1656 bytes; yours seems to have fewer. I would appreciate some help though, if you have any experience with this kind of reverse-engineering.
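To compare byte counts between devices, the `sample`, `sample_dark` and `sample_gradient` strings in a scan.json like the one above can be decoded to raw bytes. This is a sketch under the assumption that they are URL-safe base64 with the padding stripped (they contain `-` and `_`, which only the URL-safe alphabet uses); the function names `decode_field` and `field_lengths` are hypothetical.

```python
import base64
import json

def decode_field(text):
    """Decode a URL-safe base64 string (padding optional) into bytes."""
    text = "".join(text.split())        # drop any stray whitespace or line breaks
    text += "=" * (-len(text) % 4)      # restore padding stripped by the encoder
    return base64.urlsafe_b64decode(text)

def field_lengths(path):
    """Return the decoded byte count of each spectral field in a scan file."""
    with open(path, "r") as f:
        scan = json.load(f)["scan"]
    return {key: len(decode_field(scan[key]))
            for key in ("sample", "sample_dark", "sample_gradient")}
```

Usage: `print(field_lengths("scan.json"))`. Note that each of the pasted fields starts with "AAAAA", which decodes to leading zero bytes rather than the `ba 02` header seen in the Bluetooth messages, so the app's export may use a different framing.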

@earwickerh Still hoping that you could answer my questions above. 400 values in 1800 bytes somehow doesn't add up, so I'm wondering how that could work... I understood that you're using big-endian decoding. But the hex values you're quoting are nowhere to be found in the data itself, and the code you posted doesn't work. Can you please update it and add some explanations? Thanks