kebasaa / SCIO-read

Read data from the SCIO spectrometer
GNU General Public License v3.0

How connect scio scanner ? #1

Open onoff0 opened 5 years ago

onoff0 commented 5 years ago

I have a SCIO scanner without a developer license. I'm following your project and ran "python read_spectrum.py", but I get errors with file1.txt. Is there a guide on how to make the SCIO work with your project? Thank you!

kebasaa commented 5 years ago

Hi onoff0, great that you found my project. Essentially, I haven't written the python interface to read the data from the SCIO yet. I read some data "manually", which is in the examples folder, and "read_spectrum.py" will read those files and try to extract the spectrum from them. Unfortunately, the data is encoded (as you can see in the "readme" file), and I still haven't managed to decode it. I could use some help with that. Once I know how to decode that data (which is what I need help with), writing a bluetooth interface to make the SCIO work will be easy. I would really appreciate if you have ANY idea how to proceed

Now, in order to get read_spectrum.py to work with file1.txt, I'd need to know a few more details of what exactly your errors are. Also, try with file2.txt or something like that.

onoff0 commented 5 years ago

To get a clearer picture: where did you take the raw data (file1.txt, file2.txt) from? Without a developer license, do you have any other solution or idea for decoding the data in the test files? Excuse my English, but I need to ask these questions to better understand the work you are doing.

Thanks !

kebasaa commented 5 years ago

Hi,

No problem, I am happy to explain.

I don't have a developer license. Therefore, I used bluetooth to collect the encoded raw data. Now I need to decode it. For this, I need to understand what the structure of the data is.

If you want to collect your own raw data from the SCIO, here is a step by step instruction:

  1. On Linux, install gatttool and hcitool. I'm using Ubuntu, to install:

    sudo apt-get install bluez

  2. Turn on your SCIO with a long press on the button.

  3. Run hcitool to find out what your SCIO's MAC address is. It will have a name like SCiOmyScio or whatever you named it:

    sudo hcitool lescan

  4. Run gatttool with your SCIO's MAC address to collect your own data. This will store it in "file1.txt". Replace xx:xx:xx:xx:xx:xx with the MAC address you found in step 3. During the scan, the SCIO indicator light will be yellow.

    sudo gatttool -i hci0 -b xx:xx:xx:xx:xx:xx --char-write-req -a 0x0029 -n 01ba020000 --listen > file1.txt

  5. Stop saving data to your file with Ctrl+C after the indicator light of the SCIO goes back to blue.

  6. In a text editor, edit your file1.txt: Remove the first line saying "Characteristic value was written successfully" and in the beginning of each line remove "Notification handle = 0x0025 value: ". Then save the file.
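The manual clean-up in the last step can also be scripted. This is a minimal sketch, assuming the gatttool output looks exactly like the two strings quoted in that step; the function name `clean_gatttool_output` is made up for illustration and is not part of read_spectrum.py:

```python
PREFIX = "Notification handle = 0x0025 value: "
STATUS_LINE = "Characteristic value was written successfully"


def clean_gatttool_output(text):
    """Return only the hex payload of each notification line."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == STATUS_LINE:
            continue  # drop blank lines and the write confirmation
        if line.startswith(PREFIX):
            line = line[len(PREFIX):]  # keep only the hex values
        cleaned.append(line.strip())
    return "\n".join(cleaned)


# Example with a made-up two-line capture:
sample = (
    "Characteristic value was written successfully\n"
    "Notification handle = 0x0025 value: 01 ba 02 \n"
    "Notification handle = 0x0025 value: 02 00 ff \n"
)
print(clean_gatttool_output(sample))
```

To process a real capture, read file1.txt into a string, pass it to the function and write the result back out.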

kebasaa commented 5 years ago

The main problem now is to take this series of hex values, and to understand what they mean, i.e. how to extract the spectrum from them. This is what I need help with. It turns out that the first hex value of each line identifies the line, i.e. 01 is the first line, etc. Values go from 01 to 5f and repeat 3 times, although for the 3rd scan, they go from 01 to 58 only
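Since the first hex value numbers each line, the three readings can be separated by watching for the counter to restart at 01. A minimal sketch, assuming each cleaned line is a run of space-separated hex values; `split_into_scans` is a hypothetical helper name:

```python
def split_into_scans(lines):
    """Group cleaned lines into readings; a new reading starts at index 01."""
    scans = []
    for line in lines:
        values = [int(token, 16) for token in line.split()]
        if not values:
            continue  # skip empty lines
        index, payload = values[0], values[1:]
        if index == 1 or not scans:
            scans.append([])  # the line counter restarted: next reading
        scans[-1].append((index, payload))
    return scans


# Made-up example: two readings of 2 and 3 lines
demo = ["01 aa bb", "02 cc", "01 dd", "02 ee", "03 ff"]
print([len(scan) for scan in split_into_scans(demo)])  # [2, 3]
```

On a capture like the one described above, this should yield three groups whose last line indices are 5f, 5f and 58.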

onoff0 commented 5 years ago

OK, now everything is clearer! But can these hexadecimal values be decoded with any developer license, or is the license tied to the one device it was requested for? In either case, how do you plan to proceed? Have you identified an algorithm for processing the raw data? I believe their servers use the unique developer license (tied to the requesting user and the SCIO hardware) to send a decryption key to the user, but that is only my guess. It could be checked by scanning the same material with two different SCIOs and diffing the raw data to see whether it is the same. What do you plan to do? Regards!

kebasaa commented 5 years ago

Hi

Well, the developer license basically allows the android app to export a CSV or JSON file, which the server decoded from the raw hexadecimal values. The problem is that we don't know how the server decodes it, and so I thought that if I have a copy of raw hexadecimal data and of a CSV, I could try to figure it out.

What I know from people online is that the CSV or JSON that is exported supposedly contains 400 values of reflectance, one for each band from 700nm to 1100nm. After pre-processing the hexadecimal values, there are 3 readings for each scan: two of 1800 hexadecimal values each, and one of 1656 hex values.

I do not think that the hexadecimal data is encrypted because the microprocessor inside the SCIO is quite weak. But the data is encoded. So the server does not send a decryption key, it only does some mathematics with the 3 readings of each scan. I think the second or third reading is related to calibration.

Do you have any idea how to proceed?

onoff0 commented 5 years ago

The only idea that comes to mind is the same as yours: get hold of a decoded file to see which mathematical operations the SCIO server uses. Otherwise, knowing roughly what values it returns, we could use software that brute-forces the data until it produces similar values, like John the Ripper (which was made for other purposes, but the idea is the same). If I can, I want to try tonight or tomorrow to see whether I can find something useful. What do you think?

franklin02 commented 5 years ago

Please provide your email address. I can send you a spectral data (CSV file).

kebasaa commented 5 years ago

@franklin02 I sent you an email. Will put your files in the example-data folder as soon as I have them. Thanks a lot!!!

kebasaa commented 5 years ago

@franklin02 Thanks a lot for the file, I added it to the example-data folder. Unfortunately, this is not what the SCIO with a developer license creates (that is supposed to include the spectrum, raw irradiance of the sensor and calibration data). Rather, the file you sent is a collection of scans of different materials, containing only the reflectance spectra. I have no idea how useful it will be....

kebasaa commented 5 years ago

@onoff0 I updated the script to properly read the scio raw data. The decoding is still a bit unclear though, but I now know that 5 hex values are one 1nm band. The conversion to a float is still a bit unclear...

onoff0 commented 5 years ago

Great! Can I ask how you identified that one 1nm band corresponds to 5 hex values? What tools or what kind of algorithm did you use? Also, could I have your private email? Thanks!

kebasaa commented 5 years ago

It's an educated guess: The data comes in 3 parts. Each part has a header that contains some sort of identifier. After that, there are 1800, 1800 and 1656 hex values left. We know that we have 1nm bands, and 331 of them (counting from 740 to 1070nm). So I tried to divide 1800 by 4, 5 or 6 and never had success. Then, I divided 1656 by 331 and I got 5, with 1 hex value left (I assume it's a divider or something). Now, it makes sense that all the 3 parts contain the same data structure. So I did 1800 - 1656 and I got 144, which is a very nice round number indicating that it's a header or something. This means that the data of the 2 first parts (with 1800 hex values) is either preceded or followed by a block of 144 values. I don't know which one yet.
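The arithmetic behind this guess can be checked in a few lines. The sketch below also shows how a reading could be cut into 5-value chunks once a header length is assumed. The helper `chunk_values` and its `skip` parameter are illustrative names, and whether the 144 values come before or after the data is still unknown:

```python
BANDS = 1070 - 740 + 1          # 331 one-nm bands, both ends included
CHUNK = 5                       # guessed number of hex values per band

full, spare = divmod(1656, CHUNK)
print(BANDS, full, spare)       # 331 331 1
print(1800 - 1656)              # 144


def chunk_values(values, skip=0, size=CHUNK):
    """Split values into fixed-size chunks after skipping `skip` leading values.

    An incomplete chunk at the end is returned separately as the remainder.
    """
    body = values[skip:]
    usable = len(body) - len(body) % size
    chunks = [body[i:i + size] for i in range(0, usable, size)]
    return chunks, body[usable:]
```

For example, `chunk_values(reading, skip=144)` treats the 144 values as a leading header; slicing them off the end instead covers the other case.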

kebasaa commented 5 years ago

@onoff0 Did you receive my e-mail address?

onoff0 commented 5 years ago

yes I wrote you ... thanks!

JanBessai commented 5 years ago

If you are (like me) not a fan of Bluetooth you can do pretty much the same via USB: When I connect the scio to my linux computer via USB I get

$ dmesg
[197481.871683] usb 2-1: new full-speed USB device number 18 using xhci_hcd
[197482.003419] usb 2-1: New USB device found, idVendor=0451, idProduct=16aa, bcdDevice= 0.09
[197482.003425] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[197482.003428] usb 2-1: Product: CP SCIO USB CDC
[197482.003431] usb 2-1: Manufacturer: Texas Instruments
[197482.003433] usb 2-1: SerialNumber: xxxxxxxxxxxxxxx
[197482.006466] cdc_acm 2-1:1.0: ttyACM0: USB ACM device

I can then do:

cat /dev/ttyACM0 | hexdump -C

in one console and

echo -n -e "\x01\xba\x02\x00\x00" > /dev/ttyACM0

in the other to obtain a message in the format described above. This is a bit more lightweight since it does not require extra tools.
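The same exchange could be driven from Python. This is only a sketch: the command bytes are the ones from the echo line above, but the helper names are made up, the number of bytes to read is arbitrary, and the commented-out device access is untested (terminal settings on the tty may matter):

```python
import io

SCAN_COMMAND = bytes.fromhex("01ba020000")  # same bytes as the echo command


def send_scan_command(port):
    """Write the scan command to an open binary stream and flush it."""
    port.write(SCAN_COMMAND)
    port.flush()


def hexdump(data, width=16):
    """Format bytes as offset-prefixed hex rows, loosely like `hexdump -C`."""
    rows = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        rows.append(f"{offset:08x}  {row.hex(' ')}")
    return "\n".join(rows)


# Dry run against an in-memory buffer instead of the device:
fake_port = io.BytesIO()
send_scan_command(fake_port)
print(hexdump(fake_port.getvalue()))

# On real hardware (untested assumption, device path taken from dmesg above):
# with open("/dev/ttyACM0", "r+b", buffering=0) as port:
#     send_scan_command(port)
#     print(hexdump(port.read(64)))
```

The `bytes.hex(' ')` separator argument needs Python 3.8 or newer.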

JanBessai commented 5 years ago

Another hint: Blackfin processors normally use fixed-point arithmetic.

It is a wild guess but I'd expect samples from the device to be 16 bit and normalized to [-1; 1).

I've also written a small program to try decoding floats with 4, 8, and 16 bytes, skipping 0-800 bits from the beginning and using big and little endian. My results so far were values which had too much variance (skipping from e-100 to e100 in adjacent samples) or tons of outliers (NaN, -Infinity, Infinity). I think this makes float samples less probable.
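The 16-bit guess corresponds to the signed fixed-point format usually called Q15, where the raw integer is divided by 2^15. A minimal sketch of such a decoder; whether the SCiO actually uses this format, and in which byte order, is an assumption, not something confirmed in this thread:

```python
import struct


def decode_q15(raw, little_endian=True):
    """Interpret raw bytes as signed 16-bit Q15 samples in [-1, 1)."""
    count = len(raw) // 2                      # ignore a trailing odd byte
    fmt = ("<" if little_endian else ">") + f"{count}h"
    ints = struct.unpack(fmt, raw[:count * 2])
    return [value / 32768 for value in ints]   # 32768 == 2**15


print(decode_q15(bytes.fromhex("0040")))  # [0.5]
print(decode_q15(bytes.fromhex("0080")))  # [-1.0]
```

Unlike the float attempts, this can never produce NaN or infinities, so a plausible result has to be judged by the shape of the decoded curve instead.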

kebasaa commented 5 years ago

I like this very much. Beautiful. I've been working on getting bluetooth measurements to work, with a lot of difficulty, so the USB method is really helpful. Also, if we manage to decode the data, it will be useful to install on a robotic arm, for example, and not be worried about recharging all the time

kebasaa commented 5 years ago

This is interesting. Well you know, it may simply be that the measurements vary a lot, and Scio (the company) deals with it by doing 2 things: (1) each measurement is a mean of 2 measurements, and (2) the measurements are always divided by the calibration. This means that if you divide it after your decoding, the spectrum may be right in the end. Please try it and let me know, I would be very interested to hear if it works.
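Expressed as code, the two steps are short. This sketch assumes the three readings have already been decoded into equally long lists of numbers, which is exactly the part that is still unsolved; the function name is made up:

```python
def reflectance(sample_a, sample_b, calibration):
    """Average two sample readings band by band and divide by the calibration."""
    if not (len(sample_a) == len(sample_b) == len(calibration)):
        raise ValueError("all three readings must have the same number of bands")
    result = []
    for a, b, cal in zip(sample_a, sample_b, calibration):
        mean = (a + b) / 2
        # A zero calibration value would divide by zero, so mark it as NaN
        result.append(mean / cal if cal else float("nan"))
    return result


print(reflectance([1.0, 2.0], [3.0, 2.0], [4.0, 4.0]))  # [0.5, 0.5]
```

If a candidate decoding is right, every value returned here should land between 0 and 1.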

JanBessai commented 5 years ago

Please try it and let me know, I would be very interested to hear if it works.

It won't be that simple to implement: for my previous attempt I wrote a quick and dirty Haskell program to brute force through all decoding parameter combinations and find "best" options by looking at average differences of subsequent samples or minimizing outliers in the first package. This method does not really scale for multiple readings, because I have no idea how many bytes of header information to skip in the third package. Additionally, even if the divided numbers were to cancel out producing numbers in a reasonable range, I'm not sure if that would mean anything. Also, the numbers I got were so far out there (in the range of +-1e100), that rounding errors would introduce too much noise for any reasonable measurement.
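For readers who want to try the same idea in Python, here is a small sketch of such a search. It is not the Haskell program mentioned above: it only tries signed 16-bit integers instead of floats, and it scores each candidate by the mean absolute difference between neighbouring samples. All names are illustrative:

```python
import struct


def smoothness(samples):
    """Mean absolute difference between neighbouring samples (lower is smoother)."""
    if len(samples) < 2:
        return float("inf")
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return sum(diffs) / len(diffs)


def best_decoding(raw, max_offset=8):
    """Try every start offset and byte order; return candidates sorted by score."""
    results = []
    for offset in range(max_offset + 1):
        for order in "<>":                      # little and big endian
            count = (len(raw) - offset) // 2    # whole 16-bit samples available
            if count < 2:
                continue
            samples = struct.unpack_from(f"{order}{count}h", raw, offset)
            results.append((smoothness(samples), offset, order))
    return sorted(results)


# Synthetic test data: 3 junk bytes followed by a smooth little-endian ramp
raw = b"\xff\x00\xff" + struct.pack("<10h", *range(0, 1000, 100))
print(best_decoding(raw)[0])  # (100.0, 3, '<')
```

Even in this toy case, offsets 5 and 7 score exactly as well as offset 3, because they only drop samples from the front of the ramp. That is the offset ambiguity described above in miniature.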

kebasaa commented 5 years ago

Did you look at the readme in my project? I documented all the headers. If it's unclear I can try to explain it more. Let me know.

JanBessai commented 5 years ago

I was referring to the header/footer you described in

This means that the data of the 2 first parts (with 1800 hex values) is either preceded or followed by a block of 144 values. I don't know which one yet.

which we have not decoded yet. It means that one has to guess an offset after which the real samples start or a position when they stop. This offset can be assumed to be consistent across the first two packages, but there is no information about it in the third package (yet).

kebasaa commented 5 years ago

I see. It means that for the first 2 parts, there is a header of 144 bytes, while for the third part there is no header. I have no idea why. Knowing that the Scio flashes its light 2x, I assume that it measures 2x and makes an average. Then, that means that the last (3rd) set of values is the calibration (5 x 331 = 1655, plus the 1 leftover value = 1656). This might be the easiest to decode, because we can guess that there is no header in it.

kebasaa commented 5 years ago

@JanBessai Can you try to decode it using 5-byte or 10-byte chunks?

JanBessai commented 5 years ago

Can you elaborate on what you mean by chunks? 5 or 10 byte floats don't exist and the blackfin fixed-point types also use 2, 4, or 8 bytes.

kebasaa commented 5 years ago

I know that they don't exist, and that the Blackfin doesn't use them. However, we know the following, and then I made some guesses:

Therefore, I guess that there is some custom conversion to a decimal value using 5 or 10 bytes for each value

kebasaa commented 5 years ago

@JanBessai It appears that the values are indeed 40bit (5-byte) integers that get divided by some large value (currently, I think it's 1000000000). That then results in a float.
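A sketch of what that 40-bit reading looks like in code. The divisor is the guess stated above; the byte order and the choice of unsigned integers are additional assumptions of this example:

```python
def decode_40bit(raw, divisor=1_000_000_000, byteorder="big"):
    """Split raw bytes into 5-byte unsigned integers and scale them to floats."""
    usable = len(raw) - len(raw) % 5          # drop an incomplete final chunk
    return [
        int.from_bytes(raw[i:i + 5], byteorder) / divisor
        for i in range(0, usable, 5)
    ]


# 0x003b9aca00 is 1_000_000_000, so this decodes to 1.0
print(decode_40bit(bytes.fromhex("003b9aca00")))  # [1.0]
```

Python integers have no fixed width, so `int.from_bytes` handles the unusual 5-byte size without any bit shifting.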

kebasaa commented 4 years ago

@JanBessai None of the values I decoded with a 40-bit reading make sense. Basically, the readings divided by the calibration should always be between 0-1 (1 being 100% reflectance, and 0 none). By testing numerous decodings, I tried to find something that would yield such a vector. Unfortunately, I have not been successful yet.

earwickerh commented 3 years ago

Kudos on the effort everyone! Thanks for starting this project @kebasaa. This may be (is probably) a stupid question but: wouldn't it be helpful to do a scio scan of a sample for which we already have real NIR spectral reading data (within the same range)?

StevenLColeman42 commented 3 years ago

It could be helpful as a baseline in order to investigate the measurement variance across devices.

To my knowledge the reason for the calibration process is to adjust for temperature variances between reads in different environments. So for the scan to be useful to others, I would think one would need a very accurate temperature measurement. I'm not sure what other factors one would need to control for as well.

FYI: it's been quiet here. I'm not sure what other people are doing with this project. My SCiO sat on a shelf for many years and when I finally needed it the device would no longer calibrate. Nothing changed except the software installed on my new phone was giving an error. The people at Consumer Physics refused to even talk to me unless I shelled out another $900 to buy an up-to-date model. I found this project and was able to at least get data off my device, but still have no clue how to decode it yet. I intended to come back to this in another month or so, as I have been busy starting up a research 501(c)(3) where I may actually need this data. Thanks to the developers for reverse engineering this device.

kebasaa commented 3 years ago

I'm really glad someone's still interested in this. I've tried a lot of ways to crack this device's data, without success, but I'd love to get it to work with anyone's help. And yes, SCIO (the company) is not helping, unfortunately.

StevenLColeman42 commented 3 years ago

I am a retired computer/data scientist with a little RE background thrown in. The answers you are looking for are likely found in the Android apps. While I have not looked very deeply it should be possible to figure out the proper word and index sizes needed for basic compatibility/interoperability. The DMCA has a provision for interoperability so this should be fair game just to make my device work again.

The pain here is that I do not have a working system as the new android app refuses to even calibrate my older device. I'm thinking that they may have changed the sensor size in the newer devices, and the latest software is simply assuming the wrong number of words to read out of the device. I think I saw somewhere in your analysis that there was some ambiguity in the number of bytes, and that might just be my issue.

I might be able to take a look at what the Dalvik code is doing with those bytes before sending the data buffer over BT, so we can probably work out the proper word size and number of elements. Comparing that total size to the size my device actually reads out over USB would at least tell me if I'm wrong in my assumption.

I'm really busy this weekend but I am very curious. If you have specific questions about the App let me know and I'll see what I might be able to learn from it. If you know anyone with an older copy of the Android App please let me know, because there is a possibility that it may bring my device back to life and then I could answer even more questions. I still have some older androids that should have the old app on it, but I don't think they will even boot any more.

And yes, SCIO (the company) is not helping, unfortunately.

Consumer Physics would stand a lot more chance of being successful if they would just embrace the work people might do to make their device even more useful. Makers should not be treated as the enemy. Unfortunately their current financial model is instead to charge you $4k a year just so you can read the data off of your own device. Scio-read would do that for free, thus they will never help you do that.

JanBessai commented 3 years ago

@StevenLColeman42 Reverse engineering the right commands to send to the device from the App might be possible, but note that the app just sends the data that has been collected to some Webservice and doesn't actually decode it. At least that is what I assume from looking at the API you get from them for writing your own code.

kebasaa commented 3 years ago

@JanBessai You're absolutely right, the Android app doesn't decode anything. The webservice does. It sends the data to a webservice that sends back the decoded data, which interestingly is saved in a log file on the phone. Meaning that I've managed to get some readings of raw and corresponding decoded data. I'll upload it ASAP

kebasaa commented 3 years ago

@StevenLColeman42 Totally with you there. I haven't used the device in ~6 months, so no clue if they broke the new android apps. However, older versions can be found on some APK backup sites. As I mentioned in my previous answer, I've just uploaded logs of the app that contain both raw and corresponding decoded data

Check out https://apkpure.com/scio-pocket-molecular-sensor/com.consumerphysics.consumer/versions for older versions of the app

nachtschatt3n commented 3 years ago

Well, if it helps, I could write a man-in-the-middle proxy to capture the requests and responses. I also have the old iPhone app, which still works, but no dev account :(

kebasaa commented 3 years ago

@nachtschatt3n Thanks for the suggestion. I think all the necessary information is available in the log files (check the folder "01_rawdata/log_files"), with both raw data, encoded and resulting spectrum. But I'd appreciate it if you could help crack it.

celso-vitor commented 2 years ago

Hello, I've been following the discussions related to the SCiO and I'm very interested in seeing what it could add to my work. Today I mostly work on creating models and developing apps, but if you're still interested in trying something, I'm open to contributing.

haakonstorm commented 2 years ago

Consumer Physics appears to me to be the absolute worst kind of company you can possibly imagine. Their crowdfunding was a complete cash grab, theft, whatever you want to call it. I have a SCIO lying around here; if anyone wants it, I'll ship it for the price of shipping and a beer.

hbsagen commented 2 years ago

I agree.

I would use it more if this, or another, open source project could provide me with the complete spectrums.

kebasaa commented 2 years ago

So an older version of the SCiO app (I think 1.2 or so) actually saves log files that contain both the raw data (encoded, sent to consumer physics server) and the spectra (decoded, back from the server). I managed at some point to find an APK online and after installing that to get these log files. It's a very inconvenient method, of course

kebasaa commented 2 years ago

The issue with this open source project is that it seems like the raw data is random, which points at encryption. And I don't have the skills to extract the encryption keys from the hardware...
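One quick way to put a number on "seems random" is the Shannon entropy of the byte values. This is a generic sketch, not something taken from the project, and it cannot tell encryption from compression, since both push the value towards 8 bits per byte:

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)                     # occurrences of each byte value
    return -sum(n / total * math.log2(n / total) for n in counts.values())


print(byte_entropy(bytes(100)))          # all zero bytes: no entropy at all
print(byte_entropy(bytes(range(256))))   # every byte value exactly once: the maximum
```

With only a few thousand bytes per scan the estimate comes out somewhat below 8 even for truly random data, so it is best compared against a random buffer of the same length.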

junziii commented 2 years ago

@StevenLColeman42 I have two SCiO sensors: one is hardware version 1.0 and one is 1.2. I am sorry to say that I am pretty sure the bandits from Consumer Physics flipped a switch to take the old devices out of service first. After an update of the SCiO iOS app, my old SCiO was stuck in a "try to calibrate" loop for good. Some months later, they also tried to take down my second device. It was stuck in the same loop until I got angry with them; now, without any change to software or hardware, my newer one works again. It's a big scam all in all...

@kebasaa First, thanks for your project! I am one of the thousands of customers who are not happy with what they promised and what they then sold... So I would really like to see this project succeed for all the technically ambitious people out there. I have limited skills in reading out data from chips, but I have a Bus Pirate board and I would like to support your project in this process.

Does anybody have tips or a good clue as to which chip holds the encryption information, or a marking on it, so that I can try to read out the related data?

StevenLColeman42 commented 2 years ago

@junziii

It's been a while since I even looked into SCIO because I have been too busy doing research for my nonprofit (hdri.org). My original plan was to use this device to gather the spectrographic data from the cuticle of a filarial parasite, which is the culprit in a virtually invisible disease in humans that currently has no means of diagnosis and no cure. If I had that spectrographic data, then my thought was to develop a near-IR camera that could visualize this parasite just below the skin, in much the same way that some near-IR security scanners can visualize the vein patterns in your hand to grant entry. It's a long shot. So, I do still have an interest in this SCIO-read project, but without actual samples of the parasite to scan, the IR camera project had to be moved onto my back burner. I'm not even sure if the SCiO can give me the data I would need, but it would likely still be worth a try. Being able to see this parasite in action would be a real game-changer as it would finally enable actual research in the disease pathology, and for clinical diagnosis. There has never even been a long-term study of this disease in humans because they can't even tell who has it. I do, and their complete apathy towards this disease (Dirofilariasis) is completely unconscionable and somebody needs to fix this. They treat the people with Dirofilariasis just like they did with Lyme disease back in the 60-70's. Totally ignored, so I'm trying my best to break this ridiculous do-nothing cycle. You can read about this problem on my website if you want to know more.

At the physics lab I retired from (jhuapl.edu) we had software that was specialized for reading binary data into definable records and then visualizing that data, but that data environment was classified and I have been unable to get them to let me borrow that software system to dissect the SCiO data records. I was actually on the development team for a while, but since I retired I no longer have access to it. After receiving this email today it triggered me to look for an old commercial software package that would allow one to read in binary data and define fields to parse that data, but I can not for the life of me even remember the name of that software package. I didn't buy a copy back then because I am on a fixed income due to retiring and I was not convinced at the time reading the SCiO was possible or that I would still be around to make use of it.

But today after reading this email I started searching and stumbled across the Python struct package that might enable the same kind of ad-hoc field parsing, definition, and display that I was originally looking for. The problem is giving those binary bytes actual meaning, because without that you don't really know if your fields are defined correctly or not, and then knowing how to interpret that data for actual evaluation of the scanned materials. There is no SCiO documentation for this data, but based on what the device is meant to do we might still be able to reverse engineer that meaning from the data itself. I used to play with reverse engineering of software (IDA Pro) for security research, so this is at least of interest to me and within my skill set. Time is the one issue for me.
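As an illustration of that kind of ad-hoc field parsing with the struct module, the sketch below reads back-to-back fixed-size records into named fields. The record layout is entirely hypothetical, since the real SCiO layout is unknown; the point is only that changing one format string redefines the fields:

```python
import struct
from collections import namedtuple

# HYPOTHETICAL layout for illustration only: the real SCiO format is unknown.
# "<" = little endian, B = 1-byte index, H = 16-bit field, I = 32-bit field
RECORD_FORMAT = "<BHI"
Record = namedtuple("Record", "index field_a field_b")


def parse_records(raw, fmt=RECORD_FORMAT):
    """Parse back-to-back fixed-size records; trailing partial bytes are ignored."""
    size = struct.calcsize(fmt)
    usable = len(raw) - len(raw) % size
    return [Record._make(fields) for fields in struct.iter_unpack(fmt, raw[:usable])]


# Two made-up records followed by one stray byte
raw = struct.pack("<BHI", 1, 2, 3) + struct.pack("<BHI", 2, 500, 70000) + b"\x00"
for record in parse_records(raw):
    print(record)
```

Trying a different guess means editing `RECORD_FORMAT` and the field names, then checking whether the parsed values look plausible.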

After being screwed by Consumer Physics I do have the personal incentive to resurrect this device from their apparent planned obsolescence regardless if it will ever wind up being useful to me for my own research. It particularly pisses me off that by my putting money into the kickstarter I financially helped them build that first device, and they later thanked me by making my unit into a paperweight. If I pick this project back up again, I just hope the battery is still working. I'll have to check on that.

On 8/27/22 05:05, junziii wrote:

@StevenLColeman42 I do have two SCiO devices: one with hardware version 1.0 and one with 1.2.

This could be useful. If you scanned the same surface (e.g. the calibration surface) with both devices, then the placement of the version number should be apparent. Part of the problem is identifying the header information in the data before the repeating elements (i.e. the records) begin. Repeated readings of the exact same substances (e.g. table salt), where we can compare those numbers to known spectrographic data, should identify where the records begin and give a clue as to how to actually interpret those numbers.

Does anybody have tips or a good clue as to which chip holds the encryption info, or some marking on it, so I can try to read out the related data?

Step one, I think if we can read out the version number in the data header that might help us to identify other changes they made in each device version. I think (maybe I'm wrong) that some later versions than mine might even have more sample data values and perhaps even different scaling factors.

The raw numbers being read off the chip likely need to be scaled to represent an actual reflectance value, while each element position in that array would stand for a different frequency that was tested. Different versions will likely have a different number of records (upgraded chip sets), and the version number would allow you to know which frequency bands go with which offset.

By comparing the values to known spectrographic catalogs, the frequencies vs. array element numbers should be able to be worked out fairly easily based on which values spike for those known substances. From there the scaling values of the numbers could be worked out based on the calibration values from the same device. Due to temperature changes in the device, the programmatic scaling will likely need to change, which is probably the reason for such a frequent calibration requirement. It's sensitive, so just holding the device in your hand for a period may change the numbers and thus the correction values that need to be applied. The temperature is likely a field in the header of the data, so repeated scanning of the same calibration material while holding the device firmly in your hand might allow you to see which field holds that temperature value. If it is, the field value should go up as the device is in use, but the data values will also go up because the heat-dependent noise level will rise as well. There may also be some kind of clock value. The numbers that don't change will likely all be fields in the header, other than the temperature and the clock field if there is one. To parse this all out we need lots of data under controlled conditions. We need a catalog of spectrographic data for atomic elements and other molecules found in commonly available items.
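The header-hunting idea can be sketched roughly as follows. This assumes repeated scans of the same material are available as equal-length `bytes` objects; the sample scans below are made up, and looking at single bytes only is a simplification (a real temperature or clock field would probably span several bytes).

```python
def classify_offsets(scans):
    """Split byte offsets into those identical in every scan (constant
    header candidates) and those that vary (data, clock, temperature)."""
    length = min(len(s) for s in scans)
    constant, varying = [], []
    for i in range(length):
        if len({s[i] for s in scans}) == 1:
            constant.append(i)
        else:
            varying.append(i)
    return constant, varying

def rising_offsets(scans, offsets):
    """Offsets whose value never decreases from scan to scan and ends higher
    than it started: candidates for a temperature or clock byte."""
    rising = []
    for i in offsets:
        vals = [s[i] for s in scans]
        if all(a <= b for a, b in zip(vals, vals[1:])) and vals[-1] > vals[0]:
            rising.append(i)
    return rising

# Made-up scans in capture order: offsets 0, 1 and 4 never change,
# offset 2 drifts upward, offset 3 jumps around.
scans = [bytes([0xAA, 0x01, 20, 7, 9]),
         bytes([0xAA, 0x01, 21, 3, 9]),
         bytes([0xAA, 0x01, 23, 8, 9])]
constant, varying = classify_offsets(scans)
print("constant:", constant)
print("varying:", varying)
print("rising:", rising_offsets(scans, varying))
```

With encrypted or scrambled payloads nearly every offset would show up as varying, which would itself be informative.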

This kind of data analysis and reverse engineering is quite doable. I guess I need to go see if my SCiO device battery is even still alive if anyone else out there is still game for this. If so, I need to take a closer look at SCIO-read source code again to refresh my memory. My plate is full with my own research but if others are willing to send me scans I might be able to find some time to look at this again.

8*}

kebasaa commented 2 years ago

@StevenLColeman42 @junziii Thanks for your interest, offer to help and great project ideas. I haven't been inactive myself, just unsuccessful...

First off, let me go through a few things:

Temperature, and even version information and device serial number are all available by sending specific commands to the Scio through USB. These fields are not in the scan data, so when the scio performs a scan, it sends multiple commands and reads the responses.

I have made a script that uses the Python struct package to attempt to convert the bytes to some sort of reasonable data structure, like int, float or even strings, using all kinds of structures and headers. No success (but maybe I'm doing something wrong...)
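For anyone wanting to repeat that kind of attempt, here is a rough sketch (not the actual script from the repository) that decodes the start of a blob under several guessed type and byte-order combinations. The list of candidate types is an assumption and can be extended.

```python
import struct

def interpretations(blob: bytes, count: int = 4):
    """Decode the first `count` values of blob under several guessed encodings."""
    formats = {"uint16": "H", "uint32": "I", "float32": "f", "float64": "d"}
    out = {}
    for order_name, order in (("little", "<"), ("big", ">")):
        for type_name, code in formats.items():
            size = struct.calcsize(order + code)
            n = min(count, len(blob) // size)  # never read past the end
            out[f"{type_name}-{order_name}"] = struct.unpack_from(
                f"{order}{n}{code}", blob, 0)
    return out

# Stand-in data: four little-endian uint32 values.
blob = struct.pack("<4I", 1, 2, 3, 4)
for name, values in interpretations(blob).items():
    print(f"{name:16s} {values}")
```

A correct guess should produce values in a plausible range that change smoothly from one element to the next; wrong guesses tend to produce huge integers, denormal floats or NaN.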

Then, I scanned the same material multiple times. As @StevenLColeman42 said, we should see similar spikes in the bytes. However, the readings are not correlated to each other in any way, which implies encryption.
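One way to put numbers on "not correlated" is sketched below, using only the standard library. Byte entropy close to 8 bits per byte together with a correlation near zero between repeated scans would be consistent with encryption (or compression), though it does not prove it. The inputs here are stand-ins, not real scans.

```python
import math
from collections import Counter

def byte_entropy(blob: bytes) -> float:
    """Shannon entropy in bits per byte (8.0 means uniformly distributed)."""
    if not blob:
        return 0.0
    total = len(blob)
    counts = Counter(blob)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def pearson(a: bytes, b: bytes) -> float:
    """Pearson correlation of two byte sequences, compared position by position."""
    n = min(len(a), len(b))
    xs, ys = a[:n], b[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / math.sqrt(var_x * var_y)

scan_a = bytes(range(256))                    # stand-in data
scan_b = bytes((7 * i) % 256 for i in range(256))
print(byte_entropy(scan_a), pearson(scan_a, scan_b))
```

Note that a short sample cannot reach exactly 8.0 even if it is truly random, so values slightly below 8 for an 1800-byte scan are expected.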

Now I'm not an encryption expert in any way. I assume that a company like Consumer Physics would have done something fairly simple, like using the serial number to encrypt/decrypt the data. Alternatively, they might have hard-coded the key in the device firmware.

Now there are 2 possible approaches: (1) Try to decrypt the data in all kinds of ways using the serial number as a key. I don't know how to do that. (2) create a firmware dump of the device. This is what @junziii is generously offering, but note that this is destructive. The Scio is glued and can't easily be reassembled... I suggest to do this after trying to decrypt using the serial number.
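As a starting point for approach (1), here is a minimal sketch of the simplest possible scheme, a repeating-key XOR with the key derived from the serial number in a few guessed ways. This is only an assumption about what a simple scheme might look like; a real cipher such as AES would not yield to this. The serial number and scan bytes below are placeholders.

```python
from itertools import cycle

def xor_with_key(blob: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ k for b, k in zip(blob, cycle(key)))

def candidate_keys(serial_hex: str):
    """A few guessed ways to turn a hex serial number into key bytes."""
    raw = bytes.fromhex(serial_hex)
    yield "raw bytes", raw
    yield "reversed bytes", raw[::-1]
    yield "ASCII text", serial_hex.encode("ascii")

serial = "0123456789ABCDEF"   # placeholder, not a real device serial
scan = bytes(range(32))        # stand-in for a captured scan
for name, key in candidate_keys(serial):
    trial = xor_with_key(scan, key)
    print(f"{name:15s} {trial[:8].hex()}")
```

Each trial output would still need a plausibility check, for example whether repeated scans of the same material become similar after applying the same key.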

The chip layout and disassembly instructions are available from Sparkfun: https://learn.sparkfun.com/tutorials/scio-pocket-molecular-scanner-teardown-/all The encryption chip is most likely the Texas Instruments CC2540F256, or the Blackfin processor

@junziii I suggest that you search for older versions of the app apk online, maybe you'll have better luck using the device

StevenLColeman42 commented 2 years ago

On 8/28/22 01:24, kebasaa wrote:

Now there are 2 possible approaches: (1) Try to decrypt the data in all kinds of ways using the serial number as a key. I don't know how to do that. (2) create a firmware dump of the device. This is what @junziii is generously offering, but note that this is destructive. The Scio is glued and can't easily be reassembled... I suggest to do this after trying to decrypt using the serial number.

I do have an old IDA Pro (V 7.1, and prior back to v5.5) license and there appears to be a couple blackfin CPU plugins that might work to disassemble the firmware for interoperability purposes. If we had the firmware image this combination might allow us to determine the exact binary packing format of the data that is sent to either the bluetooth-smartphone or USB interfaces.

Note, I have never compiled and added a CPU definition into IDA Pro before but if the plugin works with my V7.1 this might be the best path forward. I would not want someone to destroy their device just to get the JTAG data from it but that may be the only way. Looking at Ebay there are no cheap SCiO's out there to dissect for this. I've never actually used JTAG so I don't know how much deconstruction of the device would be required to attach to it, or whether it could be reassembled afterwards. This is a very compact device apparently.

Given the age of the plugin development projects they will probably work in my ida 7.1, and I have several older versions of IDA if not. I'll need to break out the SDK and compile the below plugins to see if I can get it to work. They appear to be the same basic code only #2 was updated more recently.

IDA Pro info:

https://hex-rays.com/IDA-pro/

blackfin Plugin 1: last updated 11 years ago

http://codenaschen.de/tichyblog/index.php?action=blog&entry=1_Blackfin+Disassembler+Processor+IDA+Pro+Plugin

https://github.com/krater/Blackfin-IDA-Pro-Plugin

I'll have to look to see if the blackfin DSP specific instructions are included in this plugin.  There must be a reason for choosing a cpu with a builtin DSP.

blackfin Plugin 2: last updated 6 years ago

https://gitlab.com/mc/Blackfin-IDA-Pro-Plugin

Someone actually used the plugin here for another blackfin RE project:

https://www.eevblog.com/forum/testgear/sniffing-the-rigol_s-internal-i2c-bus/

The chip layout and disassembly instructions are available from Sparkfun: https://learn.sparkfun.com/tutorials/scio-pocket-molecular-scanner-teardown-/all The encryption chip is most likely the Texas Instruments CC2540F256, or the Blackfin processor

@junziii I suggest that you search for older versions of the app apk online, maybe you'll have better luck using the device

A few thoughts:

It would be a good idea to make an archive of these APKs somewhere. (I have SCIO_1.2.1.441 and The_Lab1.3.1.74.) IDA can disassemble APKs as well. It might be that the original version was less complicated or might have been without any advanced obfuscation techniques and thus give some clue of the native format prior to being scrambled. It might therefore be instructive to see the evolution of their smartphone app. This is where the version number message type would likely be necessary for them to change the algorithms on the server side, and how they would determine when to reject your device and make it into an expensive paperweight. What would happen if the returned version number message were falsified going back to the server? Would the calibration sequence then succeed?

As far as determining the format of the output data, according to the sparkfun teardown there are 12 tiny filter holes covering the sensor, and by covering up one hole at a time with opaque electrical tape and looking at what changes in the scanned data with/without a hole covered it might be possible to glean some further understanding of the binary output. The hard part is figuring out exactly where each hole is under that protective glass cover, but even covering the entire sensor might at least help distinguish the header from the data by negating the illuminator entirely and giving the lowest level possible across the entire spectrum. This would give the baseline noise level of the sensor at that given temperature. The diameter of each hole was apparently customized to compensate for the different color filter lenses or for the sensor/illuminator spectrum sensitivity.

The native CPU byte size, endianness, and possible floating point notation might give clues as to why the data seems to be somewhat random in the output. It is possible/likely they just use scalar values for the 12 level values, where the randomness would make little sense without at least some level of obfuscation.

A timestamp might also be used as a nonce in any encryption stream such that no two scans would ever be the same but the server might still decode it knowing when it was scanned. Since a connection to their server was assumed the server might be able to guess this timestamp, perhaps by masking the lower bits or by trying several recent timestamps just to cover any potential propagation delays. There would be no need for them to include the timestamp in the message itself as long as the clocks are synchronized. Was there by any chance a separate message type for synchronizing the SCiO clock from the server?

kebasaa commented 1 year ago

@StevenLColeman42 I have dramatically re-worked the code base, to make it easy to scan and collect data. But I'm still stuck with decrypting the data. Whether it is encrypted or not is unclear to me. All the data so far seems to be little-endian (e.g. temperature), except that one user in another issue suggests big-endian for the scan data.

Along with the scan data, the SCiO app sends the following information to the server: {'device_id': '8032AB45611198F1', 'sampled_at': '2020-06-04T14:37:21.253+03:00', 'sampled_white_at': '2020-06-04T14:32:46.187+03:00', 'scio_edition': 'scio_edition', 'mobile_mac_address': '38:78:62:02:7B:33',

So there is a timestamp, but also a device ID. That could be some sort of decryption key, but I'm not sure.

At this point, I'm failing at the following: sample has 1800 bytes, sample_dark has 1800, sample_gradient has 1656 bytes length. We know that 331 floats are returned by the server. So the question is how that could add up. 331*4 (for integers) is 1324 bytes, so too short. Any ideas?
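To make the size puzzle concrete, this small sketch lists which fixed element widths (1 to 8 bytes) divide each payload length evenly, and how many bytes per returned value each payload works out to. Only the three lengths and the count of 331 come from the logs; the helper name `element_counts` is just for illustration.

```python
lengths = {"sample": 1800, "sample_dark": 1800, "sample_gradient": 1656}
n_values = 331  # floats returned by the server

def element_counts(size: int, max_width: int = 8):
    """Map each element width (in bytes) that divides size to the element count."""
    return {w: size // w for w in range(1, max_width + 1) if size % w == 0}

for name, size in lengths.items():
    print(f"{name}: {size} bytes = {size / n_values:.3f} bytes per returned value")
    for width, count in element_counts(size).items():
        print(f"  {width}-byte elements -> {count} values")
```

None of the even splits gives 331 elements, so the device data is probably not a plain array matching the server output one-to-one. One thing that stands out is that 1656 is exactly one byte more than 331*5 = 1655, which may or may not be a coincidence.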

celso-vitor commented 1 year ago

It's difficult to say for sure without more information about the data format and encryption used, but here are some possibilities to consider:

1-Data compression: The data may be compressed before being sent to the server, which could explain why the number of bytes returned by the server is less than expected. In this case, you would need to first decompress the data before attempting to decrypt it.

2-Variable-length data: The data returned by the server may not be a fixed-length array of floats. It's possible that there are additional bytes in the data that encode variable-length metadata or other information. In this case, you would need to carefully analyze the data format to determine how to properly parse and decrypt it.

3-Different data types: It's possible that not all of the data is stored as floats, which would affect the total number of bytes needed to represent the data. For example, there could be integers or other data types mixed in with the floats. You may need to examine the data format in more detail to determine what types of data are present.

4-Regarding the encryption of the data, the device ID may indeed be a decryption key, or it could be used in combination with other factors (such as the timestamp) to generate a decryption key. Without more information about the encryption scheme used, it's difficult to say for sure.

I would recommend trying to gather more information about the data format and encryption scheme, if possible. This could involve examining the code that generates and processes the data, as well as any documentation or other resources related to the data format.

StevenLColeman42 commented 1 year ago

On Tue, Mar 28, 2023, 9:38 AM celso-vitor @.***> wrote:

5- If the readings fall within a specific range that does not require the full range of values, they might be bit-packing the data to fit scaled values into a smaller space. Since there should be no negative values in the readings, there would be no reason for transmitting a sign bit, for instance. Or there could be a smaller floating point representation without the added complexity of a sign bit.

An example of this might be using 7-bit (or smaller) integers holding scaled values, to which a baseline value derived from the calibration temperature is added on the server side to arrive at the actual integer values. When reading the payload one would need to unpack these bytes on the proper boundaries, otherwise it would look like random numbers. If the payload is shorter than the expected number of bytes from the device readings, this would be a clue.
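Unpacking on non-byte boundaries could be sketched like this. The bit width and bit order are unknowns that would have to be tried one after another; nothing here reflects the actual SCiO packing.

```python
def unpack_bits(blob: bytes, width: int, msb_first: bool = True):
    """Split blob into consecutive unsigned integers of `width` bits each.
    Trailing bits that do not fill a whole value are dropped."""
    total_bits = len(blob) * 8
    # Treat the whole blob as one big integer so values can be shifted out.
    stream = int.from_bytes(blob, "big" if msb_first else "little")
    mask = (1 << width) - 1
    values = []
    for i in range(total_bits // width):
        if msb_first:
            shift = total_bits - (i + 1) * width   # take bits from the top down
        else:
            shift = i * width                      # take bits from the bottom up
        values.append((stream >> shift) & mask)
    return values

data = bytes([0xAA, 0xF0])          # bits: 10101010 11110000
print(unpack_bits(data, 4))         # four 4-bit values
print(unpack_bits(data, 7))         # two 7-bit values, 2 bits left over
```

To detect bit-packing, one could run this for every width from, say, 5 to 16 bits on repeated scans of the same material and look for a width at which the values form a smooth curve instead of noise.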

I would recommend trying to gather more information about the data format and encryption scheme, if possible.

Or be thinking more like the designer of the protocol who needed to transmit this data from an embedded device to a server for the actual computations. The less the device has to do the better the device performance and the less the internal heat generated.

This could involve examining the code that generates and processes the data, as well as any documentation or other resources related to the data format.

It's been so long I do forget. Did any of the tear down references mention if this device had a JTAG interface? What microcontroller does it use? All the answers to our questions would then be knowable. Last I knew, RE specifically for interoperability reasons was legal in the US. On the other hand, given that there is a USB interface directly on the device there may be a different 'read' instruction to grab the executable image off the device, making this whole exercise a lot easier. I recall there being a number of command codes that did not have any real explanation as to their purpose, and a read/write to the device image command could be one of those. One could try fuzzing the command code interface to see what kind of blobs could then be read off of the device to see if reading the executable image might be one of them. Knowing the specific microcontroller used would be necessary to decode that blob into actual instructions.

kebasaa commented 1 year ago

@StevenLColeman42 @celso-vitor I will need help with this, it's not my area of expertise at all...

I have updated the Readme, it has all the information I could find (including the processor type etc.). I have also included some manuals for these chips in the folder documentation.

With regards to your other comments:

  1. Compression: The data sent by the SCiO (and the app, which merely converts bytes to a Base64 string) is always the same number of bytes. I doubt that it is compressed, unless compression will always produce the same length
  2. Length of data reported by the server: As far as I can tell the server always returns a JSON containing 331 floats.
  3. Different data types: I don't know how to analyse the byte data more. True, it may not be floats (probably isn't floats), but I've tried ints and doubles and come up empty. It is most likely not signed as negative values do not make sense
  4. Potential encryption: No clue what encryption could have been used. All the data I have available was extracted from the logs. Please use it and try to play around with it.
  5. Bit-packing: How would you detect bit-packing, and decode the data then?
  6. Firmware access: The USB interface and commands I have collected do allow updating the firmware through USB and/or BLE, by sending commands and firmware files to the device as bytes. I don't know how to download the firmware though in order to analyse it.
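For points 1 and 2, a quick sanity check on logged payloads might look like the sketch below. It assumes the logged value is a standard Base64 string, and it only tests for zlib, which is just one of many possible compressors; a negative result therefore rules out very little.

```python
import base64
import zlib

def decode_payload(b64_text: str) -> bytes:
    """Undo the app's Base64 step and return the raw device bytes."""
    return base64.b64decode(b64_text, validate=True)

def looks_zlib_compressed(blob: bytes) -> bool:
    """True only if blob is a complete, valid zlib stream."""
    try:
        zlib.decompress(blob)
        return True
    except zlib.error:
        return False

# Stand-in payload; real ones would come from the app logs.
raw = bytes(range(256)) * 2
text = base64.b64encode(raw).decode("ascii")
blob = decode_payload(text)
print(len(blob), looks_zlib_compressed(blob))
```

A payload that is always exactly the same length, as observed here, already argues against general-purpose compression, since compressed output normally varies in size with the input.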

Data from the device: I may be repeating myself, but I invariably get the same data structures from the device and from the logs: