sensebox / openSenseMap-API

API for opensensemap.org
https://docs.opensensemap.org/
MIT License

Authentication on Measurement Upload #158

Open nobodyinperson opened 6 years ago

nobodyinperson commented 6 years ago

When uploading new measurement data via the API HTTP POST method (according to the docs at https://docs.opensensemap.org/#api-Measurements-postNewMeasurement) there is no authentication that authorizes the uploader to do so. At least for my custom test box that I created via the openSenseMap web interface this results in anyone being able to upload arbitrary measurement data to my box via a simple

curl \
    -H "Content-Type: application/json" \
    -d '{"value":"3333"}' \
    https://api.opensensemap.org/boxes/$SENSEBOX_ID/$SENSOR_ID

where $SENSEBOX_ID and $SENSOR_ID are publicly available, e.g. via the openSenseMap web interface.

While of course the whole openSenseMap project is based on nice people collaborating and feeding an awesome measurement network together, "trust is good, control is better", I would say. :-)

Expected Behavior

The new Measurement HTTP POST API method should require a means of authentication.

Current Behavior

The new Measurement HTTP POST API method does not require a means of authentication.

Possible solution

I understand that introducing authentication to the upload mechanism increases the complexity of the sensor software. A solution could be to add an "upload only with authentication" option to the senseBox, which forces uploads to be authenticated, e.g. with the standard header-based authentication with the JSON Web Token. Maybe even via another API method.

Your Environment

noerw commented 6 years ago

Afaik the reason this was not added yet is that it would break compatibility with all already deployed boxes. I personally like the idea of a flag to circumvent this. It would be set by default for new boxes, and could be set for existing boxes by users (or would be set automatically when updating the sketch).

nobodyinperson commented 6 years ago

I just realized that the Arduinos use http://ingress.opensensemap.org, which is unencrypted HTTP, for upload (because plain Arduinos are too weak for SSL, I assume). Unencrypted authentication is rather a joke but still a bigger obstacle for malicious data upload than having none.

nobodyinperson commented 6 years ago

Still, the Arduino sketch would need to include the sign-in/token refreshing process, which adds a little boilerplate. Should be manageable though.

ubergesundheit commented 6 years ago

Hi @nobodyinperson,

thanks for opening this issue! Your observations are correct and there is no authentication or authorization present when uploading new values.

As you've already guessed, introducing a means of authorization and authentication of uploading stations means really big changes to the current architecture on both server and stations. In fact, we are actively looking for a good solution for making sure that measurements come from the right source.

Before we look at possible solutions, here are some requirements the solution should bring:

Some solutions that come to mind:

A simple shared secret between device and server (a static API token or a credential pair for HTTP Basic authentication), which is sent with each request. Upon registering a new station, the server generates a simple API token which is given to the station.

Pro: Really easy to implement.

Cons: Without transport encryption, this secret is easily readable by anyone looking at the TCP traffic between station and server. Our current stations only support HTTP, so this solution is easily broken.

Hashing of the payload. Upon registering a station, a secret key is generated and given to the station. Each set of measurements is hashed together with the secret key using a cryptographic hash function. The hash is sent along with the measurements and is used to make sure the station is authorized to send measurements.

Some limitations come to mind: Since sensor IDs, the actual measurements, and the formatting and encoding of requests are public, using a long secret is a must to prevent brute-force guessing of the secret. A longer secret means higher computational cost. A solution for this could be to use a time-based one-time password as the secret key, but then the internal clocks of the server and stations must be synchronized. This requires the device and server to use network time (already present on the server).

Pro: If implemented right, should be reasonably secure even without transport encryption, since the secret never leaves the device.

Cons: Hard to get right. High computational cost.
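To make the difference between the two options concrete, here is a minimal Python sketch. It is not the openSenseMap implementation: all function names are made up for illustration, and HMAC-SHA256 is used here as one standard way to hash a payload "with a secret key". The time-window helper only illustrates the time-based idea and is not a full TOTP implementation.

```python
import hashlib
import hmac
import secrets


def new_station_secret() -> bytes:
    """Generate a random secret at registration time (hypothetical helper)."""
    return secrets.token_bytes(32)


def check_static_token(expected: bytes, presented: bytes) -> bool:
    """Option 1: the station sends the secret itself with every request."""
    return hmac.compare_digest(expected, presented)


def sign_payload(secret_key: bytes, payload: bytes) -> str:
    """Option 2: the station sends only a keyed hash (HMAC-SHA256) of the payload."""
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def verify_payload(secret_key: bytes, payload: bytes, received: str) -> bool:
    """Server side: recompute the keyed hash and compare in constant time."""
    expected = sign_payload(secret_key, payload)
    return hmac.compare_digest(expected.encode(), received.encode())


def window_key(secret_key: bytes, unix_time: int, step: int = 30) -> bytes:
    """Time-based variant: derive a short-lived key from the current time window."""
    counter = unix_time // step
    return hmac.new(secret_key, counter.to_bytes(8, "big"), hashlib.sha256).digest()


secret = new_station_secret()
payload = b'{"value":"3333"}'
mac = sign_payload(secret, payload)
print(verify_payload(secret, payload, mac))              # genuine upload
print(verify_payload(secret, b'{"value":"9999"}', mac))  # tampered payload
```

With option 1 an eavesdropper on plain HTTP learns the secret itself. With option 2 they only see a hash that is valid for one specific payload, and signing with `window_key(secret, now)` instead of `secret` would additionally limit how long a captured request stays valid.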

So adding opt-in authentication should be the right way to go forward, but I think the decision which method to implement shouldn't be rushed. The chosen solution should work for everyone and should be easy to implement for everyone. We recently announced our new board which should pack enough computational power to make something possible.

Maybe you have another good idea or some thoughts?

The endpoint ingress.opensensemap.org also supports https.

nobodyinperson commented 6 years ago

Thanks for your detailed reply, @ubergesundheit !

This PSK (pre-shared key) method you are suggesting sounds very promising for this purpose. I think what you described there is commonly known as salting.

  • No plain text transport of the authentication means => TLS/HTTPS

I think this requirement can be neglected for the hashing approach.

Cons: Hard to get right. High computational cost.

There seems to be at least a simple MD5 hashing library for the Arduino (https://github.com/tzikis/ArduinoMD5/). MD5-hashing strings of around 1k bytes in length seems to be fast enough on an Arduino Uno. SHA1 might be a little more computationally expensive but should still work. I didn't find a SHA1 library for the Arduino though...

Maybe it isn't necessary to use the whole payload for hashing, say only the first 1k bytes. Together with a 40-byte SHA or base64-encoded PSK it seems reasonably safe to me. This is, after all, just a measure for preventing spam or unauthorized data upload, not traffic encryption.
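A small Python sketch of the "hash only the first 1k bytes" idea, assuming the PSK is simply appended to the payload prefix before hashing with MD5 (the function name and the example PSK are made up). It also shows the trade-off: anything after the prefix is not covered by the hash.

```python
import hashlib

PREFIX_LEN = 1024  # "the first 1k bytes"


def prefix_digest(payload: bytes, psk: bytes) -> str:
    """MD5 over the first PREFIX_LEN bytes of the payload followed by the PSK."""
    return hashlib.md5(payload[:PREFIX_LEN] + psk).hexdigest()


psk = b"example-pre-shared-key"  # placeholder, not a real key
payload = b"a" * 2000

# A change inside the hashed prefix alters the digest ...
changed_early = b"X" + payload[1:]
# ... but a change after byte 1024 goes unnoticed.
changed_late = payload[:1500] + b"X" + payload[1501:]

print(prefix_digest(payload, psk) == prefix_digest(changed_early, psk))
print(prefix_digest(payload, psk) == prefix_digest(changed_late, psk))
```

So truncation saves computation on the device, but an attacker who captures one valid request could alter everything past the prefix while reusing the hash.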

poempelfox commented 6 years ago

To solve the main problem of "everybody can send fake data for other people's sensors" and leaving aside transport security which I'd consider far less of a problem:

Couldn't you simply introduce new, additional "public" IDs for senseboxes and sensors? The current ones would then be renamed to "private" IDs. The API would accept either the public or private IDs for all read-only calls, but only the private IDs for everything that changes things, e.g. posting new data. The public part of the website would simply switch to using the public IDs, no longer exposing the private IDs. That approach would certainly require changes server-side for API and webinterface, but no changes at all for the clients sending data.

You could perhaps just use something like the SHA512-sum of the private ID as the public ID, so everyone knowing the private IDs could generate the matching public ID themselves without talking to your database.
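As a rough Python illustration of deriving the public ID from the private one (the function name and the example ID are made up):

```python
import hashlib


def public_id_for(private_id: str) -> str:
    """Derive a public ID as the hex SHA-512 digest of the private ID."""
    return hashlib.sha512(private_id.encode("utf-8")).hexdigest()


private_id = "0123456789abcdef01234567"  # made-up placeholder, not a real box ID
print(public_id_for(private_id))
```

Anyone who knows the private ID can recompute the public one, while going from the public ID back to the private ID would require guessing the private ID, so this only helps as long as private IDs are hard to guess.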

nobodyinperson commented 6 years ago

Hi @poempelfox,

You are effectively suggesting to drop the salting mechanism and just send some kind of PSK (your private key) along with the data (and the sensor ID - your public key), right?

If we plan to implement an authentication method to prevent people from uploading bogus data to other people's sensors, that method should be solid enough. The Arduinos cannot encrypt their HTTP traffic. So anybody in the same network can easily sniff anything they send through the network, including the "private" key. The PSK hash salting approach prevents exactly that even though the traffic is unencrypted.

poempelfox commented 6 years ago

I'm not suggesting to "drop" anything in the long run, I'm suggesting to solve one problem at a time. The main problem for me is not that people could listen to the unencrypted transmission of my data, it is the sad fact that right now everybody on the whole internet can easily send fake data for my sensors. And that is also the title of this issue, isn't it?

And to solve that particular problem, you really don't need to introduce methods that will break all existing devices/clients out there that are currently sending data. Instead you just need to consider both $SENSEBOX_ID and $SENSOR_ID to be non-public data and stop publishing them ASAP. I mean I haven't looked at how they are generated, but they seem to be long and almost impossible to guess strings. Introduce and assign new $PUBLIC_SENSEBOX_ID and $PUBLIC_SENSOR_IDs for all existing senseboxes and sensors. Make the API accept either the $PUBLIC_SENS*_ID or the $SENS*_ID for all calls that only read data. Change the web interface to also only reference the $PUBLIC_*_IDs (except for logged-in users when editing their own boxes/sensors, of course). Everything that's published should only show the $PUBLIC_*_IDs. If the $SENS*_IDs are no longer trivially visible to everyone on the planet, they should be able to offer security against random trolling, as they should not be guessable. And again, that would work without any change to any senseboxes that are currently deployed in the field. And it would raise the bar for sending fake sensor data quite a bit: from "every random troll on the whole internet, without any effort" to "people able to sniff my traffic".
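The read/write split described above could look roughly like this in Python. This is only a sketch: the `BoxRegistry` class, its method names, and the randomly assigned public IDs are assumptions for illustration, not part of the openSenseMap API.

```python
import secrets
from typing import Dict, Optional, Set


class BoxRegistry:
    """Hypothetical lookup that separates public (read-only) and private IDs."""

    def __init__(self) -> None:
        self._private_ids: Set[str] = set()
        self._public_to_private: Dict[str, str] = {}

    def register(self, private_id: str) -> str:
        """Assign a new public ID to an existing private ID and return it."""
        public_id = secrets.token_hex(12)
        self._private_ids.add(private_id)
        self._public_to_private[public_id] = private_id
        return public_id

    def resolve_for_read(self, any_id: str) -> Optional[str]:
        """Read-only calls accept either the public or the private ID."""
        if any_id in self._private_ids:
            return any_id
        return self._public_to_private.get(any_id)

    def resolve_for_write(self, any_id: str) -> Optional[str]:
        """Calls that change data (e.g. posting measurements) need the private ID."""
        return any_id if any_id in self._private_ids else None


registry = BoxRegistry()
private_box_id = "existing-box-id"  # placeholder for an already deployed box
public_box_id = registry.register(private_box_id)

print(registry.resolve_for_read(public_box_id) == private_box_id)  # reads work with the public ID
print(registry.resolve_for_write(public_box_id))                   # writes do not
```

Deployed devices keep sending with the ID they already have, which now acts as the private one.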

This doesn't mean you could not still implement some better security for future sensor deployments, you probably should - but I'd consider that to be a far less pressing problem. As nobodyinperson wrote: "Unencrypted authentication is rather a joke but still a bigger obstacle for malicious data upload than having none."

nobodyinperson commented 5 years ago

In another (currently private) project on GitLab.com I have tested the salting mechanism to prevent malicious uploads. It works very well. The Arduino MD5 library works perfectly fine to hash the salted HTTP payload. The server can check that hash against its own salts and only accept the measurement if the hashes match.

mpfeil commented 5 years ago

@nobodyinperson thanks for the feedback. Could you provide a sample for us?

nobodyinperson commented 5 years ago

When the project is ready, I will make it public so you can access it too.

nobodyinperson commented 5 years ago

Here is our project python3-co2logserver: https://gitlab.com/tue-umphy/co2mofetten/python3-co2logserver

Here is an excerpt from the README:

Authentication

If you want to control who is allowed to upload data to the server, you may use the PSK (pre-shared-key) salting mechanism built into the server.

Set CO2LOGSERVER_UPLOAD_REQUIRES_AUTH=True and specify one or more PSK salt strings, e.g. CO2LOGSERVER_CHECKSUM_SALTS = ["my-super-secret-psk"].

By default, the server then only accepts requests including at least one header field Content-HASHALGORITHM-Salted containing the hexadecimal hash of the sent payload with the salt appended calculated with HASHALGORITHM (e.g. MD5, SHA1, SHA256, etc...).

For example, if you want to upload the JSON data {"time_utc":[43,23],"co2":[1223,2351]} and your salt string is my-super-secret-psk, your header field Content-MD5-Salted would be b71e91feb2be18ccca019914a1da5b1d which is the MD5-sum of {"time_utc":[43,23],"co2":[1223,2351]}my-super-secret-psk.
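The header calculation and the matching server-side check can be sketched in Python as follows. This is an illustration of the scheme as the README describes it, not the project's actual code: the function names and the set of supported algorithms are assumptions.

```python
import hashlib
import hmac
from typing import Iterable, Mapping

# Assumption: which HASHALGORITHM names the server is willing to accept.
SUPPORTED_ALGORITHMS = {"md5", "sha1", "sha256"}


def salted_digest(payload: bytes, salt: str, algorithm: str = "md5") -> str:
    """Hex digest of the payload with the salt appended."""
    return hashlib.new(algorithm, payload + salt.encode("utf-8")).hexdigest()


def is_upload_authorized(
    payload: bytes, headers: Mapping[str, str], salts: Iterable[str]
) -> bool:
    """Accept if any Content-HASHALGORITHM-Salted header matches any known salt."""
    salts = list(salts)
    for name, value in headers.items():
        parts = name.lower().split("-")
        if len(parts) != 3 or parts[0] != "content" or parts[2] != "salted":
            continue
        algorithm = parts[1]
        if algorithm not in SUPPORTED_ALGORITHMS:
            continue
        received = value.strip().lower().encode("utf-8")
        for salt in salts:
            expected = salted_digest(payload, salt, algorithm).encode("utf-8")
            if hmac.compare_digest(expected, received):
                return True
    return False


payload = b'{"time_utc":[43,23],"co2":[1223,2351]}'
salt = "my-super-secret-psk"
headers = {"Content-MD5-Salted": salted_digest(payload, salt, "md5")}
print(headers)
print(is_upload_authorized(payload, headers, [salt]))
```

The client only needs `salted_digest`. The server recomputes the digest for each configured salt and compares it with the header value.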

This is a simple yet effective way of preventing spam uploads.

Security Note

Note, however, that communication to the server is still unencrypted (only HTTP, not HTTPS). The reason for this is that embedded devices like Arduinos do not have the capabilities for encrypted web traffic. Thus, the sent data including the checksums can theoretically be intercepted and reused to reupload the exact same dataset.

The interesting code parts live here.

On the Arduino side, the Arduino MD5 library is used to hash the payload.

It works perfectly fine.

nobodyinperson commented 5 years ago

I just made the LogserverClient Arduino Library public. You can find the function for calculating the salted MD5 hash here. It uses my fork of the ArduinoMD5 library which fixes heap fragmentation by avoiding dynamic memory allocation. It works perfectly fine for a basic, unencrypted authentication based on a pre-shared key.

noerw commented 5 years ago

@nobodyinperson this looks great! Do I understand correctly that the hash salted with the secret is sent in the Content-MD5-Salted HTTP header? (never mind, I just read your previous comment again)