ormushq / ormus

Ormus - Customer Data Platform
MIT License

Creating Write Key #20

Closed AmirAtashghah closed 6 months ago

AmirAtashghah commented 11 months ago

Issue: We are currently working on implementing write key functionality. The write key should include information such as source, destination, and potentially other details. We need to find best practices for designing and implementing this feature effectively.

Details:

Questions:

  1. What information does the write key in Ormus typically contain?
  2. How should the write key be encrypted to ensure security?
  3. How can I ensure the uniqueness and security of the generated write keys in the context of Ormus?

DOO-DEV commented 11 months ago

In Segment, as far as I tested with two Go sources, the write keys are completely different and, more importantly, they do not depend on the destination: at the start we don't have any destination, but we already have the write key.

By the way, I think Segment uses KSUID: https://github.com/segmentio/ksuid

AmirAtashghah commented 11 months ago

So it is not necessary for write keys to have special information inside?

DOO-DEV commented 11 months ago

I don't think so

abtinokhovat commented 11 months ago

So a write key is tied to, for example, a user ID rather than a source ID, correct?

DOO-DEV commented 11 months ago

A write key is generated per source, and each user can have multiple sources, and therefore multiple write keys. If I understand correctly, you are saying that storing the token can give us sortability? KSUID also provides this feature.

abtinokhovat commented 11 months ago

  1. So from what you say, if you create an account in Segment, a write key is generated for you, and you can use it in any source you create? This is a little bit weird to me. I think generating a write key per source is more logical, IDK :)).

  2. Think about write key validation. One of our jobs is to validate the generated write keys. A user cannot hand us a random string and have us accept it as a write key. So we should store every generated write key in order to validate it later, correct?

  3. And lastly, from what I remember, to optimize our databases (Source DB) we may have to shard our database by write key, so I think it is better to have sortable tokens for future needs. And yes, I've read the KSUID docs and saw that it is sortable.

gohossein commented 11 months ago

As far as I know, each write key is unique to a single source and is not related to the user authentication token. A write key can be any sufficiently long, sufficiently random string, as long as it is unique. My only concern is the uniqueness guarantee, which is hard to provide in a distributed system.
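To make the "random, long, unique" idea concrete, here is a minimal Go sketch. It assumes a 32-character base62 key and uses a hypothetical in-memory `keyStore` as a stand-in for the source database; none of these names come from Ormus. It also shows the lookup-based validation discussed earlier in the thread: a key is valid only if we issued and stored it.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphabet is an assumption: base62, chosen only for illustration.
const alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

// newWriteKey returns n characters drawn uniformly from alphabet using crypto/rand.
func newWriteKey(n int) (string, error) {
	limit := big.NewInt(int64(len(alphabet)))
	buf := make([]byte, n)
	for i := range buf {
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		buf[i] = alphabet[idx.Int64()]
	}
	return string(buf), nil
}

// keyStore is a hypothetical in-memory stand-in for the source database.
type keyStore struct {
	sourceByKey map[string]string // write key -> source ID
}

func newKeyStore() *keyStore {
	return &keyStore{sourceByKey: make(map[string]string)}
}

// issue generates a key for sourceID and retries on the (very unlikely) collision.
func (s *keyStore) issue(sourceID string) (string, error) {
	for {
		key, err := newWriteKey(32)
		if err != nil {
			return "", err
		}
		if _, taken := s.sourceByKey[key]; taken {
			continue
		}
		s.sourceByKey[key] = sourceID
		return key, nil
	}
}

// validate reports which source a key belongs to, if it was issued by this store.
func (s *keyStore) validate(key string) (string, bool) {
	src, ok := s.sourceByKey[key]
	return src, ok
}

func main() {
	store := newKeyStore()
	key, err := store.issue("source-1")
	if err != nil {
		panic(err)
	}
	src, ok := store.validate(key)
	fmt.Println(len(key), src, ok)
}
```

With 32 base62 characters there are roughly 190 bits of randomness, so collisions are extremely unlikely even when several nodes generate keys independently. In a real deployment the "taken" check would more plausibly be a unique index on the write key column than a map lookup.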

gohossein commented 11 months ago

KSUID (https://github.com/segmentio/ksuid) is a great option, or at least great inspiration.

abtinokhovat commented 10 months ago

Hi again, I've researched ULIDs and found this library: ULID-oklog. I also tried to decode the write keys that I obtained from Segment.

(It is a little bit weird for me that the lengths are not equal)

But I could not decode them with either a base32 decoder (which ULID is based on) or the KSUID tool. I even tried splitting the tokens and extracting a timestamp from them.

Why?

I dug deep into both libraries, KSUID and ULID. They share some core concepts.

KSUID

> ksuid
2YzRR195l16ngj9gYXmTP1iM0Td
> ksuid -f inspect 2YzRR195l16ngj9gYXmTP1iM0Td

REPRESENTATION:

  String: 2YzRR195l16ngj9gYXmTP1iM0Td
     Raw: 11F8ED894A9979D35F602EF0482811D6EA770CDD

COMPONENTS:

       Time: 2023-12-02 17:43:29 +0330 +0330
  Timestamp: 301526409
    Payload: 4A9979D35F602EF0482811D6EA770CDD
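
The inspect output above can be reproduced by hand, which is a quick way to test whether an unknown token is a KSUID. A raw KSUID is 20 bytes: a 4-byte big-endian timestamp in seconds since the KSUID epoch (Unix time 1400000000), followed by a 16-byte payload. The sketch below uses only the Go standard library; `splitKSUID` is a name made up for this example, not part of the ksuid package.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"time"
)

// ksuidEpoch is the KSUID custom epoch, in Unix seconds.
const ksuidEpoch = 1400000000

// splitKSUID splits the hex form of a raw 20-byte KSUID into its
// 32-bit timestamp, the corresponding UTC time, and the 16-byte payload.
func splitKSUID(rawHex string) (uint32, time.Time, []byte, error) {
	raw, err := hex.DecodeString(rawHex)
	if err != nil {
		return 0, time.Time{}, nil, err
	}
	if len(raw) != 20 {
		return 0, time.Time{}, nil, fmt.Errorf("want 20 bytes, got %d", len(raw))
	}
	ts := binary.BigEndian.Uint32(raw[:4])
	created := time.Unix(int64(ts)+ksuidEpoch, 0).UTC()
	return ts, created, raw[4:], nil
}

func main() {
	// Raw value taken from the inspect output above.
	ts, created, payload, err := splitKSUID("11F8ED894A9979D35F602EF0482811D6EA770CDD")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d %s %X\n", ts, created.Format(time.RFC3339), payload)
	// prints: 301526409 2023-12-02T14:13:29Z 4A9979D35F602EF0482811D6EA770CDD
}
```

14:13:29 UTC is the same instant as the 17:43:29 +0330 shown by the CLI.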

ULID

> ulid
01HGNEMMNVK915T1339P2FYH34
> ulid 01HGNEMMNVK915T1339P2FYH34
Sat Dec 02 14:24:08.891 UTC 2023

So to conclude, Segment's write key is neither a ULID nor a KSUID. Judging by the format, the keys are not GUIDs or UUIDs either, or at least Segment is not sending the raw ULID or KSUID to the user as the write key.

Questions:

  1. Should we send whichever key we generated (and stored in a database) to the user as-is? Or should we somehow change the format, like Segment does?
  2. Do we need to embed the user ID or other user-related data in the ULID? Why?
    • Scenario 1: we use raw ULIDs as write keys and send them to the user. When validating, we should:
      1. get the write key from the user
      2. send it to our manager service, which is responsible for generating and validating write keys
      3. have the manager validate it
      4. finally, send a response to the user saying whether the token is valid

gohossein commented 10 months ago

Hi again, I've researched ULIDs and found this library: ULID-oklog. I also tried to decode the write keys that I obtained from Segment.

  • jkg2bSc6UjDQjnoJupOd6Test7dAgdJv (go library)
  • IXn7WzRb84ZHYWHWVtybxnA6wH7ZP0ng (JS library)

(It is a little bit weird for me that the lengths are not equal)

But I could not decode them with either a base32 decoder (which ULID is based on) or the KSUID tool. I even tried splitting the tokens and extracting a timestamp from them.

Why?

I dug deep into both libraries, KSUID and ULID. They share some core concepts.

  • ULIDs are IDs built from a 48-bit timestamp and 80 bits of random data.
  • KSUIDs have a similar structure: a 32-bit timestamp (second precision, versus ULID's millisecond precision) and 128 bits of random data.
  • ULIDs are encoded in Crockford's base32 (digits 0-9 plus 22 uppercase letters, excluding I, L, O, and U), whereas KSUIDs are base62.
  • Both are sortable.

KSUID

> ksuid
2YzRR195l16ngj9gYXmTP1iM0Td
> ksuid -f inspect 2YzRR195l16ngj9gYXmTP1iM0Td

REPRESENTATION:

  String: 2YzRR195l16ngj9gYXmTP1iM0Td
     Raw: 11F8ED894A9979D35F602EF0482811D6EA770CDD

COMPONENTS:

       Time: 2023-12-02 17:43:29 +0330 +0330
  Timestamp: 301526409
    Payload: 4A9979D35F602EF0482811D6EA770CDD

ULID

> ulid
01HGNEMMNVK915T1339P2FYH34
> ulid 01HGNEMMNVK915T1339P2FYH34
Sat Dec 02 14:24:08.891 UTC 2023

So to conclude, Segment's write key is neither a ULID nor a KSUID. Judging by the format, the keys are not GUIDs or UUIDs either, or at least Segment is not sending the raw ULID or KSUID to the user as the write key.

Questions:

  1. Should we send whichever key we generated (and stored in a database) to the user as-is? Or should we somehow change the format, like Segment does?
  2. Do we need to embed the user ID or other user-related data in the ULID? Why?
    • Scenario 1: we use raw ULIDs as write keys and send them to the user. When validating, we should:
      1. get the write key from the user
      2. send it to our manager service, which is responsible for generating and validating write keys
      3. have the manager validate it
      4. finally, send a response to the user saying whether the token is valid
    • Scenario 2: we extend the ULID's length by embedding a cryptographic secret that we define, and maybe a user ID, so that we can decode and verify the key ourselves instead of requesting the token's status from the manager service.

Thanks, Abtin jan, for your great discovery. I guess we can use a customized algorithm to generate write keys if we want to address security concerns. But we may lose the database performance gain that ordered IDs give us.

abtinokhovat commented 10 months ago

🙏🏻 I thought about that. We can add 16 more bytes in the form of an MD5 hash of (secret + user_id + source_id (if needed)).

ULID + MD5

With this solution we get both sortability from the ULID and protection of user data, because we only need verification of the token, not authorization.

But I don't know if MD5 is secure enough for this purpose, and other hashing algorithms like the SHA family produce output that is too long, I think.

gohossein commented 10 months ago

MD5 is not secure enough.
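
One possible way to keep the "ID plus short tag" idea without MD5 is a keyed HMAC-SHA-256 truncated to 16 bytes, which is the same length as an MD5 digest. Truncating an HMAC output is common practice, and a keyed MAC is a better fit than hashing a secret concatenated with data. This is only a sketch under assumptions: the thread did not settle on this scheme, the function names are hypothetical, and the tag here covers the ID alone, so the verifier needs nothing but the shared secret.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// tagBytes is the truncated tag length; 16 bytes matches an MD5 digest.
const tagBytes = 16

// tagFor computes HMAC-SHA-256 over id and keeps the first tagBytes bytes.
func tagFor(secret []byte, id string) []byte {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(id))
	return mac.Sum(nil)[:tagBytes]
}

// signKey appends the hex-encoded tag to the ID. The ID stays as the prefix,
// so keys still sort the same way the underlying IDs do.
func signKey(secret []byte, id string) string {
	return id + hex.EncodeToString(tagFor(secret, id))
}

// verifyKey checks the tag locally, without asking another service.
func verifyKey(secret []byte, key string) bool {
	const tagLen = tagBytes * 2 // hex doubles the length
	if len(key) <= tagLen {
		return false
	}
	id, tagHex := key[:len(key)-tagLen], key[len(key)-tagLen:]
	got, err := hex.DecodeString(tagHex)
	if err != nil {
		return false
	}
	return hmac.Equal(got, tagFor(secret, id)) // constant-time comparison
}

func main() {
	secret := []byte("demo-secret") // placeholder; load a real secret from configuration
	key := signKey(secret, "01HGNEMMNVK915T1339P2FYH34")
	fmt.Println(len(key), verifyKey(secret, key), verifyKey([]byte("wrong"), key))
	// prints: 58 true false
}
```

Local verification only proves that we issued the key. Revoking a key or mapping it to a source would still need a lookup in the manager service or the database.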