verida / data-connector-server

1 stars 2 forks source link

Rethink and refactor properties to avoid ambigous terms #107

Closed aurelticot closed 2 months ago

aurelticot commented 2 months ago

There are inconsistencies across the API regarding the terminologies and the property names, for instance the provider handler definition have an id but are elsewhere referenced as handlerName.

Here's the response of the /providers endpoint:

Screenshot 2024-09-12 at 10 42 43 AM

Here's the response of the /sync/status endpoint:

Screenshot 2024-09-12 at 10 40 20 AM

[!WARNING] Inconsistencies lead to a more complex understanding of the relations between the different elements and a more complex technical implementation when using the API, prone to errors.

Suggestions

Provider object

{
  // Provider definition

  "id": "google", // Use `id` rahter than `name`
  "label": "Google",

  // ...

  "options": {}, // I'd prefer `config` but no big deal, important is to be consistent across definitions and instances

  "handlers": [
    {
      // Handler definition

      "id": "gmail",
      "label": "Gmail",

      "options": [   // I'd prefer `config` but no big deal, important is to be consistent across definitions and instances
        {
          // Provider option definition

          "id": "backdate",
          "label": "Backdate history"

          // ...
        }
      ]
    }
  ]
}

Connection object (Verida record)

{
  // A connection instance

  "_id": "google:1234567890", // Verida record id, same as the connection id below
  "id": "google:google:1234567890", // actual id of the connection

  "providerId": "google", // id of the provider, reference the `id` of the Provider object

  "accountId": "1234567890", // Reference to which account from that provider is connected, less ambiguous than the current `providerId` which makes me think it would be "google"

  "status": "connected", // Rather than `syncStatus` as it's not only about the sync which ultimately is down to the handlers

  // ...

  "options": {},   // I'd prefer `config` but no big deal, important is to be consistent across definitions and instances

  "handlers": [
    {
      "handlerId": "gmail", // rather than the current `name`, especially as the handler definitions don't have a `name` anymore but an `id`

      "enabled": true,

      "options": [  // I'd prefer `config` but no big deal, important is to be consistent across definitions and instances
        {
          "backdate": "3-month"
        }
      ]
    }
  ]
}

ConnectionHandler object (Verida record)

{
  // A connection handler instance

  "_id": "google:1234567890:gmail", // Verida record id

  "providerId": "google", // id of the provider, reference the `id` of the Provider object, rather than the current `providerName`

  "accountId": "1234567890", // Reference to which account from that provider is connected, less ambiguous than the current `providerId` which makes me think it would be "google"

  "handlerId": "gmail", // Reference the `id` of the provider handler definition

  "status": "syncing",

  // ...

}

Endpoints

The /sync and the sync/status endpoints take a provider and providerId search params, which would be providerId and accountId instead

The /provider/disconnect/:provider endpoint takes a providerId search param which would be accountId.

Internally, it would also be good to update the variables and other references to match the new terminology. For instance, but not exhaustive, the /provider/disconnect/:provider endpoint would be /provider/disconnect/:providerId.


[!NOTE] It is understood that changing these properties is a significant refactoring (hopefully typescript will help spot missed updates, if typed correctly) and will requires updates of the schemas which will make the existing records invalid... But, it is better to do it now when it's not released and used, so we don't have to handle a migration later.

tahpot commented 2 months ago

(Providers, Provider Handlers, Provider Options and Provider Handler Options)

There are:

Consider the duo id and label. Avoid name which is ambiguous whether it's an id or label.

Will migrate to consistent name of:

Make a decision between options and config and stay consistent across defintions and instances. I personally prefer config.

These are intentionally different. options refers to the configurable options for a provider, whereas config refers to the actual config specified for those options.

aurelticot commented 2 months ago

(Providers, Provider Handlers, Provider Options and Provider Handler Options)

There are:

  • Providers (ie: Google)
  • Provider Handles (ie: Gmail)
  • Provider Options (ie: batchSize)
  • Provider Handler Options (ie: Gmail batchSize)
  • Connections (ie: Chris' Google)
  • Connection Config (ie: Chris' Google batchSize)
  • Connection Handler Config (ie: Chris' Google Gmail batchSize)

Yes. I didn't extend on it but my point was the Providers-related object are read objects defining (definitions) a given provider, an handler, an option, etc. and not related to a user, while the Connections-related are object instances for a given user's connection and are updatable (the config, the status, etc.).

But it's just a mental model to then highlight the consistency issues, I have no problemw with these object per se.

Consider the duo id and label. Avoid name which is ambiguous whether it's an id or label.

Will migrate to consistent name of:

  • providerId (was provider / providerName)
  • accountId (was providerId)
  • handlerId (was handler / handlerName)

Thanks

Make a decision between options and config and stay consistent across defintions and instances. I personally prefer config.

These are intentionally different. options refers to the configurable options for a provider, whereas config refers to the actual config specified for those options.

Could it be configOptions instead of options? So we see the link with config.

tahpot commented 2 months ago

Could it be configOptions instead of options? So we see the link with config.

Yep, ok.