Best practice for encrypting model properties

TomerSalton commented 5 years ago

Description / Steps to reproduce / Feature proposal

In my application, I have models with properties that need to be encrypted before saved in the database. I'm trying to decide what's the best practice to implement that use-case.

Current implementation

What I'm currently doing is set an attribute in the property itself: encrypted: true

In my repository, I override the relevant methods. e.g. create:

async create(entity: DataObject<MyModel>, options?: AnyObject): Promise<MyModel> {
  for (let propertyName in this.entityClass.definition.properties) {
    if (this.entityClass.definition.properties[propertyName]['encrypted']) {
      entity[propertyName] = encrypted(entity, propertyName)    
    }
  }
  return (await super.create(entity, options));
}

There are a couple of problems with this implementation.

This method is not generic, as if tomorrow I would like to implement another logic?
It requires to override a lot of methods in the repository.
It requires editing per repository meaning that the CLI only covers a small part of the creation process.

Is there any better implementation for this issue?

Acceptance criteria

[ ] A mechanism allowing Repository classes to execute custom code whenever the repository is trying to convert model instance into raw data to be stored and also from the raw data to model instance. This is basically an Operation Hook, and it should be implemented by DefaultCrudRepository. See #2095 (comment) for more details, the code snippet is cross-posted below.
[ ] A section in our documentation (e.g. in Key Concepts >> Repositories) explaining how to use these new mechanism.
[ ] A guide in our documentation showing how can applications implement property encryption/decryption.

Proposed implementation:

class DefaultCrudRepository<T, ID> /*...*/{
  protected async entityToData(entity: DataObject<T>, options?: Options): DataObject<T> {
    // "persist hook" - no-op by default
    return entity;
  }

  protected async dataToEntity(data: DataObject<T>, options?: Options): DataObject<T> {
    // "load hook" - no-op by default
    return this.toEntity(data);
  }

  async create(entity: DataObject<T>, options?: Options): Promise<T> {
    const data = await this.entityToData(entity, options);
    const model = await ensurePromise(this.modelClass.create(data, options));
    const result = await this.dataToEntity(model);
    return result;
  }

  // etc.
}

TomerSalton commented 5 years ago

Hey @bajtos, Could you help me with this question?

bajtos commented 5 years ago

I think this feature would be much easier to implement if we had load/persist hooks in LB4. The "load" hook could be implemented by overriding toEntity method in DefaultCrudRepository, see https://github.com/strongloop/loopback-next/blob/377fa084c3a59447176cf59a86c8159192b9a097/packages/repository/src/repositories/legacy-juggler-bridge.ts#L391-L393

I think we should add fromEntity method to DefaultCrudRepository and use it in all places that are passing entity instance as instance data.

Hopefully, such hooks will address your concern It requires to override a lot of methods in the repository.

As for the third concern, It requires editing per repository meaning that the CLI only covers a small part of the creation process:

Create your own Repository base class.
Inherit from DefaultCrudRepository and override methods as needed.
Change your per-model repositories to inherit from your new base Repository class instead of DefaultCrudRepository.

I am not sure if our CLI allows developers to pick a custom Repository class as the base. If it does not, then I think we should enhance it to do so.

This method is not generic, as if tomorrow I would like to implement another logic?

I am afraid I don't understand, could you please explain in more details?

TomerSalton commented 5 years ago

First of all, I agree with you. The hooks in LB3 is the optimal mechanism for this use-case. I see that this feature is not prioritized and will not be available any time soon though, right?

About the overriding and inheritance solution, it means that the CLI is not enough by its own. After every repository creation, I will have to perform another manual code-changing.

And the last part, I feel I didn't explain myself well. Let me clarify: The encryption part is relevant for each method -

For editing and creating it performs encryption.
For reading it performs decryption.

The thing is, in the repository, I need to override 9 methods (create, createAll, find, findone, findbyid, update, updateall, udpdatebyid, replacebyid);

While I only need to perform this action after any POST, PATCH (same logic) and GET. So if I could somehow perform extra logic as per HTTP request and not per repository function, I would achieve the same results, but with a thinner implementation.

It might seem a minor difference, but in case there will be a need to implement extra logic it adds up.

bajtos commented 5 years ago

First of all, I agree with you. The hooks in LB3 is the optimal mechanism for this use-case.

👍

I see that this feature is not prioritized and will not be available any time soon though, right?

Operation hooks are unfortunately not on our current roadmap for 2019 Q1, see https://github.com/strongloop/loopback-next/issues/1839

There are many areas where LB4 is behind LB3 in feature parity, see https://github.com/strongloop/loopback-next/issues/1920. We are regularly reviewing this list to decide what to work on next.

BUT! LoopBack is an open project and we encourage our community to contribute features they are interested in. As an author of Operation Hooks in LB2/LB3, I am happy to help you myself along the way if you decide to work on this feature.

About the overriding and inheritance solution, it means that the CLI is not enough by its own. After every repository creation, I will have to perform another manual code-changing.

In my view, this is a missing feature of LB4 CLI that should be (hopefully) easy to implement.

Few ideas to consider:

We already have code to discover repository classes, it's used by lb4 controller command.
On the other hand, repositories tied to a specific model are usually wrong candidates for base repository selection. Maybe we need a different mechanism allowing applications and 3rd-party extension to advertise what repository base classes they would like to contribute to lb4 repository prompt.
lb4 repository should allow the user to pick a custom base repository class. There should be a CLI argument (option/flag) too. This may be good enough for the initial implementation on its own, without any list of repositories to pick from.

And the last part, I feel I didn't explain myself well. Let me clarify: The encryption part is relevant for each method -

For editing and creating it performs encryption.

For reading it performs decryption. The thing is, in the repository, I need to override 9 methods (create, createAll, find, findone, findbyid, update, updateall, udpdatebyid, replacebyid);

Sure, I agree with you. The solution I described in my comment was a workaround based on what LB4 provides right now. It has more downsides besides what you wrote, for example when DefaultCrudRepository adds a new method, the custom repository providing encryption will not intercept this new method.

So what I would like to see instead:

(1) A mechanism allowing Repository classes to execute custom code whenever the repository is trying to convert model instance into raw data to be stored and also from the raw data to model instance. This is basically an Operation Hook, and it should be implemented by DefaultCrudRepository.

For example, we can leverage a protected repository method.

class DefaultCrudRepository<T, ID> /*...*/{
  protected async entityToData(entity: DataObject<T>, options?: Options): DataObject<T> {
    // "persist hook" - no-op by default
    return entity;
  }

  protected async dataToEntity(data: DataObject<T>, options?: Options): DataObject<T> {
    // "load hook" - no-op by default
    return this.toEntity(data);
  }

  async create(entity: DataObject<T>, options?: Options): Promise<T> {
    const data = await this.entityToData(entity, options);
    const model = await ensurePromise(this.modelClass.create(data, options));
    const result = await this.dataToEntity(model);
    return result;
  }

  // etc.
}

(2) A convention describing how 3rd-party extensions should contribute different hook implementations. We can leverage class inheritance but also mixins.

An example of a mixin-based approach:

export EncryptedPropertiesMixin(Repository: typeof DefaultCrudRepository) {
  return class extends Repository {
    protected async entityToData(entity: DataObject<T>, options?: Options): DataObject<T> {
      const data = super.entityToData(entity, options);
      // encrypt selected properties
    }

    protected async dataToEntity(data: DataObject<T>, options?: Options): DataObject<T> {
     // decrypt selected properties in data
     return this.toEntity(data);
    }
  };
}

(I am not sure if protected methods can be used with mixins, I vaguely remember there was some TypeScript limitation related to this.)

(3) Integration with lb4 repository as discussed earlier in my comment.

While I only need to perform this action after any POST, PATCH (same logic) and GET. So if I could somehow perform extra logic as per HTTP request and not per repository function, I would achieve the same results, but with a thinner implementation.

It might seem a minor difference, but in case there will be a need to implement extra logic it adds up.

In my mind, this approach is solving the problem at a wrong place. Encryption/decryption needs to happen at data access level, to ensure that properties are correctly encrypted/decrypted when changing data from TypeScript code. For example, when seeding the database with sample data from a test setup, there is no HTTP request and no HTTP verb like POST/GET.

@strongloop/loopback-next what are your thoughts on Operation hooks, especially the pair "load" and "persist"?

TomerSalton commented 5 years ago

@bajtos since the custom repository is a good solution (until hooks will be implemented), I guess we can close this issue, right?

edit - In addition, do you have any issues/conclusions/news about operation hooks? I might be interested in contributing.

bajtos commented 5 years ago

About the overriding and inheritance solution, it means that the CLI is not enough by its own. After every repository creation, I will have to perform another manual code-changing.

lb4 repository should allow the user to pick a custom base repository class. There should be a CLI argument (option/flag) too.

This should be available soon, see https://github.com/strongloop/loopback-next/pull/2235

since the custom repository is a good solution (until hooks will be implemented), I guess we can close this issue, right?

Not really. While you can implement encryption/decryption in a custom repository, it would mean overriding every persistence method and possibly missing newly added methods in the future.

I'd prefer to keep this issue open at least until we have the following mechanism in place & have an official documentation describing how to implement property encryption.

A mechanism allowing Repository classes to execute custom code whenever the repository is trying to convert model instance into raw data to be stored and also from the raw data to model instance. This is basically an Operation Hook, and it should be implemented by DefaultCrudRepository.

See https://github.com/strongloop/loopback-next/issues/2095#issuecomment-446182948 for more details and a code snippet.

bajtos commented 5 years ago

@TomerSalton are you still interested in this feature? Would you like to contribute the improvement of our DefaultCrudRepository yourself? See https://loopback.io/doc/en/contrib/ to get you started, I am happy to help you along the way.

TomerSalton commented 5 years ago

Hey @bajtos , as mentioned here - I'm satisfied with this solution for now.

I am already working on another issue (hasAndBelongsToMany). Since I am willing to keep on contributing, I might refer to this one on the future :)

bajtos commented 5 years ago

as mentioned here - I'm satisfied with this solution for now.

I am glad to hear that.

I'd still prefer to keep this issue open so that we don't forget to make property encryption feel more like a first-class citizen. It's also a good opportunity for people looking for places where to start contributing.

Good luck with your work on hasAndBelongsToMany 🤞

loopbackio / loopback-next