spring-projects / spring-data-couchbase

Provides support to increase developer productivity in Java when using Couchbase. Uses familiar Spring concepts such as a template classes for core API usage and lightweight repository style data access.
https://spring.io/projects/spring-data-couchbase
Apache License 2.0
275 stars 191 forks source link

SimpleCouchbaseRepository.findById() and CouchbaseTemplate.findById() do not validate the document type #1067

Closed aaronjwhiteside closed 3 years ago

aaronjwhiteside commented 3 years ago

It is possible to save() a document of one type, and retrieve it using another type.

The below example will succeed, where one might expect it to fail.

public class Group {
    @Id
    @GeneratedValue(strategy = GenerationStrategy.UNIQUE)
    String id;
    @Field
    String name;
}

@Repository
public interface GroupRepository extends CouchbaseRepository<Group, String> {
}

public class User {
    @Id
    @GeneratedValue(strategy = GenerationStrategy.UNIQUE)
    String id;
    @Field
    String name;
}

@Repository
public interface UserRepository extends CouchbaseRepository<User, String> {
}

@SpringBootApplication
@EnableCouchbaseRepositories
public class ExampleApplication implements ApplicationRunner {

    @Autowired
    private UserRepository userRepository;

    @Autowired
    private GroupRepository groupRepository;

    public static void main(final String[] args) {
        SpringApplication.run(ExampleApplication.class, args);
    }

    @Override
    public void run(final ApplicationArguments args) throws Exception {
        User user = userRepository.save(new User());
        System.out.println(user.id);
        System.out.println(groupRepository.findById(user.id)
                .get().id);
    }
}

The underlying code in SimpleCouchbaseRepository#findById

calls

return Optional.ofNullable(couchbaseOperations.findById(entityInformation.getJavaType()).one(id.toString()));

CouchbaseTemplate should at least try and validate that the supplied type matches the retrieved document type, and either return null (my personal preference), or throw an exception that the underlying type does not match (still acceptable).

We had implemented a workaround for the previous Spring Data Couchbase 3.x series, that prevented this behaviour as it was flagged by our qa/security team as a security issue.

It's probably just a good idea to solve it at the source and for everyone?

mikereiche commented 3 years ago

This would prevent an application from having Employee documents read as either Manager entities or IndividualContributor entities. The current behavior is the equivalent to the "as(OtherEntity.class)" projection on the query api.

aaronjwhiteside commented 3 years ago

I think that the casting should be explicit, if you want to read an Employee document as a Manager document, then it should be indicated as such.

template.findById(Employee.class).as(Manager.class).one("abc123");

Otherwise it's a security risk..

And without overriding the default repository impl and or CouchbaseOperations implementation, it's not really possible to solve. Because the type information is silently discarded.

This is not stated as a warning in the documentation, that out of the box anyone who happens to know the id of a document can request it via different repositories/entities classes, for microservices this typically means different services... I can take an id of a document from one microservice and request it from another and out of the box it will work.. and might expose sensitive information.

More mature ORMs prevent this, or make it explicit.

If you don't want to change the default to prevent this then I implore you to add the ability to turn it on or control it.. via code or configuration.

mikereiche commented 3 years ago

I can take an id of a document from one microservice and request it from another and out of the box it will work.. and might expose sensitive information.

_class is not a security compartment. Security is on buckets. If a microservice is going to rely on _class for security, then it will need to enforce it.

More importantly, it would be a breaking change. Edit: This issue was discussed in https://github.com/spring-projects/spring-data-couchbase/issues/409 and https://github.com/spring-projects/spring-data-couchbase/issues/668

aaronjwhiteside commented 3 years ago

_class is not a security compartment

You're right it's not directly related to security, but it's used to enforce type safety. And bugs with type safety can lead to exposing sensitive information. So indirectly I assert that it IS a security concern.

Java is generally considered a type safe language (there are limits and arguments to be made) but it's not javascript or ruby.

At the Java level when defining and using a Repository there is an explicit contract on the types that the repository will accept and produce.

@Repository
public interface GroupRepository extends CouchbaseRepository<Group, String> {
}

GroupRepository#save() will only compile if Group types are passed in.

And Repository#findById() will only return instances of Group.

However it seems that although type information is persisted and maintained by save() it is ignored by findById(). Strangely enough if findAll() where to return documents of a different type, this would be considered a bug.. after all the explicit type contract was neglected, was it not?

In the context of a Repository findById() is the outlier that does not enforce type safety. Why is it special?

To be thorough I think Repository#deleteById(), #deleteAllById, #existsById also suffers from the same problem.

Security is on buckets.

Yes it is.

I also remember Couchbase Professional Services recommending that for micro services we have them share a bucket. Due to scaling limitations of Couchbase, If I remember correctly it performs really badly with more than 10 buckets. And we have 60 odd microservices at last count.

We do have a seperate security bucket for really sensitive information. So there is that.

But even within a single micro service it can and does in our case, contain multiple document types. Represented as distinct resources via a REST API.

And I agree with you that the micro service probably needs to enforce this, not Couchbase itself. I'm not arguing that it be handled by Couchbase for me. I'm just pointing out that Spring Data Couchbase is intended to be a high level interface to Couchbase which provides specific and different (from the raw SDK) paradigms for accessing it. I think the responsibility lies here.

This would prevent an application from having Employee documents read as either Manager entities or IndividualContributor entities.

I'm not discounting this use case, but I am saying I believe it's definitely not valid in the context of a Repository.

And I think we probably crossover into talking about polymorphism and how to represent that for Couchbase documents, in terms of ORMs JPA and Hibernate are probably a good base line here.

And to reenforce the argument, if one does not want type safety and wants to duck type documents, then they probably shouldn't be Spring Data Couchbase? As it's design makes the document types explicit unlike the underlying Couchbase SDK.

The current behavior is the equivalent to the "as(OtherEntity.class)" projection on the query api.

I assume you mean GetResult#contentAs(OtherEntity.class)?

So CouchbaseTemplate has no specific generic type, and always requires the user to pass in the type they want returned. This makes sense, it's at a lower level than the SimpleCouchbaseRepository implementation.

Given it does accept any class type in the insertById() method, and persist that class type into a field in the raw document. And there is a natural assumed symmetry between insertById() and findById().

I believe it should also allow one to enforce the type of the underlying documents. Maybe that's by a configuration flag, or by a new explicit type safe method, typeSafeFindById() or perhaps findById(Some.class).verifyType().one("123").

If a microservice is going to rely _class for security, then it will need to enforce it.

But how?

By the time CouchbaseTemplate#findById() has returned, the type information is gone.. lost.

And if your answer is going to be, well you can just add another query method to your repository.. as a work around. Right sure, so for all the ById methods in the CouchbaseRepository interface, I'll re-implement them because the out of the box methods have broken the explicit type contract as defined in the repository class definition...

I'm preemptively saying I think this argument is weak at best.

I believe that CouchbaseTemplate needs to provide this functionality out of the box.

mikereiche commented 3 years ago
Edit:

>>If a microservice is going to rely _class for security, then it will need to enforce it.

> But how?
> By the time CouchbaseTemplate#findById() has returned, the type information is gone.. lost.

Add a _class property to the entity class, the value of _class from the document will be returned in it.

In the context of a Repository findById() is the outlier that does not enforce type safety. Why is it special? To be thorough I think Repository#deleteById(), #deleteAllById, #existsById also suffers from the same problem.

For the deletes and exists, this requires an additional access to the _class property to check that it matches. Also for findById() that uses a projection that does not include the _class.

The normal method of compartmentalizing with the key is to use @IdPrefix for the index. If a client passes a key like "airport:123" to fetch a city object, the microservice rejects the request.

You're right it's not directly related to security, but it's used to enforce type safety. And bugs with type safety can lead to exposing sensitive information. So indirectly I assert that it IS a security concern.

The argument that some bugs can be security bugs doesn't explain why this behavior is a security bug. I'm looking for an answer to "How can data be accessed with this behavior that cannot be accessed without it?" The only thing I can come up with is "If a microservice assumes that an id belongs to a document of an authorized type, when in fact it belongs to an a document of an unauthorized type". The solution to that would be to use an @IdPrefix.

mikereiche commented 3 years ago

see also https://github.com/spring-projects/spring-data-couchbase/issues/409 and https://github.com/spring-projects/spring-data-couchbase/issues/668

aaronjwhiteside commented 3 years ago

The normal method of compartmentalizing with the key is to use @Prefix for the index. If a client passes a key like "airport:123" to fetch a city object, the microservice rejects the request.

Yeah, we use this too.

Our prefix is <service name>::<document type>::.

Last I checked Spring Data Couchbase (3.x) only provided support for a static prefix, so we have been generating our IDs manually. I have been meaning to open another issue about this to see if some portion of the key can be global and another be specific to the @Document.

But this just pushes the problem up into the micro service itself.

When Spring Data Couchbase was created to handle these idiosyncrasies and provide a convenient and consistent user experience?

If the persistence layer due to its nature cannot effectively enforce document level security, and we have a layer that provides a type safe interface to the raw underlying database.. why should the service itself take on this responsibility? And duplicate what potentially a lot of users realistically need to use Couchbase with microservices in a production environment?

The argument that some bugs can be security bugs doesn't explain why this behavior is a security bug. I'm looking for an answer to "How can data be accessed with this behavior that cannot be accessed without it?" The only thing I can come up with is "If a microservice assumes that an id belongs to a document of an authorized type, when in fact it belongs to an a document of an unauthorized type". The solution to that would be to use an @Prefix.

Again you're saying it's not a Spring Data Couchbase problem, even when it provides the illusion of type safety, but ultimately it's the applications responsibility to enforce this.

But it's not documented anywhere... so the assumption that I believe most people will come away from when looking at Spring Data Couchbase is that it's handled transparently.

I ask, why bother to provide just the illusion of type safety when you can provide the real thing?

see also #409

As you probably guessed I'm not buying the answers given here.

Quoted from the issue:

after looking into this with the project lead, it appears that this is a by-design limitation, as stricter checks would break some features like paginated queries...

Paginated queries are n1ql queries, and always provide the concrete type from the Repository interface..

Maybe this was true when things were backed by views?

To summarize my ask:

  1. Repository classes honor the declared type for all methods. (I really can't imagine this break many projects and the solution is simple if it does, use the CouchbaseTemplate instead of the type safe Repository). I'd be happy if this too was opt in only. Via configuration or annotation, maybe @ValidateDocumentType.
  2. Support be added to CouchbaseTemplate to enforce type safety. (I'm quite fond of the idea template.findById(SomeEntity.class).validateType().one("123")). I imagine that this functionality will be used by the SimpleCouchbaseRepository implementation for strict type safety.
aaronjwhiteside commented 3 years ago

For the deletes

delete from `bucket` using keys 'abc123' where `_class` = 'my.class'

and exists, this requires an additional access to the _class property to check that it matches.

fetch a sub document containing only the _class property. (SubdocGetRequest) or

select `_class` from `bucket` using keys 'abc123'

This is how we worked around the issue in Spring Data Couchbase 3.x.

If this is opt in, then it shouldn't affect people who don't want it (whomever those crazy cowboys may be)

mikereiche commented 3 years ago

Our prefix is<service name>::<document type>::.

The microservices can choose to honor or deny the request based on the document type.

There are performance penalties for accessing properties. Especially if the query api is used.

mikereiche commented 3 years ago

I don't see a problem with implementing validType() on the template and adding a repository annotation that would cause it to be used. For Get operations that retrieve the whole document, it would compare the value of _class with the entity type.

I don't know about delete and exists, though. In my experience the query api is much slower. And you already have the key that indicates the document type.

aaronjwhiteside commented 3 years ago

The microservices can choose to honor or deny the request based on the document type.

We're back to arguing what the responsibility of something like Spring Data Couchbase should be, given the limitations of Couchbase self, and given the type of APIs Spring Data Couchbase it encourages/exposes via the Repository interface, and if type safety is just an illusion/construct of Java or if Spring Data Couchbase should honor the contract defined by the Repository class definition.

I don't know about delete and exists, though. In my experience the query api is much slower. And you already have the key that indicates the document type.

Agree the query API is much slower, but given that we're already going down the rabbit hole of opt-in functionality, maybe an additional level of opting in.. maybe@ValidateDocumentType(validateDeletes=true, validateExists=true)?

But honestly if we start validating findById() then removeById() and co should mirror the same functionality? If you want one, I bet you want the others too. In our case it wouldn't make sense to remove the workaround code just for findById() only to leave it in place for removeById() and existsById().

If not, then I'd really like an easy way to hook into this code and modify this behaviour.. I'd have to think about what that might actually look like though..

Because right now to modify it, I'd have to extend CouchbaseTemplate and SimpleCouchbaseRepository and override the current findById() method or add a new one, which means copying and pasting most of ReactiveFindByIdOperationSupport, ExecutableExistsByIdOperationSupport and related classes/interfaces just to change add a few lines deep down in those classes.

mp911de commented 3 years ago

Having multiple entity types in the same scope/collection is fine from a repository perspective when remaining within the same type hierarchy. Taking the example where Manager extends Employee and querying the EmployeeRepository, the typical expectation is to return both, Manager and Employee types (base type including its subtypes).

Things become more complicated when unrelated types are stored within the same scope/collection. The last consequence is a ClassCastException if the query returns a User for GroupRepository.findById(…).

We have a similar arrangement in MongoDB. We do not restrict queries to types by default to avoid the need of indexing _class. Filtering elements in memory is probably one of the worst approaches here as the framework would be required to do the same for paginated queries. You cannot (and probably do now want to) load the entire collection in order to filter out the non-matching ones for the sake of counting documents.

Queries on the Template level should still be capable of filtering documents by _class by issuing an appropriate query. Flip side is that _class needs to be indexed and that causes other effects. However, in MongoDB one can write:

template.findAll(Query.query(where("foo").is("bar")).restrict(Group.class), Group.class);

template.findAll(Query.query(where("foo").is("bar")).restrict(Group.class), Group.class, "users_and_groups_collection");

Including _class in the actual query is safe towards pagination as the count query also restricts upon the desired type.

That being said:

  1. Either store a single hierarchy within a single collection (scope)
  2. Or apply type per collection
  3. Use the Template API directly with a similar Query.restrict(Class) approach
mikereiche commented 3 years ago

@aaronjwhiteside

If not, then I'd really like an easy way to hook into this code and modify this behaviour.

Edit: Instead of what is described here, use either the CustomConverter (described later), or the _class property (described above). The method described here will be significantly slower because it uses the query api instead of the kv api.

You already have this functionality with your key-pattern <service name>::<document type>:, and it doesn't degrade performance. But here is what you are asking for:

The template needs the extension to where() discussed in https://github.com/spring-projects/spring-data-couchbase/issues/1066

couchbaseTemplate.findByQuery(Airport.class).matching(QueryCriteria.where("META().id").eq(id).one();

The extension for the repository is straight-forward.

import com.couchbase.client.java.query.QueryScanConsistency;
import org.springframework.data.couchbase.repository.CouchbaseRepository;
import org.springframework.data.couchbase.repository.Query;
import org.springframework.data.couchbase.repository.ScanConsistency;
import org.springframework.data.repository.NoRepositoryBean;

import java.util.Optional;

@NoRepositoryBean
public interface BaseRepository<T, ID> extends CouchbaseRepository<T, ID> {

  @Query("#{#n1ql.delete} where META().id = $1 and _class = $2 #{#n1ql.returning}")
  @ScanConsistency(query = QueryScanConsistency.REQUEST_PLUS)
  void removeByIdValidated(ID id, String className);

  @Query("#{#n1ql.selectEntity} where META().id = $1 and _class = $2")
  @ScanConsistency(query = QueryScanConsistency.REQUEST_PLUS)
  Optional<T> getByIdValidated(ID id, String className);

  default void deleteByIdValidated(ID id){
    removeByIdValidated(id, getEntityInformation().getJavaType().getTypeName());
  }

  default Optional<T> findByIdValidated(ID id){
    return getByIdValidated(id, getEntityInformation().getJavaType().getTypeName());
  }

}
mikereiche commented 3 years ago

The hook into findById() while it still has the _class is via a custom converter:

Add this to your Config class

@Override
@Bean(name = "couchbaseMappingConverter")
public MappingCouchbaseConverter mappingCouchbaseConverter(CouchbaseMappingContext couchbaseMappingContext, CouchbaseCustomConversions couchbaseCustomConversions) {
    MappingCouchbaseConverter converter = new CustomMappingCouchbaseConverter(couchbaseMappingContext, typeKey());
    converter.setCustomConversions(couchbaseCustomConversions);
    return converter;
}

import org.springframework.data.couchbase.core.convert.MappingCouchbaseConverter;
import org.springframework.data.couchbase.core.mapping.CouchbaseDocument;
import org.springframework.data.couchbase.core.mapping.CouchbasePersistentEntity;
import org.springframework.data.couchbase.core.mapping.CouchbasePersistentProperty;
import org.springframework.data.mapping.context.MappingContext;
import org.springframework.data.util.ClassTypeInformation;

public class CustomMappingCouchbaseConverter extends MappingCouchbaseConverter {
    public CustomMappingCouchbaseConverter(final MappingContext<? extends CouchbasePersistentEntity<?>, CouchbasePersistentProperty> mappingContext, final String typeKey) {
        super(mappingContext, typeKey);
        this.typeMapper = new TypeBasedCouchbaseTypeMapper(typeKey);
    }
    @Override
    public <R> R read(final Class<R> clazz, final CouchbaseDocument source) {
             // this will simply read an empty CouchbaseDocument if _class does not match
             // the client will find an entity object with only the version property set
        if( source.get(getTypeKey()) == null || !source.get(getTypeKey()).equals(clazz.getTypeName())){
            return read(ClassTypeInformation.from(clazz), new CouchbaseDocument(), null);
        }
        return read(ClassTypeInformation.from(clazz), source, null);
    }
aaronjwhiteside commented 3 years ago

@mikereiche

The extension for the repository is straight-forward.

This assumes everyone in all our teams knows about this BaseRepository and knows to call the new methods, instead of the standard ones.

I believe it's better that the standard methods behave "correctly"..

There are also use cases where we use the CouchbaseTemplate instead of a Repository for various reasons, so having the "correct" behaviour out of the box here too would be useful. Currently we've been overriding the needed methods.

Also there is no getEntityInformation() method in CouchbaseRepository, that's instead a protected method in SimpleCouchbaseRepository.

EDIT: I think we can use #n1ql.filter instead of requiring access to the EntityInformation class.

And fragments don't support EntityInformation, and it seems the Spring Data Commons maintainers have no wish to support it? @mp911de https://github.com/spring-projects/spring-data-commons/issues/2278

aaronjwhiteside commented 3 years ago

@mikereiche

The hook into findById() while it still has the _class is via a custom converter: Add this to your Config class

I like this, and never considered the converter a place we could hook into before now.

Is it possible to make the returned document null, to act as if no document was found at all?

EDIT: Answered my own question, seems we can throw a DocumentNotFoundException here and it will get mapped later on to an empty Mono.

That's what our current implementation does.

mikereiche commented 3 years ago

This assumes everyone in all our teams knows about this BaseRepository

It doesn't. It will work just fine without it. But If they want the non-standard behavior, they'll need to use the non-standard base-class.

and knows to call the new methods, instead of the standard ones.

So override the the standard methods.

aaronjwhiteside commented 3 years ago

So override the the standard methods.

I guess that works, but then CouchbaseTemplate don't follow the same behaviour and we still need to extend that.

It feels better to fix it in CouchbaseTemplate and have the behaviour flow upwards from there.

aaronjwhiteside commented 3 years ago

@Query("#{#n1ql.selectEntity} where META().id = $1 and _class = $2") @ScanConsistency(query = QueryScanConsistency.REQUEST_PLUS) Optional getByIdValidated(ID id, String className);

also regarding this, we had originally tried this but found it caused performance issues, as you said earlier queries are slower than gets from the data service.

Question, will @ScanConsistency work here if we override a standard method (findById() or findAll())?

mikereiche commented 3 years ago

also regarding this, we had originally tried this but found it caused performance issues

Yes the query api is about 20x slower than the KV api. That's why nobody does it and just encodes the document-type in the key. But you've been pretty adamant that you want to do it, even though you don't need to, because you already have the document type in the key (Edit: you can also have the document type returned in the entity by adding a _class property). The other option for delete is to Get the document (with the checking of _class that the CustomConverter does), and then deleting if it had the correct _class and was returned as non-null. This will only be maybe 2-3X slower than just deleting (not only is it two operations instead of one, the Get operation is synchronous, while the deleteById() is asynchronous.

will @ScanConsistency work here if we override a standard method (findById() or findAll())? See the code I provided. The name of the method - findById() or findByIdValidated() doesn't matter. It's the @Query method that needs the @ScanConsistency

aaronjwhiteside commented 3 years ago

@mikereiche

I want Spring Data Couchbase to enforce the type safety for me.. that's why I choose to use it over the Java SDK..

If I have to manually inspect the prefix of ids for my service, I feel like Spring Data Couchbase isn't doing it's job. And I now have to go and create another abstraction that all our microservices use to validate the ids they are receiving, and have all the projects then plumb this in.. when Spring Data Couchbase already knows about the types and is in a perfect place to act on them.

I ,as you said, adamantly believe that this is a ORM problem and not a business logic problem.

Why should the business layer care what the format of the database ids are? To that layer is should be completely opaque.

That's why nobody does it and just encodes the document-type in the key.

I'm going to tempt the dragon here, and ask so why does Spring Data Couchbase even bother to encode the type in a _class field? Why not just prefix it to IDs transparently and have n1ql queries WHERE meta().id LIKE 'MYPREFIX%'?

aaronjwhiteside commented 3 years ago

See the code I provided. The name of the method - findById() or findByIdValidated() doesn't matter. It's the @query method that needs the @ScanConsistency

I meant to say, will overriding the normal findById() method and adding @Query to it along with @ScanConsistency work as expected? I would have assumed that it would always use the base repositories implementation of those methods.

aaronjwhiteside commented 3 years ago

@mikereiche do you want to jump on some other more real-time communication medium? A voice chat perhaps?

We do have a support contract with Couchbase, I don't know what policies you have in place for such interactions, but let me know. We might be able to clear this up sooner and more efficiently.

mikereiche commented 3 years ago

will overriding the normal findById() method and adding @Query to it along with @ScanConsistency work as expected? I would have assumed that it would always use the base repositories implementation of those methods.

When a subclass overrides a method, the method on the subclass is called. But I wouldn't do that because you'll need to hard-code the entity class name in the @Query which will require a definition for every repository. And given that the CustomConverter can do the same job much quicker, I wouldn't bother with it at all.

mikereiche commented 3 years ago

Why should the business layer care what the format of the database ids are? To that layer is should be completely opaque.

It shouldn't. But if the business layer wants to know the document type just from the key, it's the only way.

I'm going to tempt the dragon here, and ask so why does Spring Data Couchbase even bother to encode the type in a _class field? Why not just prefix it to IDs transparently and have n1ql queries WHERE meta().id LIKE 'MYPREFIX%'?

No idea. It seems like a good idea - you did it. It solves your specific problem. The @IdPrefix annotation supports it. Why not? Many of the mechanisms in spring-data-couchbase are borrowed from other spring-data projects.

mikereiche commented 3 years ago

The read method in the custom converter needs to allow documents without any _class in them - as would be returned by by the query api. (StringBasedN1qlQueryParser includes only fields from the entity in the projection and will not have _class, but the query api will already have a _class =<class> predicate, although it will be possible that an @Query does not have such a predicate (but should))


    public <R> R read(final Class<R> clazz, final CouchbaseDocument source) {
        if( source.get(getTypeKey()) != null && !source.get(getTypeKey()).equals(clazz.getTypeName())){
            throw new DocumentNotFoundException(null);
        }
        return read(ClassTypeInformation.from(clazz), source, null);
    }```
aaronjwhiteside commented 3 years ago

The @IdPrefix annotation supports it

The @IdPrefix and @IdAttribute features are kind of odd from my point of view.

If you have aspects of the id that are part of the entity itself, then these annotations save you from duplicate a little bit of code.

In our case, part of the prefix is the service name and is global to the service. So why would I bother setting it each time in each of our entities before calling save()? In Spring Data Couchbase 3.x you used to be able to provide a default driven by configuration. This appears to have disappeared in 4.x.

The problem with a single configuration driven prefix, was that we wanted our entities to also be prefixed with the entity type. So some portion of it is global and another portion is dynamic.

Either way the @IdPrefix or @IdAttribute's don't help us much here.

And if the recommendation is that applications using Couchbase prefix their application name in the key, it works but is also kind of a kludge.

Maybe I'll open another ticket to bring back the configuration driven default prefix.

It shouldn't. But if the business layer wants to know the document type just from the key, it's the only way.

The way I view it, we don't have such a requirement. And I'm trying to avoid introducing it.

Your argument as I see it is:

  1. It's a Couchbase recommended best practice to prefix identifiers with the application/entity name. (we can agree on that)
  2. Couchbase has limitations with multi tenancy, you're going to have to do extra work in this regard.
  3. By the way our Spring Data Couchbase project, doesn't really help here..
  4. So you're going to have to introduce this logic into your service to deal with multi tenancy yourself.
  5. Oh yeah we know that Spring Data Couchbase could in theory validate types for you, since it persists type information already.. but you know.. it's not really our problem.

The business layer should pass down the opaque identifier, having received it from the rest layer and the data layer should return the entity if it exists.

The crux of the problem I think is the difference between the KV and Query service interactions. My hope would be that Spring Data Couchbase sufficiently hide those differences when using a Repository class.

aaronjwhiteside commented 3 years ago

although it will be possible that an @query does not have such a predicate (but should))

I assume that _class will be present for documents fetched directly from the KV service?

As I'm not too worried about the Query case, the CouchbaseSimpleRepository already ensure all queries it issues have the correct condition on the _class field.

existsById could be overridden to fetch a sub document that only contains _class. Or we could also use a n1ql query for this. I assume that subdocument fetching is still orders of magnitude faster than the query service.

deleteById, for our use case I'm ok with using n1ql here, as it's not as common as getById (we hardly if at all use existsById).

mikereiche commented 3 years ago

Your argument as I see it is:

  1. It's a Couchbase recommended best practice to prefix identifiers with the application/entity name. (we can agree on that)

For the requirement to determine the type of an object solely from its key, that information needs to be in the key. That's not a best-practice, that's a necessity if that requirement is to be satisfied. To be able to determine the type of object from a key without that information being in the key, the document type will need to be looked up. That lookup will incur a cost.

  1. Couchbase has limitations with multi tenancy, you're going to have to do extra work in this regard.

I don't think we discussed multi-tenancy. The _class property should not be used for multi-tenancy. Scopes should be used for multi-tenancy. And collections should be used for document-types. Scopes and collections are part of Couchbase Server 7 and can be previewed in Couchbase 6.5+. I am working on scopes and collections support for spring-data-couchbase.

  1. By the way our Spring Data Couchbase project, doesn't really help here..

Use the CustomConverter that I provided to validate types on a findById(). Deletes will need to verify the type by findById() before deleting (refer to the deleteByIdVerified() provided below).

  1. So you're going to have to introduce this logic into your service to deal with multi tenancy yourself.

Use scopes for multi-tenancy. (and collections for document types). If you need to implement multi-tenancy without using scopes, use some mechanism other than _class.

  1. Oh yeah we know that Spring Data Couchbase could in theory validate types for you, since it persists type information already.. but you know.. it's not really our problem.

Type validation is done on query operations. For kv operations, type validation can be done at a cost. Use the CustomConverter that I provided to validate types on a findById(), or add _class to the entity and check that. Deletes will need to verify the type by a findById() before deleting.

The crux of the problem I think is the difference between the KV and Query service interactions. My hope would be that Spring Data Couchbase sufficiently hide those differences when using a Repository class.

This is indeed the crux of the problem. The couchbase KV does not have the concept of "document type". spring-data-couchbase has implemented it by storing a _class property along with the data. Verifying that a particular key corresponds to a document of a certain type requires accessing that document data (or gsi index, if it is in an index as it should be). The query api already has a n1ql predicate on _class, but for the ById apis which do not, this access is extra and incurs a cost. For the findById() api, it would take a simple check, as the whole document is (usually) retrieved anyway. This can be accomplished with the CustomConverter I provided, or by adding a _class property to the entity and checking that. Verifying the document type for the deleteById() api - which does not access the document data or gsi index - will require an extra access. This can be accomplished by adding a deleteByIdVerified() method to the repository that verifies the document type with a findById() (that leverages the verification done by the CustomConverter), or by checking the value of the _class property returned by findById() (the entity would need to contain a _class property).


import org.springframework.data.couchbase.repository.CouchbaseRepository;
import org.springframework.data.repository.NoRepositoryBean;
@NoRepositoryBean
public interface BaseRepository<T, ID> extends CouchbaseRepository<T, ID> {
    default void deleteByIdVerified(ID id) {
        if (findById(id).isPresent())
            deleteById(id);
    }
}
aaronjwhiteside commented 3 years ago

To be able to determine the type of an object solely from its key, that information needs to be in the key. That's not a best-practice, that's a necessity. To be able to determine the type of object from a key without that information being in the key, the document type will need to be looked up. That lookup will incur a cost.

Yes.

Not documenting this behavior as expected, or offering a feature/flag to change this behaviour to what I suspect many people would assume is sane, is asking for trouble.

I'm grateful for your guidance on the practical workarounds, but I'm still asking that these be made part of Spring Data Couchbase.

I don't think we discussed multi-tenancy. The _class property should not be used for multi-tenancy. Scopes should be used for multi-tenancy. And collections should be used for document-types. Scopes and collections are part of Couchbase Server 7 and can be previewed in Couchbase 6.5+. I am working on scopes and collections support for spring-data-couchbase.

Agree, but we don't have CB 7 yet and I still have 60+ microservices to support.

If you're saying you don't want to put type safety logic in Spring Data Couchbase because right around the relative corner is CB 7 with features that make those changes redundant, I can understand and accept that.

Use the CustomConverter that I provided to validate types on a findById(). Deletes will need to verify the type by findById() before deleting using the deleteByIdVerified() provided below.

Yeah, would still be nicer if this was supported out of the box. Unless see above.

No. Use scopes for multi-tenancy. (and collections for document types).

CB 7 isn't out yet, and you did say this was the recommended approach that services need to verify the id prefix themselves..

This can be accomplished by adding a deleteByIdVerified() method to the repository that verifies the document type with a findById() that leverages the verification done by the CustomConverter.

Again I don't want to introduce new methods for people to use, I want to ensure that the standard methods work as expected.

And I would prefer that I didn't have to maintain the workaround ourselves.

If everything we want is handled by scopes and collections, then I can deal with these workaround for the moment. Until such time that CB 7 hits and we get around to migrating applications and data over to it. But I suspect that's realistically 1-2 years away for us.

mikereiche commented 3 years ago

Agree, but we don't have CB 7 yet and I still have 60+ microservices to support.

And your intention was to have _class that was verified against the repository and I gave you that.

CB 7 isn't out yet, and you did say this was the recommended approach that services need to verify the id prefix themselves..

I said if you want to identify the type from the key only, then you need to put that identification in the key. It's not really a recommendation - it's a fact.

Yeah, would still be nicer if this was supported out of the box. Unless see above.

The CustomConverter is better than having an annotation or something else baked in the product. Imagine six months from now that you want it to do something different. If it was in the product, you'd have open an issue, get it accepted, then wait for a change and then a release. In the converter, you can change it on a moment's notice, throw a different exception, you can have auditing, allow exceptions for specific classes or services - whatever.

aaronjwhiteside commented 3 years ago

And your intention was to have _class that was verified against the repository and I gave you that.

True enough.

I said if you want to identify the type from the key only, then you need to put that identification in the key. It's not really a recommendation - it's a fact.

I never wanted for applications to have to identify the type from the key :)

We only originally did it for aiding human troubleshooting, when looking at logs and other output. Our previous system based on MsSql used the same approach for identifiers. And I think when Couchbase Processional services recommended it (some 4 years ago now), we didn't think too hard about it. Certainly we never intended our applications to verify the prefix in any way.

What I really WANT is for Spring Data Couchbase to not violate the generic interface contract as defined in the repository interfaces, I want that type declaration to be honored for all methods in the interface, not just methods that are backed by N1QL.

I understand the KV store doesn't make this easy to do. There is a mismatch between Couchbase's capabilities and what Java tells you the type will be.

You've given me a workaround, which is wonderful.

But I think you really need to make others aware of this quirk by documenting it, and perhaps for those who want the same sanity document the same workaround you provided to me.

And if I think about it it seems the scopes and collections feature in the upcoming CB7 is an attempt to right this wrong. I.e. Using key prefixes isn't ideal, and there's probably a better way to do it... so try this approach.

The CustomConverter is better than having an annotation or something else baked in the product. Imagine six months from now that you want it to do something different. If it was in the product, you'd have open an issue, get it accepted, then wait for a change and then a release. In the converter, you can change it on a moment's notice, you can have auditing, allow exceptions for specific classes or services - whatever.

Yes, we value composition over inheritance. Which is what I ultimately think your trying to say here. Not that you're taking about inheritance per se.

But violating the type contract of a generic interface and us having to implement a Custom Converter to fix it? That's not really a technology requirement specific to us.. it's applicable to all Java applications.

I remember when Java 5 came out and suddenly, you could apply a generic type to a Collection. And I'm sure we had code to fix that was broken by the idea of being able to assign a single type to a Collection. When we were exploiting it to store many different types. These days you can declare a List and store anything in it, but people rarely do, and it's even frowned upon in polite society ;)

So having said all that, let me take what you have provided and let me see how I can integrate it and if it is good enough for our purposes.

I would also like if you took this feedback and ensured when it comes time to integrate scopes and collections into the repositories that you can make them honor the declared type. Default a repositories collection to the simple name of the entity?