hanami / model

Ruby persistence framework with entities and repositories
http://hanamirb.org
MIT License

SQL Automapping #292

Closed jodosha closed 7 years ago

jodosha commented 8 years ago

Actual

In order to add an attribute to an entity, we need to:

* add the column to the database via a migration
* add the attribute to the entity
* map the column to the attribute in the mapper configuration

While this is a powerful approach, it's really frustrating to deal with.

Proposal

Eliminate the need to map database columns to entity attributes.

This is an unnecessary step for brand new applications. With relational databases we can use reflection to return the right data set and have the repositories know what to do, without explicit user intervention.

Hanami::Model.configure do
  adapter :sql, ENV['DATABASE_URL']
  # look 'ma, no mapping!
end.load!

book = BookRepository.new.find(1)

Entities

However, mapping has a useful role: specifying database-to-Ruby (and vice versa) coercion rules. Think of a date: it's returned as a string by the low-level adapter, and thanks to the mapper we get a Date instance.

We need to solve this problem.

Alternative 1

Build a reflection mechanism to automatically resolve coercion rules, starting from the database schema. This is convenient, but it means using entities will always require a database connection (even for unit tests).

class Book
  include Hanami::Entity
end
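
For illustration, here is a minimal sketch of how such reflection could work on top of Sequel (which backs the SQL adapter). The coercion table and names below are assumptions for this sketch, not actual Hanami code:

require 'date'
require 'sequel'

db = Sequel.connect(ENV['DATABASE_URL'])

# Sequel can reflect on the schema: db.schema(:books) returns pairs like
# [:title, { type: :string, db_type: 'varchar(255)', ... }]
COERCIONS = Hash.new(->(value) { value }).merge(
  integer: ->(value) { Integer(value) },
  date:    ->(value) { Date.parse(value.to_s) }
)

rules = db.schema(:books).each_with_object({}) do |(column, meta), acc|
  acc[column] = COERCIONS[meta[:type]]
end

# assuming books has a date column :published_on
rules[:published_on].call('2016-01-21') # => #<Date: 2016-01-21>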

Alternative 2

Build a powerful attributes DSL for Entity. Something similar to Virtus or Dry::Data:

class Book
  include Hanami::Entity

  attribute :title, String
  attribute :isbn, String
end

The advantage here is having entities that are still close to plain Ruby objects (POROs) and that don't need a database connection to be instantiated.

This is an improvement, but not as DRY as "Alternative 1". We can solve this by empowering the model generator from the hanami gem.

➜ bundle exec hanami generate model book title:string isbn:string

The command above will generate the entity with the typed attributes.
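
As a thought experiment, here is a rough sketch of how such an attribute DSL could work under the hood (hypothetical code, not a proposed implementation):

module Attributes
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def attribute(name, type)
      attr_reader name
      attributes[name] = type
    end

    def attributes
      @attributes ||= {}
    end
  end

  def initialize(data = {})
    data.each do |name, value|
      type = self.class.attributes.fetch(name)
      # Kernel provides conversion functions named after core types
      # (Kernel.String, Kernel.Integer, ...)
      instance_variable_set(:"@#{name}", Kernel.public_send(type.name, value))
    end
  end
end

class Book
  include Attributes

  attribute :title, String
  attribute :isbn,  String
end

Book.new(title: 'Hanami', isbn: 123).isbn # => "123"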

Repositories

The role of the mapper was to associate a database table to an entity and a repository. Without a mapping, we need to figure out how to associate these elements.

My proposal is to introduce new conventions:

* a repository named BookRepository maps to the Book entity
* the Book entity maps to the books database table

By default this is the code that a developer will see:

class BookRepository
  include Hanami::Repository
end
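
The convention itself boils down to simple name inflection, roughly like this (naive pluralization, illustrative only):

repo_name   = 'BookRepository'
entity_name = repo_name.sub(/Repository\z/, '') # => "Book"
table_name  = :"#{entity_name.downcase}s"       # => :books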

Exceptions to this rule can be expressed with the following code.

Different Entity Name

class BookRepository
  include Hanami::Repository
  entity MyBook
end

Different Repository Name

class AlternativeBookRepository
  include Hanami::Repository
  entity Book
end

Different Database Table

class BookRepository
  include Hanami::Repository
  collection :t_books
end

/cc @hanami/core-team @hanami/contributors

hlegius commented 8 years ago

Good points! I am struggling with lots of domain/infra mapping right now.

Removing the database <-> entity mapping could be a great improvement for large projects, but IMO we need to be careful about how much magic we create here. Leaving the coercion decision up to the developer is essential for building complex entities. By complex, I mean Float/BigDecimal decisions, UUID types, and/or user-defined types.

I don't think it would be a good idea to remove it entirely, leaving all the complexity to the framework. As you said, it would create a coupling between the domain and infrastructure layers. So, Entities: Alternative 2 :+1:

About repositories: that's great. I'd rather define the repository name as plural by default to give that 'plural' idea to anyone who reads a repository class name, i.e. BooksRepository mapping to the Book entity. So, repository conventions :+1:

jodosha commented 8 years ago

@hlegius

I don't think it would be a good idea to remove it entirely

Do you mean to remove the mapper? Nope, that's not my intention. We still need to be able to target schema-less databases.

jodosha commented 8 years ago

For the record, I've implemented this feature as a code spike, using ROM. Please check this Gist. I'm not sure if it still works. :wink:

solnic commented 8 years ago

It makes sense to infer 'entities' from the schema definition to avoid duplicating information, and to have an option to define entities explicitly when query results no longer return a data representation that matches the canonical schema.

booch commented 8 years ago

I'm strongly in favor of alternative 2. One of my biggest frustrations with ActiveRecord is having to look in two separate places for all the details of an entity. Having the associations defined in the model and the other attributes defined in the schema drives me nuts. If you define them all in the entity/model, you can always create the schema and schema migrations from that. The repository could check that everything the entity expects exists in the schema, or could take an optional mapping between the schema and the entity model.

cllns commented 8 years ago

I prefer 2. It would be like having annotate_models for the entity's attributes, except it's actual code instead of generated comments.

For the repository: BookRepository, BooksRepository, or even just Books are all fine by me (though Books vs Book could be confusing).

Would coercions be moved from hanami/validations to hanami/model via this syntax? Including custom coercions? (i.e. attribute :title, String vs attribute :title, type: String)

davydovanton commented 8 years ago

I also prefer the second alternative :+1:

solnic commented 8 years ago

The only reason to have this kind of API is convenience in the early days of a project. Your entities are not definitions of your tables; they are definitions of your domain objects. For convenience you can infer their definitions from the schema while they match, but that won't last long in any non-trivial project (and I'm 99% sure that even in simple projects schema and domain objects diverge pretty quickly).

So, treating entities as your table definitions == ActiveRecord. It makes no sense to do that unless you're building an AR. DataMapper had an API like the one you describe, and the complexity that came from mixing entity definitions with schema-related definitions was a disaster. We've been there already!

hlegius commented 8 years ago

Your entities are not definitions of your tables; they are definitions of your domain objects. :+1:

@solnic

So, treating entities as your table definitions == ActiveRecord.

So, you agree with alternative 2, but in your experience attribute :foo won't necessarily be mapped to a column called foo in the database table. So, does your suggestion come down to having entities detached from any knowledge of the persistence layer, with the mapping defined somewhere later by a proper class?

pascalbetz commented 8 years ago

:+1: for alternative 2

Given that Hanami wants to support a multitude of databases, this also makes sense, as not all databases provide the ability to reflect on data types.

Providing a convenience mechanism would be nice, though.

solnic commented 8 years ago

It seems I don't like either of the options, now that I think about it. I think an explicit schema should be defined (I'm working on this for rom already) because it's the canonical source of information about your db tables, and you can infer migrations from it, as well as basic canonical models that match the db schema for early-stage project convenience. You can also define "associations" in a schema to infer automatic behavior for building aggregate objects and persisting "object graphs", too. That's how it's gonna work in rom very soon.

As far as custom mapping goes, I don't think mapper configurations are needed at all. You can, and should, leverage the query API to get the data in the expected form; then it's simply a matter of instantiating "domain objects" from that data. That's how it already works in rom. For high-level data types I use dry-data (soon to be renamed to dry-types, btw) and there's no need for any complex mapper definitions (which I've already abandoned in rom).

jodosha commented 8 years ago

@solnic

I think an explicit schema should be defined (I'm working on this for rom already) because it's the canonical source of information about your db tables, and you can infer migrations from it, as well as

Do you mean having a "master schema/types definition" that can be represented as a database migration and as entity coercions?

basic canonical models that match the db schema for early-stage project convenience.

Is this approach similar to "Alternative 1"? Is the entity's attribute definition implicit in this case?

As far as custom mapping goes - I don't think mapper configurations are needed at all.

I'm confused by this one :smile:

there's no need for any complex mapper definitions (which I abandoned in rom already).

That's the goal here as well. :wink:

jodosha commented 8 years ago

@solnic One last question about ROM :smile: How does it work with schema-less databases?

solnic commented 8 years ago

I think an explicit schema should be defined (I'm working on this for rom already) because it's the canonical source of information about your db tables, and you can infer migrations from it, as well as

Do you mean having a "master schema/types definition" that can be represented as a database migration and as entity coercions?

No, I'm talking about an explicit database schema definition. It does not have any information about entities or high-level types; those should be defined in entities.

basic canonical models that match the db schema for early-stage project convenience.

Is this approach similar to "Alternative 1"? Is the entity's attribute definition implicit in this case?

Sort of. You can infer models from the schema, but personally I don't see any reason for defining my own classes at that stage. I treat these objects as pure data objects; once there's a need to provide a custom interface around the data, that's typically the right moment to start defining classes explicitly and separating the domain layer more cleanly from persistence, which also means I'd start defining my own attributes, even if it means having similar info in the schema and the models.

As far as custom mapping goes, I don't think mapper configurations are needed at all.

I'm confused by this one :smile:

When you’re loading data from a db two things happen:

1) potential structural transformation of the "raw" data
2) potential coercions of individual values into high-level concepts

The first part is done by the persistence "framework"; the second should be handled by whatever library you're using to instantiate your own models. And that's my setup: rom performs the structural transformations, and dry types instantiate objects from the data that rom returns.

In this setup custom mappers are rarely needed, because query APIs are typically powerful enough to do all kinds of data projections, and the db drivers typically handle low-level coercions for you, too. In rare cases you can define custom mappers, but from what I've seen that covers maybe 5% of the use cases.
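
A minimal sketch of that split, using today's dry-struct/dry-types API in place of the dry-data of the time (the hash below stands in for data the persistence layer has already shaped):

require 'dry-types'
require 'dry-struct'

module Types
  include Dry.Types()
end

# Step 2: instantiate a domain object from the data the query returned
class Book < Dry::Struct
  attribute :title,        Types::String
  attribute :published_on, Types::Params::Date # coerces "2016-01-21" into a Date
end

# Step 1 (structural transformation) already happened at query time
row = { title: 'Hanami', published_on: '2016-01-21' }
Book.new(row).published_on # => #<Date: 2016-01-21>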

solnic commented 8 years ago

One last question about ROM :smile: How does it work with schema-less databases?

It just works. You persist whatever you want and make a mess in your database :joy:

Seriously though, that's why I'm introducing an explicit schema DSL in relations: to have a place to specify the structures you're gonna work with.
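
For reference, this is roughly what that schema DSL looks like as it later shipped in rom-sql (the syntax may differ from the version under discussion here):

require 'rom-sql'

Types = ROM::SQL::Types # commonly aliased like this in rom apps

class Books < ROM::Relation[:sql]
  schema(:books) do
    attribute :id,    Types::Serial
    attribute :title, Types::String
  end
end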

schrockwell commented 8 years ago

This is an unnecessary step for brand new applications.

I disagree. What matters most to me is long-term readability and maintainability, as opposed to saving a few keystrokes while creating an MVP.

After reading the other proposals for 0.7.0, it seems entities are trending towards lightweight, immutable objects. This entire proposal seems to contradict that mentality. If you want to go fully "data in, entity out", then entities should be simple POROs without any external dependencies. There needs to be something to handle the transformations from data to entity and from entity to data, and that duty is presently handled by mappers.

For alternative 1, you're tying the entity directly to a DB. If I need to refresh my memory and look up the attributes of that entity, where do I go? Without a mapping or an explicit definition, it's hard to figure out which attributes are even members of the entity. It feels too ActiveRecord-y, too magical, too hard to test, and it suffers a huge hit in reusability.

Alternative 2 is interesting, but it feels like a duplication of action params. Are there going to be two separate parameter DSLs, one for entity attributes and one for action params, or could they be combined into a common DSL? Since you're adding coercions directly to the entity, that's an implicit type validation, so now the entity has validation business logic in it. That's a red flag.

One cool thing you could do with approach 2 is make concrete modules to mix in to different entities. E.g. if you had Book and Article entities which you know both have authors and copyright dates, you could create a reusable WrittenWorkAttributes module that defines the attributes author and copyright_date with appropriate coercions, and mix it into both entities, as sketched below.
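
A quick sketch of that idea with the proposed DSL (WrittenWorkAttributes and its attributes are the hypothetical names from above):

require 'date'

module WrittenWorkAttributes
  def self.included(entity)
    entity.attribute :author,         String
    entity.attribute :copyright_date, Date
  end
end

class Book
  include Hanami::Entity
  include WrittenWorkAttributes

  attribute :title, String
  attribute :isbn,  String
end

class Article
  include Hanami::Entity
  include WrittenWorkAttributes

  attribute :url, String
end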

For explicitness and reusability, I prefer approach 2 over 1, but given the ultimate choice I think the existing model of data mappers is still a good fit.

accuser commented 8 years ago

I think that the mapping configuration (as it stands right now) adds unnecessary complexity, and I would prefer to define an entity on the basis of the attributes I expect it to have (as a PORO). The associated repository should be concerned with the persistence of entities and how they map to an underlying table / document / JSON object of a remote service. I would never expect an entity to map directly to its source schema; there may be, for example, fields in a table for audit purposes that an entity should not include. Whether this mapping should be derived from the entity or from an explicit mapping is really a moot point, but from a maintainability (and testability) point of view I would prefer the expectation established by defining an entity's attributes to be definitive.

pascalbetz commented 8 years ago

@schrockwell

Re reusability: reusing such tiny bits of code (author, copyright) would hurt readability while only saving a few lines of code; I would not consider this an advantage.

runlevel5 commented 8 years ago

I am not a big fan of the alternative 1 approach; it is like AR.

Having the entity implicitly handle coercion/validation makes the entity more complex than needed. I'd prefer to keep it as is. I am thinking we may need a third alternative: let our generator handle all the mapping.

pascalbetz commented 8 years ago

In general: I think the new solution should not be driven by the goal of reducing lines of code, or by ease of use for beginner developers or simple applications.

The solution should be consistent, extensible, and future-proof. And if we can simplify the code with some convenience methods/helpers/tools (automapper, generators), that is fine with me.

@solnic can you give a quick summary of the layers/components involved and what the flow would be? Not sure I'm following you (though I think I am, and I think I like your proposal :-))

jodosha commented 8 years ago

I forgot to mention a few issues that we should solve when implementing this:

For me, they all expose the limits of having a mapper.

Titinux commented 8 years ago

I don't want to offend anyone, but why not use rom directly? Let Hanami be the frame of the house and use rock-solid projects inside (rom, dry-validation, trailblazer, ...). Let's put our efforts toward one common goal, and who knows, one day we may eclipse the good old Rails. :triumph: :sweat_smile:

solnic commented 8 years ago

@solnic can you give a quick summary of the layers/components involved and what the flow would be?

Here's how it works in ROM (not everything is released yet though):

One thing to mention is that I do not have the concept of "saving entities". I prefer separate "models" for reading and writing; in fact, for writing I rarely need any custom "model". I just handle the input data, perform validation/coercion, and save it in the db. One crucial missing piece is a "changeset" concept, where you update an existing record/row/document/whatever in a database and you want to calculate which attributes are actually going to be updated. This will be added to ROM very soon.
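
In plain Ruby the changeset idea boils down to a diff; a sketch (not ROM's actual API):

original = { title: 'Hanami', isbn: '123' }
incoming = { title: 'Hanami', isbn: '456' }

changes = incoming.reject { |key, value| original[key] == value }
changes # => { isbn: "456" }, the only attribute the UPDATE needs to touch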

pascalbetz commented 8 years ago

@solnic thanks. @Titinux there still needs to be some "glue" between Hanami and rom. So if the "backend" is switched to rom instead of something homegrown (as currently discussed), it is still up to the framework how rom is configured and what the exact interface looks like. But I'm with you on "reuse" when there is a good fit.

solnic commented 8 years ago

I don't want to offend anyone, but why not use rom directly?

I believe many people in the Hanami community have different expectations when it comes to the database/model part of the framework. ROM has a different philosophy and doesn't have common features like "saving entities" or handling mutable state. ROM's repositories are not typical ORM-like repositories with abstract query interfaces and swappable adapters. I ditched those concepts to simplify the library (and never regretted the decision, fwiw), but it may not meet everyone's expectations, even though it's extremely powerful since you can easily leverage db-specific features, which is a huge win...

Let Hanami be the frame of the house and use rock-solid projects inside (rom, dry-validation, trailblazer, ...)

I have high hopes that with schema and command support in rom repository, more people will adopt ROM. We're gonna have interfaces very similar to what @jodosha is proposing (i.e. user_repo.save(attributes) or user_repo.update(id, new_attributes)), including support for persisting nested data (I've already got a PoC working in a branch).

Whether ROM is a good fit for Hanami is to be decided by the maintainers and the community, I can only say that I would support this :100:% if you decided to give it a go.

dry-validation is below 1.0.0 but I'm working hard to make it stable soon. It's already used by a couple of apps that will hit production in the upcoming months. I would definitely like to encourage you to try it out and consider using it in Hanami.

Let's put our efforts toward one common goal, and who knows, one day we may eclipse the good old Rails.

Yes, please :)

jodosha commented 8 years ago

@Titinux

I don't want to offend anyone, but why not use rom directly?

No offence at all. :smile: As @solnic said, it's just a matter of setting expectations for the Hanami community. We are a good Rails alternative also because we offer a full-stack framework with consistency across the layers.

Removing the M in MVC can make things harder for adoption and put Hanami in a position where we can't decide how the business logic should be organized in a project.

Titinux commented 8 years ago

Removing the M in MVC can make things harder for adoption and put Hanami in a position where we can't decide how the business logic should be organized in a project.

That makes sense. Hanami is versatile enough to offer a solid, simple, and complete base, and for more complex scenarios we can replace whatever part we want.

jbodah commented 8 years ago

Would just like to add that we've been using the data mapper pattern to map data from multiple databases (Cassandra, Redis, Postgres) into entities of the same type, which is a huge plus of the pattern in our eyes. Entities wind up being composed of data from multiple repositories (or a composite repository). While this is an entirely separate beast, it's something to consider.

On that note, I'd probably just echo what most people have been saying: one of the most painful parts of AR is working with implicit attributes. I've never been a huge fan of having to declare types for things in Virtus, but I suppose something needs to know my types :smile:

+1 for alternative 2

jxxcarlson commented 8 years ago

Let me also say that having a data mapper available has been crucial for my work.

jxxcarlson commented 8 years ago

With alternative 2, how do we do database mapping? In the project I am working on, I have legacy data and new data, and having a mapper has been a godsend. Will this functionality be preserved in some form for those who need it? Sorry that I am late to the discussion and don't fully understand the issues.

solnic commented 8 years ago

I don't think anybody is trying to say that data mappers are not useful; I'm just saying that having to manually define mappers all the time is a PITA, and that's one of the biggest problems we managed to solve in ROM. You can compose data from multiple dbs without the need to define mappers explicitly.

jodosha commented 8 years ago

I don't think anybody is trying to say that data mappers are not useful; I'm just saying that having to manually define mappers all the time is a PITA

@jxxcarlson This is the point :point_up:

solnic commented 8 years ago

With alternative 2, how do we do database mapping?

Something I've been trying to explain is that most databases support pretty advanced data-mapping features, so first of all: use them. In many, many cases you can retrieve data structures from the database without doing anything special in Ruby, except instantiating objects by passing those data structures to the constructors.

In cases where you need to do mapping in memory, you should have access to custom mappers.
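
For example, with Sequel a projection can shape the data at query time, leaving only object instantiation to Ruby (table and column names here are made up):

require 'sequel'

db = Sequel.connect(ENV['DATABASE_URL'])

rows = db[:books]
  .join(:authors, id: :author_id) # authors.id = books.author_id
  .select(
    Sequel[:books][:id],
    Sequel[:books][:title],
    Sequel[:authors][:name].as(:author_name)
  )
  .all
# => [{ id: 1, title: "...", author_name: "..." }, ...]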

jmchuster commented 8 years ago

Why does the magic have to happen in the Model? Why not put the magic in the mappings? Something like this...

Hanami::Model.configure do
  # ...
  mapping do
    # start with an empty mapping as the default
    # then define everything
    collection :books do
      entity     Book
      repository BookRepository

      attribute :id,    Integer
      attribute :title, String
    end

    # or, just map everything automatically
    auto_collection :books

    # or if you want to customize
    # start with the automatic mapping as the default
    # then define the differences
    auto_collection :books do
      # rename an entity
      entity MyBook

      # rename an attribute
      attribute :id, MyUUID, as: :uuid

      # ignore an attribute
      ignore_attribute :title
    end
  end
end.load!

If you're starting out, you probably want to begin with the automatic mapping as the default and then define exceptions, until you hit the tipping point where you go back to the old explicit mappings.

As for having to add the attributes to the Model, I don't really see an issue there. It makes sense to me that you need to declare the Attributes that you want on the Model in the same way that you need to declare the Exposures that you want on the Action.

The flow then becomes:

In order to add an entity, we need to:

In order to add an attribute to an entity, we need to:

jodosha commented 7 years ago

Closing as it was implemented by #334