zussel / matador

Take your appclication by the horns
http://zussel.github.io/matador
GNU General Public License v3.0
63 stars 22 forks source link

Create an example project #71

Open zussel opened 7 years ago

zussel commented 7 years ago

Create an example project which covers almost all facets of OOS.

Suggestion: Blog

Entities

Post

Attribute Type
id identifier\
title varchar\<255>
author belongs_to\
created_at time
categories has_many\
comments has_many\
tags has_many\<varchar\<255>>
attachment blob

Author

Attribute Type
id identifier\
name varchar\<255>
birthdate date
posts has_many\

Comment

Attribute Type
id identifier\
post belongs_to\
email varchar\<255>
comment std::string
created_at time

Category

Attribute Type
id identifier\
name varchar\<255>
description std::string
posts has_many\
saki7 commented 7 years ago
zussel commented 7 years ago

Attachment is a good idea!

For relation post <-> categories I've to implement missing many to many relationships. And I came to the conclusion that I need something to declare belongs_to. For both I've some ideas I've to think over.

Suggestion:

Change the syntax in the serialize() methods of each entity in the form of

struct post
{
  oos::identifier<unsigned long> id;
  oos::varchar<255> title;
  oos::time created_at;
  oos::has_many<comment> comments;
  oos::has_many<category> categories;
  oos::has_one<author> creator;

  template < class SERIALIZER >
  void serialize(SERIALIZER &s)
  {
     s.primary_key("id" id);
     s.attribute("title", title);
     s.attribute("created_at", created_at);
     s.has_many("comment", comments);
     s.has_many("category_post", categories);
     s.belongs_to("author", creator);
  }
};

struct author
{
  oos::identifier<unsigned long> id;
  oos::varchar<255> name;
  oos::has_many<post> posts;

  template < class SERIALIZER >
  void serialize(SERIALIZER &s)
  {
     s.primary_key("id" id);
     s.attribute("name", name);
     s.has_many("author", posts);  // counterpart of creator in post
  }
};

struct category
{
  oos::identifier<unsigned long> id;
  oos::varchar<255> name;
  oos::has_many<post> posts;

  template < class SERIALIZER >
  void serialize(SERIALIZER &s)
  {
     s.primary_key("id" id);
     s.attribute("name", name);
     s.has_many("category_post", posts);
  }
};

At first sight it is more readable and understandable for the user. What do you think?

saki7 commented 7 years ago

As for many-to-many relationships Rails offers two methods has_many :through and :has_and_belongs_to_many where the latter is the shorthand form which could be written in single line. The former is a customizable way which enables the user to specify callbacks and custom attributes for the joining table.

In either way user should create a joining table. A simple example is as follows:

post_id INTEGER
category_id INTEGER
is_primary_category BOOLEAN

And for has_one relationship indeed you should have an assignee_id column. That would be done in belongs_to. I'm wondering how it have worked until now without belongs_to feature.


The new syntax looks good. Btw is it possible to deal with user-defined types? Was there a syntax for both serialize/deserialize?

saki7 commented 7 years ago

This page might give you a hint: http://guides.rubyonrails.org/association_basics.html#choosing-between-has-many-through-and-has-and-belongs-to-many

saki7 commented 7 years ago

And the column is_primary_category BOOLEAN is the custom attribute for the joining table I mentioned. We really need it in some locations.

As for many-to-many relationships there will be two statements declaring the relation table thus it might lead to duplicate of CREATE TABLE publishment. Then we have to reconsider the implementation details.

saki7 commented 7 years ago

Btw Choosing between VARCHAR(255) and VARCHAR(256)

VARCHAR allocates extra bytes to determine the data length.

As you can see the former is slightly better. So the people have been historically using the former more often.

zussel commented 7 years ago

Thanks for your comments! has_many<T> works by creating always a relation table (in a former version OOS was able to let the user decide whether a relation table is created or the foreign table consists of the counterpart)

This would be possible with the new belongs_to() syntax. As for many_to_many relationships OOS is able to determine the counterpart and won't execute double create statement for the relation table.

Thus the new syntax is API changing I would put it into 0.5.0.

As for many_to_many and belongs_to as well as custom attributes it should be done later.

one_to_one isn't explicitly supported, you can achieve it with having a has_one<T> attribute within each entity. But its not exactly what one_to_one stands for.

struct person
{
  identifier<unsigned long> id;
  has_one<person_details> details;
};

struct person_details
{
  identifier<unsigned long> id;
  has_one<person> person;
};

What do you mean with custom-defined types?

In a former version there have been both serialize() and deserialize() but it was kind of redundant, so I dropped the latter. We can consider to rename serialize() into describe().

I wasn't aware of that VARCHAR behavior. Thanks for the information. I'll change my sample code above.

saki7 commented 7 years ago

one_to_one isn't explicitly supported, you can achieve it with having a has_one attribute within each entity. But its not exactly what one_to_one stands for.

I'm afraid that has_one + has_one is logically incorrect. The possible scenarios are only limited to following:


What do you mean with custom-defined types?

I mean something like

class URL
{
public:
    URL(std::string const& scheme, std::string const& host, unsigned port, std::string const& path, std::string const& query, std::string const& fragment);
    static URL parse(std::string const& url_str);
    std::string to_string();

private:
    std::string scheme_, host_, path_, query_, fragment_;
    unsigned port_;
};

class Site
{
public:
    template<class S>
    void serialize(S& s)
    {
        s.serialize("url", url_.to_string()); // It's going to be VARCHAR(255)
    }

    template<class S>
    void deserialize(S const& s)
    {
        std::string url_str;
        s.deserialize("url", url_str);
        url_ = URL::parse(url_str); // <------ here
    }

private:
    URL url_;
};

So you will need both serialize/deserialize API. What do you think of this case?

saki7 commented 7 years ago

Refs issue #72

zussel commented 7 years ago

I'm afraid that has_one + has_one is logically incorrect. The possible scenarios are only limited to following:

That's what I meant with 'not exactly the same'. We need a belongs_to! Right after 0.5.0 as a next step.

Ok, I understand user-defined types. But what about having the possibility to pass to_string/parse function provided for example by the user-defined type class.


struct url
{
  static std::string to_string();
  static url void parse(const std::string &value);
};

template<class S>
void serialize(S& s)
{
  s.attribute("url", url_, &url::to_string, &url::parse);
}

Or OOS will detect these methods via type traits And the names to_string() and parse() could be a convention. Then one could simple write

template<class S>
void serialize(S& s)
{
  s.attribute("url", url_);
}

And OOS will do the rest.

saki7 commented 7 years ago

The problem for that method is that the user-defined types might be designed to use a (multiple) free function or something.

And the most important issue is that you are actually going to place restrictions for the function prototype.

What if a class in a library already has some kind of parse/to_string functions available and they have a different API than the one OOS is expecting? You will need to write an another wrapper then.

static URL parse(std::string const& str, bool verbose /* <-- */);

You could still do something like

s.attribute("url", url_, &url::to_string, std::bind(&url::parse, std::placeholders::_1, false));

But it won't fit all the cases because it depends on how complex the user-defined type is.

The member function's name could also vary among the implementation like below so the auto-detection will not work

How do you think?

zussel commented 7 years ago

Ok, I get the point. But to be honest I don't like the idea of a serialize()/deserialize() interface. In most cases the code is almost redundant and the user will ask why the must write doubled code. That's why I removed that.

I would like to find another solution for that.

We can mix the auto detection of to_string()/parse() with the ability to let the user pass a fitting function. So if the user stays with the convention everything will work automagically in the other case the user has to write and pass a wrapper function.

I've learned that it's hard to cover 100% of all use cases but 90% is much easier. And the remaining 10% are also able to get their things done with the library (with a bit overhead on their side).

What do you think?

saki7 commented 7 years ago

The design of boost::serialization might give you a hint. A plain, simple non-intrusive serializing code is written in this way:

#include <boost/serialization/serialization.hpp>
#include <boost/serialization/nvp.hpp>

struct Data {
    int number;
    double real;
};

namespace boost { namespace serialization {

template<class Archive>
void serialize(Archive & ar, Data & d, unsigned int /* version */) {
    ar & make_nvp("number", d.number);
    ar & make_nvp("real", d.real);
}

} } // namespace boost::serialization

And for splitting serialization, it's written in this way;

#include <boost/serialization/serialization.hpp>
#include <boost/serialization/split_free.hpp>

struct Data {
    int value;
};

namespace boost { namespace serialization {

template<class Archive>
void serialize(Archive & ar, Data & d, unsigned int version) {
    split_free(ar, d, version);
}

template<class Archive>
void save(Archive & ar, Data const& d, unsigned int /* version */) {
    const int v = d.value ^ 0xFFFFFFFF;
    ar & v;
}

template<class Archive>
void load(Archive & ar, Data & d, unsigned int /* version */) {
    int v;
    ar & v;
    d.value = v ^ 0xFFFFFF;
}

} } // namespace boost::serialization

So the common thing is to call serialize() at first. Then if a library detects split_free() it passes the remaining stuff to save() / load(). I think this is the most efficient and least redundant way to solve these use case.

We can mix the auto detection of to_string()/parse()

No offense but I actually don't like the auto detection. Because the "convention" which OOS is specifying, i.e. the rule that "a function called to_string() will be used for serializing", might not be logically accurate. It (the to_string() function) could actually be a function doing something totally different than serializing, for example, it could be a function which yields a string value for logging purpose.

with the ability to let the user pass a fitting function.

This idea is not that bad but I think it could be superseded by the splitting serialization feature. Scenario:

These five steps can not be done by specifying a single function. That's what I'm concerning. What do you think?

zussel commented 7 years ago

You're right with the auto detection. The more I think about it the clearer gets my opinion that auto detection isn't the right strategy for that.

But I'm afraid that I'm not convinced with the splitting of the serialization method. I knew the boost::serialization. I like the approach but I'm not satisfied with some points of how they implemented the user interface. Splitting is one point.

The example is good - in a theoretical way. Imagine a class with lets say twelve attributes of which two are user defined types. You have to double over 80% of the code if you use splitting.

I would prefer an approach where you pass an adapter object to the user defined type:

s.custom_attribute("url", url, url_adapter);

As you asume correctly the adapter implements an interface to serialize and deserialize the custom type.

In addition there could be a global mapping (like the table name mapping we've implemented), where you register an adapter to custom type globaly. This can be overwritten in any case call custom_attribute() an pass a special adapter for this case.

This approach needs only one serialization method and puts the custom type formatting and parsing in an separate reusable class.

I personally like this approach and would be satisfied with it.

What are your thoughts?

P.S.: Anyway, we will implement this in a later release. What do you think about the renaming. Could you go with it?

saki7 commented 7 years ago

The example is good - in a theoretical way. Imagine a class with lets say twelve attributes of which two are user defined types. You have to double over 80% of the code if you use splitting.

Nice point!

s.custom_attribute("url", url, url_adapter);

This is magic. We should go with this.

P.S.: Anyway, we will implement this in a later release. What do you think about the renaming. Could you go with it?

As for renaming (you mean #72 right?) I think it has a small bad aspect.

By now each member of an entity is serialized by calling s.serialize(, ) independent what kind of member it is. This should be made explicit for a better user experience.

I think the opposite way. It could made implicit for reducing redundancy since you will need to write a doubled code:

oos::has_many<post> posts; // one "has_many" here to specify the type
s.has_many("category_post", posts); // one more "has_many" here. You need to go back to the member declaration and check what type it was.

So I think the current implementation which uses type deduction for serializing provides better user experience and less redundancy.

boost::serialization deduces type within make_nvp() too.

zussel commented 7 years ago

This is magic. We should go with this.

Thanks! But I don't think it's magic :smile: It's just another way of solution (indeed which I like more).

And as for blobs we could use the adapter as well. If we have a member of type image_file we could write a blob_image_adapter and treat the image file like custom type.

I think the opposite way. It could made implicit for reducing redundancy since you will need to write a doubled code:

Yes, you're right. It's the as for identifier<T> and s.primary_key(identifier).So we'll keep the current interface.

Thanks for this constructive discussion! I think we've a plan!

saki7 commented 7 years ago

So we have agreed to reject #72 and unify them into serializer.serialize("column_name", member) right? Is there still any need to split serialize() into some other methods?

zussel commented 7 years ago

Yes, I think we need to implement a kind of belongs_to (as referenced in #74) as counterpart to has_one and has_many to achieve the requested behavior.

An alternative is to integrate it in the serialize(name, has_one, cascade_type) method. But that breaks the pattern.

I would go for a belongs_to<T> class analog to has_one<T>. As the latter is derived from object_holder it will be the same for belongs_to but with a distinguishable attribute. Than the serialize() method will implement the specific behavior.

saki7 commented 7 years ago

An alternative is to integrate it in the serialize(name, has_one, cascade_type) method. But that breaks the pattern.

I think it would be nice to have belongs_to integrated into serialize() with type deduction too. So that post.attachments[num] yields the same instance as attachment.post. Do you understand my meaning?

OMG I'm so drunk now.

zussel commented 7 years ago

Yes I understand absolutely what you mean :+1:

Do you have a party? It's almost 20:45 in Tokyo. Have fun!

saki7 commented 7 years ago

I'm having a party in my friend's apartment 😄 😂 Good to see you understood my idea. Feel free to ask me anything if you have any more concerns. Enjoy the weekend!