uliSchuster / real-world-backend

A study project on how to develop a production-grade Haskell backend application.
Apache License 2.0

Design domain data model #5

Closed: uliSchuster closed this issue 4 years ago

uliSchuster commented 4 years ago

From the API specification, derive an initial draft of the domain data model. Sketch it. List important design decisions to take.

uliSchuster commented 4 years ago

See the design document on branch RWHSG-0005.

uliSchuster commented 4 years ago

I added drafts for the User, Article and Comment domain types. Please have a look and comment. Key points: they use newtype wrappers for individual fields, and the Article and Comment types include the User type instead of referencing it.

joanllenas commented 4 years ago

Looks really clean!

There are a few things I don't understand, though:

So, this looks like a record, right? I didn't know Haskell had records.

data User
  = User
      { userEmail :: Email.EmailAddress,
        userName :: UserName,
        userImageUrl :: URI.URI,
        userBio :: UserBio
      }

What does this do? Again, it looks like a record, but getUserName looks like an OOP method. It also has the newtype keyword, and I'm not sure what that does.

newtype UserName = UserName {getUserName :: Text}

Finally, I've read that RIO does a lot of magic under the hood. Do we really want it? I'm just asking because I have no idea. Is using it best practice? Is it used everywhere?

To be clear, I'm going to research all that stuff, I just wanted to give some feedback sooner than later :)

Git Tip:

uliSchuster commented 4 years ago

Thanks for the git tip!

uliSchuster commented 4 years ago

Haskell has records (see, e.g., here: https://mmhaskell.com/blog/2017/12/24/haskell-data-types-in-5-steps). However, the actual record syntax is rather inelegant and clumsy (see https://stackoverflow.com/questions/5367167/haskell-record-syntax). In particular, the names of record fields "leak": you cannot have two records with fields of the same name in one module.

For each field, Haskell creates an accessor function with the same name as the field. Say we create a record myUser of type User. Then we can read its e-mail address like this:

email = userEmail myUser

The newtype also uses record syntax, for a record with a single field. Naming this field getUserName is merely a convenience: Haskell creates the accessor function getUserName :: UserName -> Text, so unwrapping a UserName reads naturally. A newtype simply wraps an existing type, and the accessor function can be used to unwrap it. In the code here, I used newtypes to create separate types for separate things, although they are all Text strings under the hood.
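A minimal, self-contained sketch of the record and newtype explained above. To avoid extra dependencies it assumes plain Text for the e-mail address and image URL (the draft uses Email.EmailAddress and URI.URI); myUser, nameAsText and greet are hypothetical names for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Each newtype wraps Text but is a distinct type for the compiler.
newtype UserName = UserName {getUserName :: Text}
  deriving (Eq, Show)

newtype UserBio = UserBio {getUserBio :: Text}
  deriving (Eq, Show)

-- Simplified User record: e-mail and image URL are plain Text here
-- (assumption; the draft uses Email.EmailAddress and URI.URI).
data User = User
  { userEmail :: Text,
    userName :: UserName,
    userImageUrl :: Text,
    userBio :: UserBio
  }
  deriving (Eq, Show)

-- A hypothetical example value built with record syntax.
myUser :: User
myUser =
  User
    { userEmail = "jane@example.com",
      userName = UserName "jane",
      userImageUrl = "https://example.com/jane.png",
      userBio = UserBio "Writes about Haskell."
    }

-- Field accessors are ordinary functions: userName :: User -> UserName.
-- getUserName then unwraps the newtype back to Text.
nameAsText :: User -> Text
nameAsText = getUserName . userName

-- Passing a UserBio where a UserName is expected is a type error,
-- even though both wrap Text:
--   greet (userBio myUser)   -- does not compile
greet :: UserName -> Text
greet name = "Hello, " <> getUserName name
```

In GHCi, nameAsText myUser evaluates to "jane", while greet (userBio myUser) is rejected by the type checker. That rejection is the whole point of wrapping each field in its own newtype.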

joanllenas commented 4 years ago

Comments on the design document:

Roles

I'm in with both proposals:

So, everyone is an Anonymous user (read-only) by default.

Profile

User References

I think that, when users are referenced anywhere, it's better to use foreign keys, so we keep the database normalized.

Article

Regarding timestamps I would suggest two fields:

Comments

If comments are owned by registered users, I would skip comments for now, or allow Anonymous users to comment.

Tags

Tags are very personal; I would let every user have their own list of tags.

Favorites

Because of the initial Anonymous-users-only limitation, this wouldn't be possible.

Following, Feeds

Because of the initial Anonymous-users-only limitation, this wouldn't be possible.

Design Questions

  1. I would say that direct mapping fits nicely with our domain and will also make things less complicated.
  2. It depends on whether we are in a SPA scenario or not, but let's say we are going to focus on SPAs. Then I would create a REST endpoint for each entity. For instance, on the articles list page we would load /articles; if we click on an article, we would load /articles/{slug}; and so on.
  3. I'm in favor of database normalization, so: foreign keys for every relation, fetch by object ID, and compose the objects in each API endpoint if needed. I think doing this manually will be better at first. If we start using ORMs or tools that do too much for us, we won't understand the basics (IMO).
  4. I don't have opinions on this; I have never worked on a multi-threaded application, only PHP and NodeJS, where threads are not a threat :)
  5. Are transactions managed internally by the database engine, or should we manage them in our application logic?
  6. I'm not sure about this, but in general I would prefer not to use more language extensions than strictly necessary. So I guess "no encapsulation at all", or maybe some, but done manually via Haskell's native visibility mechanisms?
uliSchuster commented 4 years ago

Re: Authentication. I do think we need users and roles from the start; otherwise, attributing input to certain users does not make sense. By "skipping authentication" I mean that we do not need to implement login and cryptographic sessions. Nevertheless, we need some way to tell the system which user is issuing a request. The simplest form would be to transmit a user name in plain text as part of the HTTP Authorization header.
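A sketch of how the plain-text scheme could look, assuming a header of the form "Authorization: Token <user name>". The "Token" prefix is borrowed from the RealWorld spec, which puts a JWT there; sending a bare user name instead is our simplification and offers no security at all. Requester, userFromAuthHeader and requesterFrom are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

newtype UserName = UserName {getUserName :: Text}
  deriving (Eq, Show)

-- Who is issuing the request: nobody in particular, or a named user.
data Requester = Anonymous | Registered UserName
  deriving (Eq, Show)

-- Hypothetical, unauthenticated scheme for development only:
--   Authorization: Token <user name>
-- The real spec puts a JWT after "Token"; here it is a plain user name.
userFromAuthHeader :: Text -> Maybe UserName
userFromAuthHeader headerValue = do
  rest <- T.stripPrefix "Token " headerValue
  let name = T.strip rest
  if T.null name || T.any (== ' ') name
    then Nothing
    else Just (UserName name)

-- A missing header means an anonymous (read-only) request;
-- a malformed header yields Nothing, i.e. the request is rejected.
requesterFrom :: Maybe Text -> Maybe Requester
requesterFrom Nothing = Just Anonymous
requesterFrom (Just h) = Registered <$> userFromAuthHeader h
```

A handler could turn the Nothing case into a 401 response. Replacing the user name with a verified token later would only change userFromAuthHeader, not the Requester type the rest of the code relies on.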

uliSchuster commented 4 years ago

Re: Endpoints. The REST API is completely specified here: https://github.com/gothinkster/realworld/tree/master/api If we want to be compatible with any of the available frontends (https://github.com/gothinkster/realworld), we need to implement this interface. We can add resources, of course, but I'd say we should try to get the existing spec implemented first. The spec is not 100% complete and consistent, though. I propose we start with a subset (users and articles) and build from there.

uliSchuster commented 4 years ago

Re: Data normalisation. I am all for a properly normalised database design. But the database schema does not need to correspond to our domain model. In my view, the two are completely separate things. The domain model should be comfortable to work with and make use of all the neat Haskell type safety. Database types are not as flexible, so we need a mapper in between. Main example: I opt for separate DB relations for users and articles. But the Article domain type can contain a User subtype, because the link is 1:1 and the data is always used jointly. Thus, it would be the job of the DB access module to do a JOIN. The domain model should not be concerned with the logical data model in the DB.
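A sketch of the mapper idea, with hypothetical row types standing in for whatever a DB library would return from the JOIN. The field names and the Int key are assumptions; the point is that the foreign key exists only on the DB side, while the domain Article embeds its User.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Hypothetical DB-side rows: normalised, linked by a foreign key.
data UserRow = UserRow
  { userRowId :: Int,
    userRowName :: Text,
    userRowBio :: Text
  }
  deriving (Show)

data ArticleRow = ArticleRow
  { articleRowTitle :: Text,
    articleRowBody :: Text,
    articleRowAuthorId :: Int -- foreign key to UserRow
  }
  deriving (Show)

-- Domain side: the Article embeds its author instead of an ID.
newtype UserName = UserName Text deriving (Eq, Show)

data User = User {userName :: UserName, userBio :: Text}
  deriving (Eq, Show)

data Article = Article
  { articleTitle :: Text,
    articleBody :: Text,
    articleAuthor :: User
  }
  deriving (Eq, Show)

-- The DB access module would obtain (ArticleRow, UserRow) pairs from a
-- JOIN on articleRowAuthorId = userRowId; this mapper only converts them.
-- It returns Nothing if the pair does not actually belong together.
toArticle :: (ArticleRow, UserRow) -> Maybe Article
toArticle (a, u)
  | articleRowAuthorId a /= userRowId u = Nothing
  | otherwise =
      Just
        Article
          { articleTitle = articleRowTitle a,
            articleBody = articleRowBody a,
            articleAuthor = User (UserName (userRowName u)) (userRowBio u)
          }
```

Nothing in Article mentions a user ID, so the domain code never sees the relational layout; changing the schema only touches the row types and toArticle.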

uliSchuster commented 4 years ago

Re: Multithreading. I think any sort of web application will automatically be multithreaded: each incoming HTTP request is served in a separate thread. The simplest type of application consists of services, where each service simply loads some data, processes it, and maybe updates the DB or returns data to the frontend. Because multiple service calls that concern the same piece of data in the DB might come in at the same time, we need to be clear on how to handle this type of concurrency. Do we lock the entire table (not a very performant option)? Do we let the DB do consistency checks upon write, with the risk of informing a user that the data he or she submitted conflicts with a simultaneous update by another user? Both are DB-based consistency mechanisms. Alternatively, we could keep data in main memory and handle concurrency there, e.g., via Haskell's Software Transactional Memory (STM) feature. I strongly opt for the simplest way possible: do optimistic locking on the DB and let the DB fail a transaction in the unlikely scenario that two users in two threads want to manipulate the same table row. This is, I suppose, how most web applications handle it.
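The optimistic-locking rule can be sketched without a database, using an in-memory Map as a stand-in table with a version column. This only illustrates the check the DB would perform; Versioned, Table and updateRow are hypothetical names, and the SQL in the comment shows a common pattern rather than any specific library's API.

```haskell
import qualified Data.Map.Strict as Map

-- A row together with the version number it was last written at.
data Versioned a = Versioned {version :: Int, value :: a}
  deriving (Eq, Show)

-- Hypothetical in-memory stand-in for a DB table, keyed by row ID.
type Table a = Map.Map Int (Versioned a)

data UpdateError = NotFound | Conflict
  deriving (Eq, Show)

-- Optimistic update: the caller passes the version it read earlier.
-- If someone else has written in the meantime, the versions differ and
-- the update is rejected instead of silently overwriting their change.
-- On a SQL database this corresponds to
--   UPDATE t SET ..., version = version + 1 WHERE id = ? AND version = ?
-- followed by a check that exactly one row was affected.
updateRow :: Int -> Int -> a -> Table a -> Either UpdateError (Table a)
updateRow rowId seenVersion newValue table =
  case Map.lookup rowId table of
    Nothing -> Left NotFound
    Just row
      | version row /= seenVersion -> Left Conflict
      | otherwise ->
          Right (Map.insert rowId (Versioned (seenVersion + 1) newValue) table)
```

If users A and B both read a row at version 0, A's update succeeds and bumps the version to 1; B's update, still carrying version 0, then fails with Conflict, and the service can ask B to reload and retry. No lock is held between reading and writing.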

uliSchuster commented 4 years ago

Re: Transactions. I don't know anything yet about Haskell DB frameworks. Let's just wait until we get there. As long as we keep the domain model and the DB model separate, we do not need to care about the DB interface right now, except for references/foreign keys.

uliSchuster commented 4 years ago

Encapsulating domain types to me seems to be the most important aspect of domain modelling. I highly recommend these two articles:

To be more concrete, look at the following issue: an article has a title and a slug. The API specification states that the slug should derive from the title and should change whenever the title changes. A simple slug would take the title, make it all lowercase, and replace whitespace with dashes, like so: "A Standard Article Title" --> a-standard-article-title. There are lots of things that can go wrong if we use type Text for the title: it can be arbitrarily long, it can contain arbitrary Unicode characters, it can have leading and trailing whitespace, etc. We can validate that this is not the case when we obtain a title from the API, but we cannot guarantee it later on. So, does the slug-generation function simply assume that all prerequisites hold? Or does it check a second time? If it does check by itself, what happens if a constraint is violated? Throw an exception or return a Maybe value? How should upstream code handle the Maybe value?

I think it is much better to encode the invariants (maximum length, no leading whitespace, etc.) into the Title type. Then all functions that rely on this type do not need to make assumptions and do ad-hoc validation. Instead, we push validation to the input boundary once and for all: we "parse" instead of validating. To do so, though, we must define our own Title type and ensure that it cannot be constructed with illegal content. The same holds true for other basic types. For me, this is the key difference in domain modelling compared with other programming languages. In Haskell, we can "offload" many such invariants to the type system, which we would otherwise need to check and handle over and over again, at the cost of writing those types initially.
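A sketch of the "parse, don't validate" idea for titles. The invariants chosen here (trimmed, non-empty, at most 100 characters, no control characters) are assumptions for illustration, not taken from the API spec. The slug follows the simple rule described above: lowercase, with each run of whitespace replaced by a single dash. Once the Title constructor is hidden behind a module export list, mkTitle is the only way to build a Title, so slugFromTitle can be total and skip re-validation.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isControl)
import Data.Text (Text)
import qualified Data.Text as T

-- In a real module, export only the types, not their constructors, e.g.
--   module Domain.Title (Title, mkTitle, titleText, Slug, slugText, slugFromTitle) where
-- so the only way to obtain a Title is through mkTitle.
newtype Title = Title Text deriving (Eq, Show)

newtype Slug = Slug Text deriving (Eq, Show)

data TitleError = EmptyTitle | TitleTooLong | ControlCharacters
  deriving (Eq, Show)

-- Hypothetical limit; the API spec does not fix one.
maxTitleLength :: Int
maxTitleLength = 100

-- Parse, don't validate: check the invariants once, at the input boundary.
mkTitle :: Text -> Either TitleError Title
mkTitle raw
  | T.null t = Left EmptyTitle
  | T.length t > maxTitleLength = Left TitleTooLong
  | T.any isControl t = Left ControlCharacters
  | otherwise = Right (Title t)
  where
    t = T.strip raw

titleText :: Title -> Text
titleText (Title t) = t

slugText :: Slug -> Text
slugText (Slug s) = s

-- Total function: it relies on the Title invariants and needs no checks.
-- T.words splits on runs of whitespace, so "Two  spaces" -> "two-spaces".
-- Because a Title is trimmed and non-empty, the slug is never empty.
slugFromTitle :: Title -> Slug
slugFromTitle (Title t) = Slug (T.intercalate "-" (T.words (T.toLower t)))
```

For example, slugFromTitle <$> mkTitle "A Standard Article Title" gives Right (Slug "a-standard-article-title"), and mkTitle "   " gives Left EmptyTitle. The only Either the rest of the code has to handle comes from mkTitle at the API boundary.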

joanllenas commented 4 years ago

Ok, so to recap:

Regarding domain modelling, I'm with you. I like the "making impossible states impossible" philosophy and all. We should aim for that!

I think that the compromise here is:

uliSchuster commented 4 years ago

So far, we take an Article and a Comment to be two separate things. Are they? Or do they have enough in common that we can factor it out? Both are some text written by some author. Is this enough to create a BlogContent type that handles the text and the author link, and then wrap it in surrounding Article and Comment types? What would we gain from this overhead?
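One way the BlogContent idea could look, as a sketch: BlogContent, CommentId and the helper functions are hypothetical, and the fields are reduced to a minimum.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

newtype UserName = UserName Text deriving (Eq, Show)

-- Hypothetical shared part: some text written by some author.
data BlogContent = BlogContent
  { contentBody :: Text,
    contentAuthor :: UserName
  }
  deriving (Eq, Show)

-- Article and Comment wrap the shared part and add their own fields.
data Article = Article
  { articleTitle :: Text,
    articleContent :: BlogContent
  }
  deriving (Eq, Show)

newtype CommentId = CommentId Int deriving (Eq, Show)

data Comment = Comment
  { commentId :: CommentId,
    commentContent :: BlogContent
  }
  deriving (Eq, Show)

-- One possible gain: logic that only cares about authorship is
-- written once and reused for both types.
isAuthoredBy :: UserName -> BlogContent -> Bool
isAuthoredBy user content = contentAuthor content == user

canDeleteComment :: UserName -> Comment -> Bool
canDeleteComment user = isAuthoredBy user . commentContent
```

The gain is that authorship rules such as isAuthoredBy live in one place. The cost is an extra level of nesting in every field access, e.g. contentBody (articleContent article), so whether it pays off depends on how much shared logic we actually end up with.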

uliSchuster commented 4 years ago

Re: Tags

  1. Should users be able to create new tags, or should we provide a fixed set of tags? Proposal: start with predefined tags. This simplifies the problem, because we do not need CRUD logic for tags.
  2. Should the author add tags to his or her articles, or should every user be able to add tags to arbitrary articles? Proposal: only authors can add tags to their articles.
  3. If arbitrary users can add tags to arbitrary articles, are these tags private? That is, does every user only see his or her own tags, so that Article A has tags [U1.A, U1.B, U1.C] for user U1 but, say, tags [U2.A, U2.B] for user U2? Proposal: don't make it that complex; follow the preceding option and only let authors tag their own articles.
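A sketch of proposals 1 and 2 combined. The tag names are made up, and Article is reduced to the two fields that matter here. A fixed set of tags becomes a sum type deriving Enum and Bounded, so no CRUD logic is needed, and the hypothetical addTags function enforces the author-only rule.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.List (nub)
import Data.Text (Text)

newtype UserName = UserName Text deriving (Eq, Show)

-- Proposal 1: a fixed, predefined set of tags (these names are made up).
-- A sum type needs no CRUD logic, and allTags lists every value.
data Tag = Haskell | Databases | WebDev | Testing
  deriving (Eq, Ord, Show, Enum, Bounded)

allTags :: [Tag]
allTags = [minBound .. maxBound]

data Article = Article
  { articleAuthor :: UserName,
    articleTags :: [Tag]
  }
  deriving (Eq, Show)

data TagError = NotTheAuthor deriving (Eq, Show)

-- Proposal 2: only the author may tag an article.
-- Duplicate tags are dropped, keeping the existing order.
addTags :: UserName -> [Tag] -> Article -> Either TagError Article
addTags user tags article
  | user /= articleAuthor article = Left NotTheAuthor
  | otherwise = Right article {articleTags = nub (articleTags article ++ tags)}
```

Adding a new tag later means adding a constructor and recompiling, which is the trade-off of starting with predefined tags.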