Closed uliSchuster closed 4 years ago
see design document on branch RWHSG-0005
I added drafts for the User, Article and Comment domain types. Please have a look and comment. Key points: They use newtype wrappers for individual fields, and the Article and Comment types include the User type instead of referencing it.
Looks really clean!
There are a few things I don't understand though:
So, this looks like a record, right? I didn't know Haskell had records.
data User
  = User
      { userEmail :: Email.EmailAddress,
        userName :: UserName,
        userImageUrl :: URI.URI,
        userBio :: UserBio
      }
What does this do? Again, it looks like a record, but getUserName looks like an OOP method. It also has the newtype keyword, and I'm not sure what that does.
newtype UserName = UserName {getUserName :: Text}
Finally, I've read that RIO does a lot of magic under the hood. Do we really want it? I'm just asking because I have no idea. Is using it best practice? Is it used everywhere?
To be clear, I'm going to research all that stuff, I just wanted to give some feedback sooner than later :)
Git Tip:
git commit -m "Added domain model #5"
This will do two things: 1. create a reference to the commit in the issue; 2. create a reference to the issue in the commit (obvious :) ).
Thanks for the git tip!
Haskell has records (see, e.g., here: https://mmhaskell.com/blog/2017/12/24/haskell-data-types-in-5-steps). However, the actual record syntax is rather inelegant and clumsy (see https://stackoverflow.com/questions/5367167/haskell-record-syntax). In particular, the names of record fields "leak": you cannot have two records with fields of the same name in one module, at least not without language extensions such as DuplicateRecordFields.
For each field, Haskell creates an accessor function. Say we create a record myUser of type User. Then Haskell provides an accessor function:
email = userEmail myUser
The newtype declaration also uses record syntax, here for a record with a single field. Naming this field getUserName is merely for convenience, because Haskell will create the record accessor function getUserName. A newtype simply wraps an existing type, and the accessor function can be used to unwrap it. In the code here, I used newtypes to create separate types for separate things, although they are all Text strings under the hood.
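To make the accessor and newtype explanation concrete, here is a minimal sketch. It is a simplified stand-in for the draft, not the code on the branch: the User record is cut down to two fields, and myUser, nameOf, rawName and bioLength are hypothetical names introduced only for illustration.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Separate wrapper types for separate things; both are Text under the hood.
newtype UserName = UserName {getUserName :: Text}
  deriving (Eq, Show)

newtype UserBio = UserBio {getUserBio :: Text}
  deriving (Eq, Show)

-- Simplified record (assumption: the real draft also has email and image URL).
data User = User
  { userName :: UserName,
    userBio :: UserBio
  }
  deriving (Eq, Show)

-- Hypothetical sample value.
myUser :: User
myUser =
  User
    { userName = UserName (T.pack "alice"),
      userBio = UserBio (T.pack "Likes Haskell")
    }

-- A record field doubles as an ordinary function from the record to the field.
nameOf :: User -> UserName
nameOf = userName

-- The newtype accessor unwraps the underlying Text.
rawName :: Text
rawName = getUserName (nameOf myUser)

-- Accepts only a UserBio; passing a UserName is rejected by the type checker,
-- even though both wrap Text.
bioLength :: UserBio -> Int
bioLength = T.length . getUserBio
```

So getUserName is not a method attached to an object; it is a plain function of type UserName -> Text that the compiler generates from the field name.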
Comments on the design document:
I'm in with both proposals:
So, everyone is an Anonymous user (read-only) by default.
I think that, when users are referenced anywhere, it's better to use foreign keys, so we keep the database normalized.
Regarding timestamps I would suggest two fields:
createdAt: Set the timestamp once on creation
updatedAt: Set the timestamp every time any data (not the referenced data) is updated. For instance, if a tag is added/removed, updatedAt is updated, but if the referenced author changes his or her avatar, nothing is updated, because it is a relation.
Delete: I would skip this part for now, because it implies that we have authentication, which we decided not to do for now.
If comments are owned by registered users, I would skip comments for now, or allow Anonymous users to comment.
Tags are very personal; I would let every user have their own list of tags.
Because of the initial Anonymous users limitation this wouldn't be possible.
/articles
, if we click on an article we would load /articles/{slug}
, and so on.
Re: Authentication I do think we need users and roles from the start, for otherwise attributing input to certain users does not make sense. By "skipping authentication", I mean that we do not need to implement login and cryptographic sessions. Nevertheless, we need some way to tell the system which user is issuing a request. The simplest form would be to transmit a user name in plain text as part of the HTTP Authorization header.
Re: Endpoints The ReST API is completely specified here: https://github.com/gothinkster/realworld/tree/master/api. If we want to be compatible with any of the available frontends (https://github.com/gothinkster/realworld), we need to implement this interface. We can add resources, of course, but I'd say we should try to get the existing spec implemented first. The spec is not 100% complete and consistent, though. I propose we start with a subset (users and articles), and then build it up from there.
Re: Data normalisation
I am all for a properly normalised database design. But the database schema does not need to correspond to our domain model. In my view, the two are completely separate things. The domain model should be comfortable to work with, and make use of all the neat Haskell type safety. Database types are not as flexible, so we need a mapper in between.
Main example: I opt for separate DB relations for users and articles. But the Article domain type can contain a User value, because each article has exactly one author and the data is always used jointly. Thus, it would be the job of the DB access module to do a JOIN. The domain model should not be concerned with the logical data model in the DB.
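As a rough sketch of this separation: the row types below mirror normalised relations with a foreign key, the domain Article embeds its author directly, and a mapper stands in for the JOIN. All names (UserRow, ArticleRow, toArticle and their fields) are assumptions made up for this example, and String is used instead of the wrapped Text types to keep it short.

```haskell
import Data.List (find)

-- Hypothetical DB-side types: one row type per relation.
newtype UserId = UserId Int
  deriving (Eq, Show)

data UserRow = UserRow
  { userRowId :: UserId,
    userRowName :: String
  }
  deriving (Eq, Show)

data ArticleRow = ArticleRow
  { articleRowTitle :: String,
    articleRowAuthorId :: UserId -- foreign key into the users relation
  }
  deriving (Eq, Show)

-- Domain-side types: the article contains its author, no key in sight.
newtype User = User {userName :: String}
  deriving (Eq, Show)

data Article = Article
  { articleTitle :: String,
    articleAuthor :: User
  }
  deriving (Eq, Show)

-- Mapper living in the DB access layer. The list lookup stands in for the
-- JOIN a real query would perform; Nothing signals a dangling foreign key.
toArticle :: [UserRow] -> ArticleRow -> Maybe Article
toArticle users row = do
  author <- find ((== articleRowAuthorId row) . userRowId) users
  pure
    Article
      { articleTitle = articleRowTitle row,
        articleAuthor = User (userRowName author)
      }
```

Code that works with Article never sees UserId, so the DB schema can change without touching the domain logic, at the cost of maintaining the mapper.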
Re: Multithreading. I think any sort of web application will automatically be multithreaded - each incoming HTTP request is served in a separate thread. The simplest type of application has services, where each service simply loads some data, processes it, and maybe updates the DB or returns data to the frontend. Because multiple service calls that concern the same piece of data in the DB might come in at the same time, we need to be clear on how to handle this type of concurrency: Do we lock the entire table (not a very performant option)? Do we let the DB do consistency checks upon write, at the risk of having to inform a user that the data he or she submitted conflicts with a simultaneous update by another user? Both are DB-based consistency mechanisms. Alternatively, we could keep data in main memory and handle concurrency there; e.g., via Haskell's Software Transactional Memory (STM) feature. I strongly opt for the simplest way possible: Do optimistic locking on the DB and let the DB fail a transaction in the unlikely scenario that two users in two threads want to manipulate the same table row. This is - I suppose - how most web applications handle it.
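To illustrate what optimistic locking means here, the following is a small in-memory model, not a DB integration. It assumes a per-row version number, which is one common way to implement the check; the thread does not fix a mechanism, and Table, Row, Version and updateRow are hypothetical names.

```haskell
import qualified Data.Map.Strict as Map

-- Assumed version counter stored alongside each row.
newtype Version = Version Int
  deriving (Eq, Show)

data Row = Row
  { rowVersion :: Version,
    rowBody :: String
  }
  deriving (Eq, Show)

-- In-memory stand-in for a DB table keyed by primary key.
type Table = Map.Map Int Row

data UpdateError = NotFound | Conflict
  deriving (Eq, Show)

-- The write succeeds only if the row still carries the version the caller
-- originally read; otherwise somebody else updated it in the meantime.
updateRow :: Int -> Version -> String -> Table -> Either UpdateError Table
updateRow key expected newBody table =
  case Map.lookup key table of
    Nothing -> Left NotFound
    Just (Row current _)
      | current /= expected -> Left Conflict
      | otherwise -> Right (Map.insert key (Row (bump current) newBody) table)
  where
    bump (Version n) = Version (n + 1)
```

If two users both read version 0 of a row, the first write succeeds and bumps the version, and the second write fails with Conflict. That failure is the case where the user has to be told that the submitted data clashes with another update.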
Re: Transactions I don't know anything yet about Haskell DB frameworks. Let's just wait until we get there. As long as we keep domain model and DB model separate, we do not need to care about the DB interface right now, except for references/foreign keys.
Encapsulating domain types to me seems to be the most important aspect of domain modelling. I highly recommend these two articles:
To be more concrete, look at the following issue: An article has a title and a slug. The API specification states that the slug should derive from the title, and should change whenever the title changes. A simple slug would take the title, make it all lowercase, and replace whitespace by dashes, like so: "A Standard Article Title" --> a-standard-article-title. There are lots of things that can go wrong if we use type Text for the title: It can be arbitrarily long, it can contain arbitrary Unicode characters, it can have leading and trailing whitespace, etc. We can validate that this is not the case when we obtain a title from the API, but we cannot guarantee it later on. So, does the slug-generation function simply assume that all prerequisites hold? Or does it check a second time? If it does check by itself, what happens if a constraint is violated? Throw an exception or return a Maybe value? How should upstream code handle the Maybe value? I think it is much better to encode the invariants (maximum length, no leading whitespace, etc.) into the Title type. Then, all functions that rely on this type do not need to make assumptions and do ad-hoc validation. Instead, we push validation to the input boundary once and for all - we "parse" instead of validating. Yet, to do so, we must define our own Title type and ensure that it cannot be constructed with illegal content.
The same holds true for other basic types. For me, this is the key difference in domain modelling compared with other programming languages. In Haskell, we can "offload" many such invariants to the type system, which we would otherwise need to check and handle over and over again, at the cost of writing those types initially.
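A minimal sketch of such a Title type, under some assumptions: the maximum length of 100 is an arbitrary placeholder, the error cases are only a sample of possible invariants, and mkTitle, TitleError, Slug and slugFromTitle are hypothetical names. In a real module, the Title constructor would be left out of the export list so that mkTitle is the only way to build a value.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- In a real module, export the type Title but not this constructor.
newtype Title = Title Text
  deriving (Eq, Show)

data TitleError = EmptyTitle | TitleTooLong | SurroundingWhitespace
  deriving (Eq, Show)

-- Assumed limit; the API specification does not state one.
maxTitleLength :: Int
maxTitleLength = 100

-- "Parse, don't validate": the checks run once, at the input boundary.
mkTitle :: Text -> Either TitleError Title
mkTitle raw
  | T.null raw = Left EmptyTitle
  | T.length raw > maxTitleLength = Left TitleTooLong
  | T.strip raw /= raw = Left SurroundingWhitespace
  | otherwise = Right (Title raw)

newtype Slug = Slug Text
  deriving (Eq, Show)

-- Total function: no Maybe and no re-validation, because every Title
-- already satisfies the invariants. Lowercases and joins words with dashes.
slugFromTitle :: Title -> Slug
slugFromTitle (Title t) =
  Slug (T.intercalate (T.pack "-") (T.words (T.toLower t)))
```

With this in place, a call such as fmap slugFromTitle (mkTitle (T.pack "A Standard Article Title")) handles the failure case exactly once, and everything downstream of mkTitle works with a Title that is known to be well-formed.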
Ok, so to recap:
Regarding domain modelling I'm with you, I like the "making impossible states impossible" philosophy and all. We should aim for that!
I think that the compromise here is:
So far, we take an Article and a Comment to be two separate things. Are they? Or do they have enough things in common that we can factor out? Both are some text written by some author. Is this enough to create a BlogContent type that handles the text and author link, and then wrap it in surrounding Article and Comment types? What would we gain from this overhead?
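To make the question tangible, here is one possible shape of that factoring. It is only a sketch for discussion: the field names, the Author placeholder, and the HasContent class are all assumptions, and String stands in for the wrapped Text types.

```haskell
-- Placeholder for the real User type.
newtype Author = Author String
  deriving (Eq, Show)

-- The shared part: some text written by some author.
data BlogContent = BlogContent
  { contentBody :: String,
    contentAuthor :: Author
  }
  deriving (Eq, Show)

-- An article adds a title on top of the shared content.
data Article = Article
  { articleTitle :: String,
    articleContent :: BlogContent
  }
  deriving (Eq, Show)

-- A comment is, in this sketch, nothing but the shared content.
newtype Comment = Comment {commentContent :: BlogContent}
  deriving (Eq, Show)

-- One potential gain: functions written once for both types.
class HasContent a where
  contentOf :: a -> BlogContent

instance HasContent Article where
  contentOf = articleContent

instance HasContent Comment where
  contentOf = commentContent

authorOf :: HasContent a => a -> Author
authorOf = contentAuthor . contentOf
```

The gain is shared functions such as authorOf; the overhead is an extra layer of wrapping and unwrapping on every access, which may not pay off if articles and comments share little beyond body and author.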
Re: Tags
From the API specification, derive an initial draft of the domain data model. Sketch it. List important design decisions to take.