Competitive design goals?

saurabhnanda commented 7 years ago

warning this is an existential question!

Looking at Hackage, it seems that Opaleye 's first release was on 1 Dec 2014. Around that time Persistent was at 2.1.1.

Does this mean Persistent was a fairly mature library by the time Opalaye was started? If yes, then the broad question is: what are the gaps/issues in Persistent that Opaleye is trying to fill? What problems does Persistent not solve well, and possibly cannot solve well in the future (given core design principles), that Opaleye is better positioned to solve?

tomjaguarpaw commented 7 years ago

Persistent, Esqueleto and Groundhog were all fairly mature at the time Opaleye was first released, I think.

I hope the whole essence of Opaleye can be summarised by the slogan that appears in the README:

You need Opaleye if you want to use Haskell to write typesafe and composable code to query a Postgres database.

Typesafe means that a well-typed Opaleye query should not fail at runtime. Composable means that the meaning of a query is determined in a well-defined way from the meaning of its subqueries. How close do other Haskell database libraries come to being typesafe and composable?

Persistent: This is not even a query language. It can't even do joins. I believe its goal is to be a general API for fetching records from a database.
Esqueleto: This is a query language implemented on top of Persistent.
- It is not typesafe: https://github.com/prowdsponsor/esqueleto/issues/41
- It is not composable: https://github.com/prowdsponsor/esqueleto/issues/40
Groundhog: I admit I don't really understand Groundhog, but I suspect it is neither as typesafe nor as composable as Opaleye.

xldenis commented 7 years ago

I like these goals, but as was mentioned on the reddit thread about Opaleye and documented in issue #233. It is very important to not sacrifice 'usability' (which in itself needs defining) to achieve this. What's the point of a typesafe query language if no one uses it 😉 . To that end I think it could be valuable to have a 'state of Opaleye' discussion to see what the low hanging fruits are!

I've started building a comprehensive list of the 'missing bits' of psql so that work can be organized for that. Maybe there are some refactorings / restructurings that we can perform as a result / in anticipation of these changes?

Personally, I'd like to see Ppaleye be more than just a safe composable query language. I'd like for there to be the supporting tooling that lets you safely build, interact with and manage database backed applications. This like migrations, validating database schema on boot, etc... All play a role in this.

tomjaguarpaw commented 7 years ago

I sincerely hope Opaleye is usable! I concede that there is work to be done to explain clearly how to use Opaleye effectively. But that said, I believe Opaleye is as usable as Haskell.

I'm very keen to have a "state of Opaleye" discussion and see your comprehensive list of "missing bits". Could you open a PR with your list in the Doc/ directory?

I see no reason why Opaleye couldn't be used for tooling around migrations etc. as you suggest. It seems a natural fit. The problem is getting enough person-power to work on these projects!

xldenis commented 7 years ago

I sincerely hope Opaleye is usable!

I think its usable but I'd like to see it used. I think part of that will take the form of adding syntactic sugar or building some helper api.

The problem is getting enough person-power to work on these projects!

That's why I've started trying to help out! But agreed building a community and ecosystem is important for building a user base but it's also a chicken and egg problem :P

tomjaguarpaw commented 7 years ago

I sincerely hope Opaleye is usable!

I think its usable but I'd like to see it used. I think part of that will take the form of adding syntactic sugar or building some helper api.

I'm of very much the same opinion. Glad to have you on board! Your contributions so far have been great.

bergmark commented 7 years ago

Put no significance in version numbers, I think you can not read anything useful from it. Look at activity, friendliness of maintainers, and decide what other factors are important to you. Like you might deduct there is no one true answer. Persistent has been around a lot longer than opaleye, but they are completely different beasts. Persistent has the frameworky "do things this way and everything is great" principles while opaleye is "do things however you'd like but only if you like postgres". Both are great libraries if they fit your needs.

I spent a lot of time investigating these libraries for Silk and, for us, neither persistent nor groundhog were flexible enough to fit our needs. opaleye was the only one that I thought would be practical for us (and I think it worked out great!).

saurabhnanda commented 7 years ago

For every library that tries to put a "type-safe" wrapper on top of SQL (Opaleye included) what is the advantage of a custom DSL vis-a-vis the following:

Write regular SQL (or SQL with very minor usability enhancements)
Convert to AST using postgres query parser
Attach additional type information to AST depending upon additional type information provided by Haskell/domain. Eg tenants.id is a different type from products.id and shouldn't be compared.
Typecheck the AST

Doesn't this ensure that you don't have to reinvent the wheel in terms of supporting SQL features?

saurabhnanda commented 7 years ago

You need Opaleye if you want to use Haskell to write typesafe and composable code to query a Postgres database.

Would it be possible to elucidate this point either independently, or in comparison with other DB libraries. For example:

What is meant by type-safety in this context? What kind of incorrect queries does Opaleye prevent you from writing? Does this have a downside of not being able to express some valid SQL as well? What about type information that is over and above the SQL layer? Eg TenantID vs ProductID?
What is meant by composability in this context? Ability to attach where clauses after the query has been formed? Ability to add/remove columns from being selected? Some real-life examples please.

saurabhnanda commented 7 years ago

I understand that Opaleye has the best story around aggregation right now. How can one practically use this? Would it be possible to write an analytics dashboard in Opaleye, which needs the following:

Where exists / where not exists
Postgres window functions
Grouping
Postgres CASE and IF
generate_series
date_trunc
Count distinct
rank

(This is from an existing piece of code btw, which is prone to breakage every time someone changes the underlying table schema and forgets to update the analytics query).

xldenis commented 7 years ago

Write regular SQL (or SQL with very minor usability enhancements)

I have problems with this step honestly. SQL is not a great or even a good language. It encodes a powerful paradigm (relational algebra) but the language itself is awful. I want a dsl that makes it simpler to write query code and also ensures that its safe / composable.

side note: wouldn't your dsl have to be typechecked at runtime?

What is meant by composability in this context? Ability to attach where clauses after the query has been formed? Ability to add/remove columns from being selected? Some real-life examples please.

To me its really both of those. I'd like to say if I have a query select A, B from T and select * from T where id < 5000 I'd like to be able to combine them into select A,B from T where id < 5000.

Frankly a good, non-typesafe example is ActiveRecord from rails. It has a lot of issues but composing queries is a very heavily used feature.

Would it be possible to write an analytics dashboard in Opaleye, which needs the following:

No 😞 , afaik opaleye is missing some of those features (window functions, case). This is why I'm building a list of 'completeness'. Knowing whats missing and deciding what bits are important is core.

saurabhnanda commented 7 years ago

To me its really both of those. I'd like to say if I have a query select A, B from T and select * from T where id < 5000 I'd like to be able to combine them into select A,B from T where id < 5000.

This brings up an interesting point. Is "composability" or "composition" of SQL queries well-defined? Is it a warm-fuzzy feeling based on real-life use-cases, or is there a mathematical definition of how two relational queries should be composed?

Frankly a good, non-typesafe example is ActiveRecord from rails. It has a lot of issues but composing queries is a very heavily used feature.

ActiveRecord, while extremely useful, does not try to be mathematically consistent. There are a lot of hacks and inconsistencies. On the other hand, that does not seem to be how Haskell approaches engineering. Function composition has laws in Haskell. Monads have laws. Monoids have laws. What are the laws of composing SQL queries?

In your example, where is it defined the select * and select A, B should compose to select A, B? Why not select *? Why not select A, B, *?

saurabhnanda commented 7 years ago

I have problems with this step honestly. SQL is not a great or even a good language. It encodes a powerful paradigm (relational algebra) but the language itself is awful. I want a dsl that makes it simpler to write query code and also ensures that its safe / composable.

No 😞 , afaik opaleye is missing some of those features (window functions, case). This is why I'm building a list of 'completeness'. Knowing whats missing and deciding what bits are important is core.

Analyze the two situations together. Is there any advantage in not depending upon the Postgres query-parser here? If there is sufficiently good reason for NOT using SQL, can we still not devise a better query-language that is isomorphic with SQL so that the DSL support is always at-par with whatever the PG query-parser can support?

tomjaguarpaw commented 7 years ago

What are the laws of composing SQL queries?

The laws for SQL may be hard or impossible to state, in no small part due to the confused syntax of SQL. One of the benefts of Opaleye is that you can state laws.

In particular

The applicative laws for Query
The arrow laws for QueryArr (generalising the applicative laws)
The commutativity of the applicative operation on Query (and *** on QueryArr)
The laws for restrict (Idempotence particularly. Not sure if there are others.)
The laws for binary operations (union, intersection, etc.), aggregation, and distinct

tomjaguarpaw commented 7 years ago

Is there any advantage in not depending upon the Postgres query-parser here? If there is sufficiently good reason for NOT using SQL, can we still not devise a better query-language that is isomorphic with SQL so that the DSL support is always at-par with whatever the PG query-parser can support?

Isomorphic with SQL how? I don't see how that means anything different to actually being SQL.

saurabhnanda commented 7 years ago

The applicative laws for Query The arrow laws for QueryArr (generalising the applicative laws) The commutativity of the applicative operation on Query (and *** on QueryArr) The laws for restrict (Idempotence particularly. Not sure if there are others.) The laws for binary operations (union, intersection, etc.), aggregation, and distinct

Do these laws answer the examples I mentioned above:

How should select * and select a,b compose?
How should where x=y and where a=b compose?

Better yet, is there any document which talks about composing Opaleye queries wrt actual use-cases?

saurabhnanda commented 7 years ago

Isomorphic with SQL how? I don't see how that means anything different to actually being SQL.

If this has already been thought about deeply, and discarded, then I don't think I'll be able to come up with an isomorphic DSL either!

On the other hand, does Opaleye DSL need to lag behind the Postgres SQL dialect? Can't that problem be solved by some other method, which was the original intent.

tomjaguarpaw commented 7 years ago

Do these laws answer the examples I mentioned above:

No, because those examples are SQL, not Opaleye.

Better yet, is there any document which talks about composing Opaleye queries wrt actual use-cases?

No, not yet!

tomjaguarpaw commented 7 years ago

does Opaleye DSL need to lag behind the Postgres SQL dialect? Can't that problem be solved by some other method, which was the original intent.

This is a good question. Yes, I think it does have to lag. At the very least one would have to describe how the typing rules of SQL correspond to Haskell types.

xldenis commented 7 years ago

ActiveRecord, while extremely useful, does not try to be mathematically consistent. There are a lot of hacks and inconsistencies. On the other hand, that does not seem to be how Haskell approaches engineering. Function composition has laws in Haskell. Monads have laws. Monoids have laws. What are the laws of composing SQL queries?

Totally agreed, which is why i'm not saying opaleye should be a port of AR into haskell. The point I was making by mentioning AR was about composition / extensibility of queries.

In your example, where is it defined the select and select A, B should compose to select A, B? Why not select ? Why not select A, B, *?

Totally arbitrary, I was just showing examples of what I meant by 'composition'. There is also no requirement of a single method of composition, different operators can compose subqueries for different results.

By the way, I've started work on comparing the dialect of psql which opaleye can express.

https://gist.github.com/xldenis/697c6df4c7ea5225b2823fe18bc21241

For now It's primarily a giant list of operators, query statements, types etc... I make no attempts to decide what opaleye should support.

saurabhnanda commented 7 years ago

Ia the following a good summary of the original question (design goals of Opalaye):

Opalaye aims to be a Haskell-specific, composable, and type-safe relational query language, that has feature parity with SQL. At the time this project was started, there was no other project with similar goals, and definitely no other project which had executed those goals appreciably.

If this is correct, would it be a good idea to have a document which talks about "Opalaye, the query language" juxtaposed against SQL? Would it also make sense to define the formal grammar of this new query language?

saurabhnanda commented 7 years ago

Also, would it be possible to have some examples of how it's hard or ambiguous to conpose two specific SQL queries, and how Opalaye solves this problem.

And just to be sure, whenever we're talking about SQL, are we talking of actual queries written in text format or an AST representing the SQL in Haskell-land?

saurabhnanda commented 7 years ago

I have problems with this step honestly. SQL is not a great or even a good language. It encodes a powerful paradigm (relational algebra) but the language itself is awful. I want a dsl that makes it simpler to write query code and also ensures that its safe / composable.

@tomjaguarpaw do you agree with this comment? It is partly out of this comment and your comment that I have derived the overarching goal of the Opaleye project:

Opalaye aims to be a Haskell-specific, composable, and type-safe relational query language, that has feature parity with SQL. At the time this project was started, there was no other project with similar goals, and definitely no other project which had executed those goals appreciably.

If this is the motivation driving Opaleye, is there any documentation (within Opaleye, or outside), which talks about the disadvantages of SQL as a relational query language. Following from there, is there any document (it would actually be the Opaleye roadmap), which talks about how the Opaleye query language doesn't have those disadvantages? Also, the latter document should also talk about how the Opaleye query language is at feature-parity with SQL (or is explicitly choosing to not implement certain features due to aforementioned disadvantages).

Is this making sense to anyone else?

saurabhnanda commented 7 years ago

As part of my latest quest (which is embodied by this particular issue), I'm digging into LINQ. Any thoughts on how Opaleye's design goals compare to LINQ's?

saurabhnanda commented 7 years ago

By the way, I've started work on comparing the dialect of psql which opaleye can express. https://gist.github.com/xldenis/697c6df4c7ea5225b2823fe18bc21241

@xldenis thanks for compiling this. Aren't the number of "no's" very high?

tomjaguarpaw commented 7 years ago

I have problems with this step honestly. SQL is not a great or even a good language. It encodes a powerful paradigm (relational algebra) but the language itself is awful. I want a dsl that makes it simpler to write query code and also ensures that its safe / composable.

@tomjaguarpaw do you agree with this comment?

Yes, absolutely!

If this is the motivation driving Opaleye, is there any documentation (within Opaleye, or outside), which talks about the disadvantages of SQL as a relational query language. Following from there, is there any document (it would actually be the Opaleye roadmap), which talks about how the Opaleye query language doesn't have those disadvantages?

No and no. There is no such documentation. I agree it would be a good idea to have it.

the latter document should also talk about how the Opaleye query language is at feature-parity with SQL (or is explicitly choosing to not implement certain features due to aforementioned disadvantages).

The only reason there is not feature parity is lack of developer time.

Any thoughts on how Opaleye's design goals compare to LINQ's?

LINQ has to be supported by the language, as far as I know. Opaleye is a pure library. Besides that they probably have similar design goals, but I'm not terribly familiar with LINQ so I can't say for certain.

Aren't the number of "no's" very high?

Yeah absolutely! It takes effort to support new features and I've only got a limited amount of free time!

tomjaguarpaw commented 7 years ago

Here are some ideas about the downsides of SQL.

In SQL you can only abstract at the level of a query, and even that is quite heavy weight because you have to enter it into your database as a view. In Opaleye you can abstract queries with a simple let-binding, and furthermore you can abstract over queries with inputs (QueryArr) and you can also abstract over aggregators and orderings (and a whole host of other things).

tomjaguarpaw commented 7 years ago

And, I suppose, stored procedures are also a very heavyweight way of abstracting functions.

saurabhnanda commented 7 years ago

If this is the motivation driving Opaleye, is there any documentation (within Opaleye, or outside), which talks about the disadvantages of SQL as a relational query language. Following from there, is there any document (it would actually be the Opaleye roadmap), which talks about how the Opaleye query language doesn't have those disadvantages?

No and no. There is no such documentation. I agree it would be a good idea to have it.

Yeah absolutely! It takes effort to support new features and I've only got a limited amount of free time!

@tomjaguarpaw what if I [1] convince you to invest whatever little free time you have, over the next few weeks, in documenting (in detail), your vision for Opaleye instead of fixing bugs or developing new features? Say:

Examples of the problems in SQL. Examples of how Opaleye, currently, solves those problems
Features in Postgres' SQL dialect and corresponding features in Opaleye's query language. Even if a particular feature/syntax is not supported today, envisioning what it will look like, when supported. This will serve as a very clear roadmap for future development.
List of design issues in Opaleye that you wished weren't there
Reasons for why Opaleye's SQL generation is so nested/complex and what can be done to fix it (btw, we just benchmarked this and this seems to be slowing down Opaleye a bit. Can share the benchmarking code to make sure we're not doing something wrong.)

Given the goals of Opaleye, I feel that designing most aspects of the query language up-front, and allowing end-users (us!) to analyse and critique the design is going to have a huge impact on the project [3]. Alternatively, if building a type-safe+composable alternative to SQL is not a green-field problem (which means other people/projects have done it, as well) stealing ideas from other such projects and collecting them in a single project can also be done.

If such a roadmap is made available, it can be used to focus volunteer effort, build summer-of-code projects, and also motivate corporate funding [2]

With this, I will rest my case and not bug you with any more posts to this thread (unless you want me to!)

[1] am I the only crazy one who thinks like this, around here? [2] my company may not be able to fund the development completely, but can jointly fund with other companies who believe in the roadmap/vision. [3] Btw, am I mistaken? Is the design fairly complete? Am I being blind to it?

tomjaguarpaw commented 7 years ago

This is a great plan of action, but

invest whatever little free time you have, over the next few weeks, in documenting (in detail), your vision for Opaleye instead of fixing bugs or developing new features?

I don't have time for anything on Opaleye right now. In fact, having these conversations with you (which are really valuable) takes up about all the time I can afford to spend at the moment.

Is the design fairly complete? Am I being blind to it?

The design is basically complete. It's just a case of adding missing features now.

tomjaguarpaw commented 7 years ago

I did intend to write something like what you describe in DESIGN.md but as you can see, I didn't get very far.

tomjaguarpaw commented 7 years ago

we just benchmarked this and this seems to be slowing down Opaleye a bit

If you want to see how long query generation takes then just generate a string using showSql. No need to run anything on the database.

tomjaguarpaw commented 7 years ago

(Remember though, that you should make sure to force the whole string, probably using deepseq.)

saurabhnanda commented 7 years ago

If you want to see how long query generation takes then just generate a string using showSql. No need to run anything on the database.

We were benchmarking against Persistent. So, found it easier to actually run the SQL using both the libraries and compare.

tomjaguarpaw / haskell-opaleye

Competitive design goals? #245