src-d / gitbase

SQL interface to git repositories, written in Go. https://docs.sourced.tech/gitbase
Apache License 2.0

Problem with distinct and order by #976

Closed: alexpdp7 closed this issue 4 years ago.

alexpdp7 commented 4 years ago

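I think there might be a problem with the interaction between ORDER BY and DISTINCT. When the query is ordered by blob_hash first, DISTINCT returns duplicate repositories (8 rows). Ordering by repository_id alone returns the expected 6 distinct rows: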

MySQL [gitbase]> select distinct repository_id from blobs where blob_hash in ('93ec5b4525363844ddb1981adf1586ebddbc21c1', 'aad34590345310fe813fd1d9eff868afc4cea10c', 'ed82eb69daf806e521840f4320ea80d4fe0af435') order by blob_hash, repository_id;
+-------------------------------------+
| repository_id                       |
+-------------------------------------+
| github.com/src-d/gitbase            |
| github.com/src-d/go-mysql-server    |
| github.com/bblfsh/javascript-driver |
| github.com/bblfsh/python-driver     |
| github.com/bblfsh/ruby-driver       |
| github.com/src-d/enry               |
| github.com/src-d/gitbase            |
| github.com/src-d/go-mysql-server    |
+-------------------------------------+
8 rows in set (0.14 sec)

MySQL [gitbase]> select distinct repository_id from blobs where blob_hash in ('93ec5b4525363844ddb1981adf1586ebddbc21c1', 'aad34590345310fe813fd1d9eff868afc4cea10c', 'ed82eb69daf806e521840f4320ea80d4fe0af435') order by repository_id;
+-------------------------------------+
| repository_id                       |
+-------------------------------------+
| github.com/bblfsh/javascript-driver |
| github.com/bblfsh/python-driver     |
| github.com/bblfsh/ruby-driver       |
| github.com/src-d/enry               |
| github.com/src-d/gitbase            |
| github.com/src-d/go-mysql-server    |
+-------------------------------------+
6 rows in set (0.42 sec)

erizocosmico commented 4 years ago

For whoever picks this up: DISTINCT has an optimization for the case where all of its projected columns are already sorted. However, it should only apply that optimization when the first column in the ORDER BY is one of the projected columns.
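A plausible reading of this optimization is a streaming deduplication: if rows arrive sorted by the projected columns, DISTINCT only has to compare each row with the previous one. The Go sketch below illustrates why that shortcut breaks the first query. Sorting by blob_hash first means equal repository_id values are no longer adjacent. It also shows a guard that enables the shortcut only when it is safe. All names (`row`, `sortedDistinct`, `hashDistinct`, `canUseSortedDistinct`) are hypothetical and are not gitbase or go-mysql-server APIs. The blob hashes are placeholders. The guard extends the "first ORDER BY column" rule to several projected columns, and that extension is an assumption.

```go
package main

import "fmt"

// row is a hypothetical stand-in for a result row with the two columns
// involved in the issue's query.
type row struct {
	blobHash     string
	repositoryID string
}

// sortedDistinct compares each value only with the previous one. It is
// correct only if equal values are adjacent, i.e. the input is sorted by
// the projected column(s) first.
func sortedDistinct(values []string) []string {
	var out []string
	for i, v := range values {
		if i == 0 || v != values[i-1] {
			out = append(out, v)
		}
	}
	return out
}

// hashDistinct removes duplicates with a set, keeping first-seen order.
// It does not depend on the input order.
func hashDistinct(values []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, v := range values {
		if !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

// canUseSortedDistinct (hypothetical) reports whether the adjacent-comparison
// shortcut is safe. The first len(projected) ORDER BY columns must be exactly
// the projected columns, in any order. With one projected column this is
// "the first ORDER BY column is the projected column".
func canUseSortedDistinct(projected, orderBy []string) bool {
	if len(projected) == 0 || len(orderBy) < len(projected) {
		return false
	}
	proj := make(map[string]bool)
	for _, c := range projected {
		proj[c] = true
	}
	prefix := make(map[string]bool)
	for _, c := range orderBy[:len(projected)] {
		if !proj[c] {
			return false // a non-projected column breaks adjacency
		}
		prefix[c] = true
	}
	for _, c := range projected {
		if !prefix[c] {
			return false
		}
	}
	return true
}

func main() {
	// Rows ordered by blob_hash, then repository_id, as in the first query.
	rows := []row{
		{"h1", "github.com/src-d/gitbase"},
		{"h1", "github.com/src-d/go-mysql-server"},
		{"h2", "github.com/src-d/enry"},
		{"h2", "github.com/src-d/gitbase"},
	}
	ids := make([]string, len(rows))
	for i, r := range rows {
		ids[i] = r.repositoryID
	}
	fmt.Println(len(sortedDistinct(ids))) // 4: gitbase survives twice
	fmt.Println(len(hashDistinct(ids)))   // 3
	fmt.Println(canUseSortedDistinct([]string{"repository_id"},
		[]string{"blob_hash", "repository_id"})) // false
}
```

In this sketch, a planner would fall back to something like `hashDistinct` whenever `canUseSortedDistinct` returns false. For the second query (`order by repository_id`) the guard returns true, and the cheaper adjacent comparison gives the correct 6 rows.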