src-d / gitbase

SQL interface to git repositories, written in Go. https://docs.sourced.tech/gitbase
Apache License 2.0
2.06k stars 124 forks source link

Friction log on gitbase #303

Closed campoy closed 6 years ago

campoy commented 6 years ago

Hi,

I gave gitbase a try today and took notes on what works and what doesn't.

Installation

First of all I gotta say I'm super happy it's so easy to install.

I was a bit confused by the error message from go get, since it requires make dependencies first. It seems this is due to bblfsh requiring some extra installation steps due to cgo. There's probably some way to improve this step, but it's not really concerning.

What it would be good is to ask people to download the source code without compiling it, to avoid the error altogether. Then make dependencies also install the package.

I sent #304 to fix that problem.

Operation

I was very happy to simply run gitbase server -v -g $GOPATH/src/github.com/campoy/ and see that it simply ran. Unfortunately the last log message one sees when the server is running says:

INFO[0000] starting server

This made me think the server was still starting. It would be much more clear if instead it said something like server started and listening to 0.0.0.0:1234. I create #305 to assess this issue.

Once the server started I connected by using mysql -q -u root -h 127.0.0.1 as recommended. Everything worked 🎉

Exploring data with SQL

My next step was to list the tables, obviously.

mysql> show tables;
+-------+
| table |
+-------+
| blobs |
| commit_blobs |
| commit_trees |
| commits |
| files |
| ref_commits |
| refs  |
| remotes |
| repositories |
| tree_entries |
+-------+
10 rows in set (0.00 sec)

This was great and fast. Next thing I always do: describe one of the tables, let's do it with repositories.

mysql> describe repositories;
ERROR 1105 (HY000): unknown error: unsupported syntax: &sqlparser.OtherRead{}

Apparently show is not supported by vitesse, so we don't support it either. I commented on #49 to see why this was decided. It might be time to reconsider.

Another side effect of not supporting SHOW is that some libraries will simply not work. For instance, I tried to install pymysql to be able to use gitbase from a jupyter notebook.

Unfortunately this doesn't work either:

(pymysql.err.InternalError) (1105, 'unknown error: unsupported feature: SHOW variables') [SQL: "SHOW VARIABLES LIKE 'sql_mode'"] (Background on this error at: http://sqlalche.me/e/2j85)

Performance

Last but not least I decided to do something as simple as counting the number of commits in my own repos and order them decreasingly.

mysql> select repository_id, count(*) from commits group by repository_id;
+---------------+----------+
| repository_id | COUNT(*) |
+---------------+----------+
| /Users/francesc/src/github.com/campoy/talksapptest |        2 |
| /Users/francesc/src/github.com/campoy/apiai |       19 |
| /Users/francesc/src/github.com/campoy/groto |       58 |
| /Users/francesc/src/github.com/campoy/jsonenums |       45 |
| /Users/francesc/src/github.com/campoy/notes |        4 |
| /Users/francesc/src/github.com/campoy/piaas |      112 |
...
+---------------+----------+
46 rows in set (33.64 sec)

There's two main problems here: the first one is how it took over 30 seconds to count only 46 repositories. I wondered how many commits there were in my repos, so again I ran this:

mysql> select count(*) from commits;
+----------+
| COUNT(*) |
+----------+
|     1813 |
+----------+
1 row in set (32.46 sec)

So it takes over 30 seconds just to count 1813 commits. This seems wrong.

The second problem with the request above is that order by seems to be ignored. Which brings me to the last section.

What's available?

Standard SQL is a huge language and it's probably fine that we don't implement it completely. But in just a bit playing with gitbase I was impacted by the lack of:

What other features are we not implementing? There should be a dashboard documenting what exactly we support. Maybe even it could be something that is shown in gitbase version?

I created #306 to track this part.

Conclusion

This is a super exciting project, really great job! Looking forward to using it in production, specially as a hosted instance to provide easy exploration of Public Git Archive!

erizocosmico commented 6 years ago

ORDER BY is implemented and should be working. Can you provide any specific query that does not work with ORDER BY?

campoy commented 6 years ago

I'm pretty sure I did select repository_id, count(*) from commits group by repository_id order by 2; and it didn't work.

mysql> select repository_id, count(*) from commits group by repository_id order by 2;
+---------------+----------+
| repository_id | COUNT(*) |
+---------------+----------+
| /Users/francesc/src/github.com/campoy/apiai |       19 |
| /Users/francesc/src/github.com/campoy/golang-driver |       32 |
| /Users/francesc/src/github.com/campoy/sms-tools |      442 |
| /Users/francesc/src/github.com/campoy/coursera-ml |       33 |
| /Users/francesc/src/github.com/campoy/grpc.rocks |       12 |
| /Users/francesc/src/github.com/campoy/goboard |        1 |
| /Users/francesc/src/github.com/campoy/notes |        4 |
| /Users/francesc/src/github.com/campoy/flite |        1 |
| /Users/francesc/src/github.com/campoy/jsonenums |       45 |
| /Users/francesc/src/github.com/campoy/go-from-scratch |        4 |
| /Users/francesc/src/github.com/campoy/embedmd |      137 |
| /Users/francesc/src/github.com/campoy/groto |       58 |
| /Users/francesc/src/github.com/campoy/justforfunc |      125 |
| /Users/francesc/src/github.com/campoy/go-tooling-workshop |      132 |
| /Users/francesc/src/github.com/campoy/golang-plugins |       11 |
| /Users/francesc/src/github.com/campoy/goodgopher |       19 |
| /Users/francesc/src/github.com/campoy/flappy-gopher |       26 |
| /Users/francesc/src/github.com/campoy/svg-badge |       25 |
| /Users/francesc/src/github.com/campoy/whispering-gophers.bak |        3 |
| /Users/francesc/src/github.com/campoy/mat |        3 |
| /Users/francesc/src/github.com/campoy/golang-groups |       30 |
| /Users/francesc/src/github.com/campoy/mandelbrot |       18 |
| /Users/francesc/src/github.com/campoy/audio |       17 |
| /Users/francesc/src/github.com/campoy/todo |       46 |
| /Users/francesc/src/github.com/campoy/gotalks |       11 |
| /Users/francesc/src/github.com/campoy/openCL |        1 |
| /Users/francesc/src/github.com/campoy/tools |       87 |
| /Users/francesc/src/github.com/campoy/whispering-gophers |        4 |
| /Users/francesc/src/github.com/campoy/goml |       14 |
| /Users/francesc/src/github.com/campoy/samples |       18 |
| /Users/francesc/src/github.com/campoy/talksapptest |        2 |
| /Users/francesc/src/github.com/campoy/narrower |        2 |
| /Users/francesc/src/github.com/campoy/campoy.cat |       65 |
| /Users/francesc/src/github.com/campoy/go-web-workshop |       88 |
| /Users/francesc/src/github.com/campoy/mandarin |       10 |
| /Users/francesc/src/github.com/campoy/talks |       15 |
| /Users/francesc/src/github.com/campoy/podcasts |        3 |
| /Users/francesc/src/github.com/campoy/articles |        2 |
| /Users/francesc/src/github.com/campoy/onboard |        5 |
| /Users/francesc/src/github.com/campoy/ml-coursera |        3 |
| /Users/francesc/src/github.com/campoy/ud859-master-go |       58 |
| /Users/francesc/src/github.com/campoy/piaas |      112 |
| /Users/francesc/src/github.com/campoy/podcast-to-youtube |       33 |
| /Users/francesc/src/github.com/campoy/mandelbrot.bak |       16 |
| /Users/francesc/src/github.com/campoy/unique |       25 |
| /Users/francesc/src/github.com/campoy/query-it |       12 |
+---------------+----------+
46 rows in set (36.54 sec)

I also tried to order by the column after naming it, but it didn't work either:

select repository_id, count(*) as n from commits group by repository_id order by n;
ERROR 1105 (HY000): unknown error: column "n" could not be found in any table in scope
erizocosmico commented 6 years ago

Closing this, as it's been superseded by a few other issues (only https://github.com/src-d/go-mysql-server/issues/228 pending right now)