sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Allow searches on sourcegraph.com without `repo` keyword #4958

Closed ijt closed 5 years ago

ijt commented 5 years ago

Currently, sourcegraph.com doesn't show anything useful for a simple search that doesn't specify a repo. For example, say we want to see how to use "d3.selectAll". This is what sourcegraph.com returns:

[Screenshot: Screen Shot 2019-07-23 at 14 19 02]

It would be more useful if this search showed examples of this function being used in popular repositories.

Here are some steps to get there:

ijt commented 5 years ago

Here is a first attempt at a query that orders the repositories using information already present in the repo table. The idea is to order descending by the time elapsed between the creation of the repo and its last commit (updated_at - created_at), breaking ties by shorter uri. That favors long-lived repos:

sg=# select uri, created_at, updated_at from repo where not fork and created_at >= '2000-01-01 00:00:00+00' and updated_at is not null order by updated_at - created_at desc, length(uri) asc limit 40;
                       uri                        |          created_at           |          updated_at           
--------------------------------------------------+-------------------------------+-------------------------------
 github.com/Financial-Times/aggregate-healthcheck | 2016-02-17 10:13:15.031351+00 | 2019-07-10 02:29:20.762954+00
 github.com/unknwon/the-way-to-go_ZH_CN           | 2016-03-09 15:46:45.371299+00 | 2019-07-22 10:58:11.08837+00
 github.com/gorilla/handlers                      | 2016-03-11 22:44:34.672361+00 | 2019-07-23 09:42:08.684536+00
 github.com/sourcegraph/srclib                    | 2016-02-17 03:32:51.463969+00 | 2019-06-29 01:00:13.743766+00
 github.com/intercom/intercom-go                  | 2016-03-09 16:26:33.57322+00  | 2019-07-20 13:13:55.074747+00
 github.com/segmentio/go-prompt                   | 2016-03-15 19:30:33.813837+00 | 2019-07-23 18:57:09.05524+00
 github.com/caddyserver/caddy                     | 2016-02-25 18:46:38.438014+00 | 2019-07-02 19:01:15.123117+00
 github.com/Redth/PushSharp                       | 2016-03-18 11:18:10.861609+00 | 2019-07-23 02:08:25.703996+00
 github.com/RehabMan/OS-X-Voodoo-PS2-Controller   | 2016-03-12 23:59:12.071534+00 | 2019-07-17 09:54:57.882249+00
 github.com/gorilla/pat                           | 2016-03-09 15:41:51.70236+00  | 2019-07-13 18:12:26.135696+00
 github.com/sourcegraph/annotate                  | 2016-02-17 08:12:15.992987+00 | 2019-06-21 03:54:10.30054+00
 github.com/square/leakcanary                     | 2016-03-21 18:45:31.841576+00 | 2019-07-23 13:25:30.272823+00
 github.com/google/gxui                           | 2016-03-09 15:47:00.773795+00 | 2019-07-11 09:15:44.556037+00
 github.com/sourcegraph/srclib-docker             | 2016-02-17 08:12:24.361623+00 | 2019-06-19 19:22:30.702075+00
 github.com/joeshaw/envdecode                     | 2016-02-25 20:41:06.675202+00 | 2019-06-27 13:51:16.429081+00
 github.com/Caliburn-Micro/Caliburn.Micro         | 2016-03-18 15:05:34.098663+00 | 2019-07-18 14:15:52.604164+00
 github.com/vlucas/phpdotenv                      | 2016-03-22 03:08:54.826629+00 | 2019-07-21 09:17:43.381621+00
 github.com/h5bp/html5-boilerplate                | 2016-03-21 18:50:32.472869+00 | 2019-07-20 13:18:12.525006+00
 github.com/quartznet/quartznet                   | 2016-03-18 15:10:12.226925+00 | 2019-07-17 07:22:25.731509+00
 github.com/angular-ui/bootstrap                  | 2016-03-21 18:50:49.597421+00 | 2019-07-20 07:57:53.391567+00
 github.com/moq/moq4                              | 2016-03-18 12:16:39.087041+00 | 2019-07-16 12:16:03.736697+00
 github.com/clojure/clojure                       | 2016-03-21 18:45:39.691456+00 | 2019-07-19 16:29:45.94775+00
 github.com/WP-API/WP-API                         | 2016-03-22 03:08:50.934113+00 | 2019-07-19 22:49:01.483873+00
 github.com/zenorocha/clipboard.js                | 2016-03-21 18:50:36.971436+00 | 2019-07-19 08:06:26.994637+00
 github.com/bolt/bolt                             | 2016-03-22 03:09:22.124677+00 | 2019-07-19 03:06:12.916513+00
 github.com/autofac/Autofac                       | 2016-03-18 09:59:22.343739+00 | 2019-07-15 06:45:35.145985+00
 github.com/ReactiveCocoa/ReactiveCocoa           | 2016-03-22 14:14:07.976028+00 | 2019-07-19 02:41:39.834086+00
 github.com/FriendsOfPHP/Goutte                   | 2016-03-22 03:09:13.922934+00 | 2019-07-18 07:11:47.89508+00
 github.com/gorilla/context                       | 2016-03-11 19:35:49.503091+00 | 2019-07-06 23:39:05.51818+00
 github.com/SVProgressHUD/SVProgressHUD           | 2016-03-22 14:14:10.718432+00 | 2019-07-17 07:43:44.937553+00
 github.com/gorilla/mux                           | 2016-03-04 06:30:33.74973+00  | 2019-06-28 19:27:16.849543+00
 github.com/mailgun/godebug                       | 2016-03-09 17:24:52.213674+00 | 2019-07-03 04:24:40.655913+00
 github.com/siddontang/ledisdb                    | 2016-03-16 03:31:22.919918+00 | 2019-07-09 05:48:53.687457+00
 github.com/EventStore/EventStore                 | 2016-03-18 13:01:08.838797+00 | 2019-07-11 12:01:33.068471+00
 github.com/yiisoft/yii2                          | 2016-03-22 03:08:58.366864+00 | 2019-07-15 01:46:31.945273+00
 github.com/photonstorm/phaser                    | 2016-03-21 18:50:32.937426+00 | 2019-07-14 16:36:00.438966+00
 github.com/sourcegraph/vcsstore                  | 2016-02-17 08:12:28.65143+00  | 2019-06-11 04:45:51.513217+00
 github.com/walkor/Workerman                      | 2016-03-22 03:08:55.391829+00 | 2019-07-14 17:35:39.246456+00
 github.com/fiorix/freegeoip                      | 2016-03-09 17:24:21.759521+00 | 2019-07-01 21:00:42.782485+00
 github.com/getlantern/lantern                    | 2016-03-09 15:41:41.04518+00  | 2019-06-30 10:44:08.714888+00
(40 rows)

Time: 748.220 ms
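
The ordering used by the query above can be sketched outside the database. This is a minimal illustration, not the production ranking: the `Repo` type, the `ts` helper, the `rank_by_lifespan` function, and every sample row are hypothetical, and only the filter and sort rules are taken from the query.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class Repo(NamedTuple):
    uri: str
    fork: bool
    created_at: datetime
    updated_at: Optional[datetime]


def ts(day: str) -> datetime:
    """Parse a YYYY-MM-DD date as midnight UTC."""
    return datetime.fromisoformat(day).replace(tzinfo=timezone.utc)


CUTOFF = ts("2000-01-01")

# Hypothetical rows; the URIs and dates are invented for illustration.
REPOS = [
    Repo("example.com/org/recent", False, ts("2019-06-01"), ts("2019-07-20")),
    Repo("example.com/org/long-lived-longer-name", False, ts("2016-02-17"), ts("2019-07-10")),
    Repo("example.com/org/long-lived", False, ts("2016-02-17"), ts("2019-07-10")),
    Repo("example.com/org/forked", True, ts("2015-01-01"), ts("2019-07-20")),
    Repo("example.com/org/never-updated", False, ts("2016-01-01"), None),
]


def rank_by_lifespan(repos, limit):
    """Mirror the query: filter, then sort by lifespan desc, URI length asc."""
    eligible = [
        r for r in repos
        if not r.fork and r.created_at >= CUTOFF and r.updated_at is not None
    ]
    eligible.sort(
        key=lambda r: (-(r.updated_at - r.created_at).total_seconds(), len(r.uri))
    )
    return eligible[:limit]


if __name__ == "__main__":
    for repo in rank_by_lifespan(REPOS, limit=40):
        print(repo.uri, repo.updated_at - repo.created_at)
```

Forks and rows without an updated_at are dropped before sorting, and a row with a recent created_at ranks low no matter how old the project really is, as the `recent` row shows.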

Some popular repositories show up this way. For example, vim and linux both make the top 20,000:

sg=# create temp table toprepos as select uri, created_at, updated_at from repo where not fork and created_at >= '2000-01-01 00:00:00+00' and updated_at is not null order by updated_at - created_at desc limit 20000;
SELECT 20000
Time: 981.875 ms
sg=# select uri from toprepos where uri like '%/vim/vim';
        uri
--------------------
 github.com/vim/vim
(1 row)

Time: 20.985 ms
sg=# select uri from toprepos where uri like '%/torvalds/linux';
            uri
---------------------------
 github.com/torvalds/linux
(1 row)

Time: 29.581 ms

Funnily enough, sourcegraph/sourcegraph doesn't show up, but that's only because its created_at field is artificially recent, which makes its updated_at - created_at span short.

sg=# select uri from toprepos where uri like '%/sourcegraph/sourcegraph';
 uri
-----
(0 rows)

I don't think we need to worry about that, though. This query is probably good enough to get started and return some results.

ijt commented 5 years ago

There is still some work to do on this. Mainly, we need to get zoekt running on sourcegraph.com.

ijt commented 5 years ago

I ran github.com/ijt/reposize on the first 1000 repos and found that they take up about 15G of space. Based on the heuristic in scale.md, that means we should allocate about 45G for the zoekt-webserver pod. The zoekt-indexserver pod should get about 3G since that's about as big as the repositories get (for example github.com/BabylonJS/Babylon.js).
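
The sizing arithmetic above can be written out as a small sketch. The 3x multiplier is an assumption inferred from the 15G and 45G figures in this comment, not quoted from scale.md, and the function names are hypothetical.

```python
# Assumption: a fixed multiplier on total repository size. The 3x value is
# inferred from the 15G -> 45G figures above, not quoted from scale.md.
WEBSERVER_FACTOR = 3


def webserver_disk_gb(total_repo_gb: float, factor: float = WEBSERVER_FACTOR) -> float:
    """Disk for zoekt-webserver: total repo size times the multiplier."""
    return total_repo_gb * factor


def indexserver_disk_gb(largest_repo_gb: float) -> float:
    """Disk for zoekt-indexserver: room for the largest single repository."""
    return largest_repo_gb


if __name__ == "__main__":
    print(webserver_disk_gb(15))   # 1000 repos measured at about 15G
    print(indexserver_disk_gb(3))  # largest repo is about 3G
```

If more repositories are indexed later, rerunning the measurement and feeding the new total into `webserver_disk_gb` gives the updated allocation under the same assumed multiplier.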

ijt commented 5 years ago

It's working now:

[Screenshot: Screen Shot 2019-08-07 at 11 34 47 AM]