sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Poor ranking on dotcom for a react-router query #45800

Open sqs opened 1 year ago

sqs commented 1 year ago

On Sourcegraph.com, searching for repo:react-router isactive shows irrelevant results.

stefanhengl commented 1 year ago

Small update to document my findings so far:

Looking at the scores, one can see why remix-run/react-router didn't rank higher.

  1. The query doesn't match a symbol that ctags recognizes, so the match score (912) is very low (the top score is 8610).
  2. The PageRank of the document is not good either.
    ... ranks: [1 1 1 1 0.5026413004305995 1 1], (0,0), (4,1), (51,3),
                        ^^^^^^^^^^^^^^^^^^
  3. The star count is now only a small input to the score, whereas before PageRank was introduced it dominated everything.

However, looking at the match in remix-run/react-router, it should have been recognized as a symbol. Once a match lies within a symbol range, its score gets a large boost.
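To make the symbol boost concrete, here is a minimal sketch of the idea, not the actual scoring code. The `Range` type, the `score_match` helper, and both weights are hypothetical and chosen only for illustration. The only behaviour taken from the observation above is that a match lying inside a symbol range scores far higher than one that does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start: int  # inclusive offset
    end: int    # exclusive offset

    def contains(self, other: "Range") -> bool:
        # True when `other` lies entirely within this range.
        return self.start <= other.start and other.end <= self.end


# Hypothetical weights; the real values are not stated in this thread.
BASE_MATCH_SCORE = 100.0
SYMBOL_BOOST = 7000.0


def score_match(match: Range, symbols: list[Range]) -> float:
    """Score a match, adding a large boost if it lies inside any symbol range."""
    score = BASE_MATCH_SCORE
    if any(sym.contains(match) for sym in symbols):
        score += SYMBOL_BOOST
    return score


# One symbol is known at offsets 10..18.
symbols = [Range(10, 18)]
inside = score_match(Range(10, 18), symbols)   # match is within a symbol range
outside = score_match(Range(40, 48), symbols)  # match is plain text
```

With these made-up weights, a file whose match is not tagged as a symbol cannot catch up through the other inputs, which is consistent with the gap between 912 and 8610 above.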

I am going to look at ctags to find out why we don't recognize isActive as a symbol in App.tsx.

stefanhengl commented 1 year ago

Update: I ran universal-ctags with our custom tsx language definition from go-ctags on App.tsx. isActive is not recognized as a symbol, although we have a regex that targets local variables.


```
→  universal-ctags --langdef=tsx --langmap=tsx:.tsx \
     --regex-tsx="/^[ \t]*(export[ \t]+([a-z]+[ \t]+)?)?class[ \t]+([a-zA-Z0-9_$]+)/\3/c,class/" \
     --regex-tsx="/^[ \t]*(declare[ \t]+)?namespace[ \t]+([a-zA-Z0-9_$]+)/\2/n,module/" \
     --regex-tsx="/^[ \t]*(export[ \t]+)?module[ \t]+([a-zA-Z0-9_$]+)/\2/n,module/" \
     --regex-tsx="/^[ \t]*(export[ \t]+)?(default[ \t]+)?(async[ \t]+)?function[ \t]+([a-zA-Z0-9_$]+)/\4/f,function/" \
     --regex-tsx="/^[ \t]*export[ \t]+(var|let|const)[ \t]+([a-zA-Z0-9_$]+)/\2/v,variable/" \
     --regex-tsx="/^[ \t]*(var|let|const)[ \t]+([a-zA-Z0-9_$]+)[ \t]*=[ \t]*function[ \t]*[*]?[ \t]*\(\)/\2/v,variable/" \
     --regex-tsx="/^[ \t]*(export[ \t]+)?(public|protected|private)[ \t]+(static[ \t]+)?(abstract[ \t]+)?(((get|set)[ \t]+)|(async[ \t]+[*]*[ \t]*))?([a-zA-Z1-9_$]+)/\9/m,member/" \
     --regex-tsx="/^[ \t]*(export[ \t]+)?interface[ \t]+([a-zA-Z0-9_$]+)/\2/i,interface/" \
     --regex-tsx="/^[ \t]*(export[ \t]+)?type[ \t]+([a-zA-Z0-9_$]+)/\2/t,type/" \
     --regex-tsx="/^[ \t]*(export[ \t]+)?enum[ \t]+([a-zA-Z0-9_$]+)/\2/e,enum/" \
     -f - App.tsx

App     App.tsx /^export default function App() {$/;"   f
BrandLink       App.tsx /^function BrandLink({ brand, children, ...props }: BrandLinkProps) {$/;"       f
BrandLinkProps  App.tsx /^interface BrandLinkProps extends Omit<LinkProps, "to"> {$/;"  i
Layout  App.tsx /^function Layout() {$/;"       f
NoMatch App.tsx /^function NoMatch() {$/;"      f
SneakerGrid     App.tsx /^function SneakerGrid() {$/;"  f
SneakerView     App.tsx /^function SneakerView() {$/;"  f
```

stefanhengl commented 1 year ago

The relevant regexp doesn't match because isActive isn't exported:

--regex-tsx=/^[ \t]*export[ \t]+(var|let|const)[ \t]+([a-zA-Z0-9_$]+)/\2/v,variable/

The question is: does it make sense to recognize only exported variables? I will compare the tsx regexp rules to those of other languages to get a better picture.
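The mismatch can be reproduced outside ctags. The sketch below ports the exported-variable rule quoted above to Python's `re` module and compares it with a relaxed variant in which `export` is optional. The relaxed pattern, the `symbol_name` helper, and both sample lines are assumptions made up for illustration. They are not taken from go-ctags or from App.tsx, and Python's regex engine is not identical to the one ctags uses.

```python
import re

# The exported-variable rule from the tsx definition, ported as-is.
EXPORTED_VAR = re.compile(r"^[ \t]*export[ \t]+(var|let|const)[ \t]+([a-zA-Z0-9_$]+)")

# Hypothetical relaxed variant: `export` becomes optional, so the name moves to group 3.
ANY_VAR = re.compile(r"^[ \t]*(export[ \t]+)?(var|let|const)[ \t]+([a-zA-Z0-9_$]+)")


def symbol_name(pattern, group, line):
    """Return the captured identifier, or None if the rule does not match the line."""
    m = pattern.match(line)
    return m.group(group) if m else None


# Made-up sample lines, not copied from App.tsx.
exported_line = "export const isActive = true;"
local_line = "  let isActive = false;"

print(symbol_name(EXPORTED_VAR, 2, exported_line))  # isActive
print(symbol_name(EXPORTED_VAR, 2, local_line))     # None
print(symbol_name(ANY_VAR, 3, local_line))          # isActive
```

Because both patterns allow leading whitespace, the relaxed variant would also tag variables declared inside function bodies. That could add a lot of low-value symbols, which is part of what the question above has to weigh.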