Closed rmetzler closed 11 years ago
Yes. I realised that already. The search currently works with ElasticSearch. There is room for improvement. But it's currently not the most urgent task. CocoaPods, Bower and BitBucket are more urgent.
The problem I see is when we are telling everyone that we support CocoaPods and they can't actually find a CocoaPods package. That's why I said it.
I see the problem. I did some research and I understand now why we get the search results we currently have. But I still don't know how to change it. The problem is that ElasticSearch is too smart. The ElasticSearch analyzers are looking for words. They even have dictionaries for different languages to recognise words and word families. That is very useful for full-text search. If you search a full text for "blueberry" it will match "blueberry" and "blueberries" because they are semantically the same. But it will not match "blue" because that has a different meaning. For our package name search this is kind of over-engineered. I guess we need a simpler matching.
thanks for explaining.
I'll put that in the back of my head and maybe I'll come up with a solution.
Is there another use case where we use ElasticSearch?
We use elasticsearch for the package search mainly. And in the project in the tab "collaborators" the autocomplete for usernames works via ES. I am currently digging deeper into ES. Will write some tests and play around with ngrams in ES. Maybe I can fix the search today.
:+1:
Well, for Richard's issue there's a quick fix: just add an asterisk to the search query, as autocompletion does: ab -> ab* ([as example](http://www.versioneye.com/search?lang=%2C&q=500*&g=)).
But the correct solution is to use multiple queries and add weights to them: one exact match, another partial match, and another with the greedy * selector. Tuning tokenizers is just a partial solution: it fixes one problem and adds new ones.
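The weighted multi-query idea could be sketched roughly like this as an Elasticsearch bool query, built here as a plain Python dict. The field name `name`, the function name and the boost values are assumptions for illustration only, not what VersionEye actually uses; the right weights would need tuning against real data.

```python
def build_name_query(keyword: str, field: str = "name") -> dict:
    """Combine exact, partial and greedy matching in one weighted query.

    The field name and the boost values are hypothetical.
    """
    # term and wildcard queries are not analyzed, so lowercase by hand
    # (assumes the indexed names are lowercased).
    lowered = keyword.lower()
    return {
        "query": {
            "bool": {
                "should": [
                    # 1. exact match, highest weight
                    {"term": {field: {"value": lowered, "boost": 10.0}}},
                    # 2. partial (analyzed) match, medium weight
                    {"match": {field: {"query": keyword, "boost": 3.0}}},
                    # 3. greedy * selector, lowest weight
                    {"wildcard": {field: {"value": lowered + "*", "boost": 1.0}}},
                ]
            }
        }
    }
```

Because all three clauses sit in one `should` list, an exact hit on "Hibernate" collects the score of every clause, while "spring-hibernate" can only be reached through the weaker ones.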
If there is nothing found then we should add the greedy operator to all keywords and search again.
Also, I noticed that the feedback dialog only shows up if nothing is found. There might be cases where you have 1 or 2 libraries that match the keywords but they are not what you are looking for.
This "*" doesn't help. If I add that as default to all queries then the search results become even worse. Because then somehow "Hibernate" is no longer the perfect match for "Hibernate". Then the first result is "spring-hibernate".
Somehow ElasticSearch should be easier to understand and to configure.
That's why combined queries are built into ElasticSearch: you don't need a fallback function to check the size of the results.
I will check out combined queries. But with ngrams I don't need to check the size of the results either. It's just a configuration of how a single word should be separated into smaller character groups.
> This "*" doesn't help. If I add that as default to all queries then the search results become even worse. Because then somehow "Hibernate" is no longer the perfect match for "Hibernate". Then the first result is "spring-hibernate".

That's why I suggested using it as a fallback instead of the default.
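A minimal sketch of that fallback idea, assuming a `search` callable that takes a query string and returns a list of hits. Both function names are made up for illustration and the real search backend is not shown.

```python
from typing import Callable, List


def add_greedy_operator(query: str) -> str:
    """Append * to every keyword that does not already end with it."""
    return " ".join(w if w.endswith("*") else w + "*" for w in query.split())


def search_with_fallback(query: str, search: Callable[[str], List[str]]) -> List[str]:
    """Run the plain query first; retry with greedy keywords only if nothing is found."""
    results = search(query)
    if results:
        return results
    return search(add_greedy_operator(query))
```

This keeps an exact query such as "Hibernate" untouched, so its ranking is unchanged, and the greedy variant only runs when the first attempt comes back empty.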
It works now as expected!
I configured 2 new analyzers. The trick is that we are using a different analyzer for indexing than for searching. At index time the "ngram_name" analyzer is doing the job. It creates many ngrams for each product name. For "Hibernate" it creates for example:
hib
hibe
hiber
hibern
and so on. If we search now for "Hiberna" it will have a perfect match and we get the prod_key from "Hibernate" back.
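To illustrate the index-time versus search-time split outside of ElasticSearch, here is a small pure-Python sketch. It assumes prefix-style ngrams with a minimum length of 3, which is what the examples above suggest; the actual "ngram_name" configuration is not shown in this thread, and the `prod_key` value used below is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def prefix_ngrams(name: str, min_len: int = 3) -> List[str]:
    """Index-time analysis: lowercase the name and emit every prefix of at least min_len chars."""
    lowered = name.lower()
    return [lowered[:i] for i in range(min_len, len(lowered) + 1)]


def build_index(products: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every ngram to the prod_keys whose name produced it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for prod_key, name in products.items():
        for gram in prefix_ngrams(name):
            index[gram].add(prod_key)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Search-time analysis: only lowercase the query, do not split it into ngrams."""
    return index.get(query.lower(), set())
```

With `index = build_index({"hibernate": "Hibernate"})`, a call to `search(index, "Hiberna")` finds the product because "hiberna" was stored as one of its ngrams at index time.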
Now I need to recreate the whole index on production.
The last commit is boosting on follower numbers. If you search now for "spring" you will get "spring-core" as the first result because it has many followers. The 2nd result will be "spring" because it's a perfect match, but it doesn't have any followers.
Is it possible to order by used_count or similar?
The idea behind a search engine is that you don't use order by. The score is based on similarity.
We can think about how to use used_count to improve the search results. But not this week. That will not bring us more users. We have to focus on growth hacking!
I just wanted to say, if you search for spring and there is a project that 5 artifacts reference (currently number 1) and a project that more than 5,000 artifacts reference (currently number 4 in the search results), then the one with 5,000 should be first.
http://www.versioneye.com/search?q=spring
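One possible way to let used_count influence the ranking without replacing the similarity score with a plain order by is to multiply the score by a dampened popularity factor. The formula and the field names `score` and `used_count` on each hit are assumptions for illustration, not the actual implementation.

```python
import math
from typing import Dict, List


def boosted_score(similarity: float, used_count: int) -> float:
    """Scale the similarity score by a log-dampened reference count (hypothetical formula)."""
    return similarity * (1.0 + math.log10(1 + used_count))


def rank(hits: List[Dict]) -> List[Dict]:
    """Sort hits by boosted score, best first."""
    return sorted(
        hits,
        key=lambda h: boosted_score(h["score"], h["used_count"]),
        reverse=True,
    )
```

The logarithm keeps a project with 5,000 references ahead of one with 5 at similar similarity, while a far better text match can still win.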
User growth = improving user experience ;-)
Yes. I know what you mean. But even if we have a better search than Google, if nobody knows about it we have to shut down in 4.5 months. The product is good enough! We have to be more active on Twitter, HN & Reddit. We need more users!
I tried to search for some CocoaPod specs I know exist.
My problem is, search seems to work like this:
So if I want to find ABTest and try the keywords ab or test or ab-test, I don't find anything. It actually has to be abtest.
The same goes for 500px-ios-api