healthonnet / hon-lucene-synonyms

Solr query parser plugin that performs proper query-time synonym expansion.
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr

Add a threshold for alternative queries #38

Closed rmerizalde closed 7 years ago

rmerizalde commented 10 years ago

We've experienced a couple of site outages because the plugin was trying to create millions of alternative queries. Functionally, the plugin was trying to do the right thing.

However, a "bad" synonym and an edge-case query resulted in ~400M alternative queries, which brought down our cluster.
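To make the growth concrete, here is a minimal sketch in Python (not the plugin's Java code). It assumes each place in the query where a synonym rule matches contributes an independent list of rewrites, so the number of alternative queries is the product of the list sizes. The function name and the numbers are hypothetical and are not a reconstruction of the outage described above.

```python
from math import prod


def count_alternatives(options_per_site):
    """Return how many alternative queries full expansion would produce.

    options_per_site is a list with one entry per match site; each entry
    is the list of rewrites available at that site (hypothetical model).
    """
    return prod(len(options) for options in options_per_site)


# Hypothetical input: 10 match sites, each with 4 possible rewrites.
sites = [["a", "b", "c", "d"] for _ in range(10)]
print(count_alternatives(sites))  # 1048576
```

The count can be computed without building a single query, which is what makes a cheap pre-check possible.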

The solution we are proposing, and deploying to our cluster, is to add a threshold on how many alternative queries a query can be expanded into. If the threshold is exceeded, we plan to halt the expansion and use only the original query. A warning will be logged, which we plan to use to send an alert to our merchandising and engineering teams.
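A rough sketch of that behavior, written in Python for brevity rather than as a patch against the plugin. The function name, the `max_alternatives` parameter, and the representation of the query as a list of per-site options are all assumptions made for illustration.

```python
import itertools
import logging
from math import prod

log = logging.getLogger("synonym_expansion")


def expand_with_threshold(original_query, options_per_site, max_alternatives):
    """Expand a query, or fall back to the original if it would explode.

    options_per_site holds, for each segment of the query, the texts that
    may appear there (hypothetical model; unmatched segments have one option).
    """
    # Count first: this is cheap and avoids building any queries.
    total = prod(len(options) for options in options_per_site)
    if total > max_alternatives:
        # This warning is the hook for alerting.
        log.warning(
            "Expansion would create %d alternatives (limit %d); "
            "using the original query only",
            total,
            max_alternatives,
        )
        return [original_query]
    return [" ".join(combo) for combo in itertools.product(*options_per_site)]


sites = [["smith"], ["io", "i o", "ios"], ["goggle"]]
print(expand_with_threshold("smith io goggle", sites, max_alternatives=10))
print(expand_with_threshold("smith io goggle", sites, max_alternatives=2))
```

The first call stays under the limit and returns all three variants. The second exceeds it, so only the original query comes back and a warning is logged.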

If anyone has a better suggestion, we'll be happy to help create a patch.

Here is some test data to reproduce a similar issue. Bear with me on this example, even though it doesn't make a lot of sense:

i o,i os,ios,io s,io=>i ox,i o,i os, iox,ios,io s,io,io x,ioxs

Query:

[smith - POLARIZED 5 $187.96 $234.95 Item # SMI0905 20% Off Sale -White/Rose Copper/Extra Blue Sensor Mirror, One Size ($187.96) Size? Quantity add to cart FREE SHIPPING on orders over $50* 100% Guaranteed Returns Price Match Guarantee buy with confidence TECH SPECS Frame Material: flexible urethane Helmet Compatible: yes Eyeglass Compatible: no Polarized Lens: yes Ventilation: Vaporator Lens Technology, Porex Filter Grip Strap: yes, silicone Face Size: small to medium Recommended Use: skiing, snowboarding Manufacturer Warranty: lifetime OTHER PEEPS PEEPED product title Smith IO Interchangeable Goggles with Bonus Lens From:$122.47 product title Smith IOX Interchangeable Goggle From:$131.21 product title Smith I/O Recon Goggle From:$519.96 product title Oakley Airbrake Goggle From:$121.00 product title Smith Lago Signature I/O Goggles From:$122.47 Smith I/OS Interchangeable Goggle - Polarized Current Color Available Colors/Styles Smith I/OS Interchangeable Goggle - Polarized White/Rose Copper/Extra Blue Sensor Mirror Detail Pics DESCRIPTION UNDENIABLY SLEEK LOOKS AND RIMLESS INTERCHANGEABILITY. Inspired by fashionable eyewear, designed to expose the truest picture of terrain in front of you, and sized for those with smaller faces, the polarized Smith I/OS Interchangeable Goggle is the luxury sports car of the goggle world. The minimal frame design creates a lightweight, comfortable feel while the easy-to-use (even with gloves on your hands) interchangeable clips allow you to adapt to conditions as quickly as they change. 
Flexible, minimally designed frame eliminates snow-clog on the lens edges and creates a sleek look that seamlessly fuses with your Smith helmet Carbonic-X lens with TLT Optics has a spherical shape that mimics the shape of your eye to eliminate distortion and open your peripheral view—premium scoutability for lines and possible hazards Quick-release interchangeable lens system works in conjunction with the minimal frame design—use the dual top clips to change the lens and keep up with varied conditions Vaporator Lens Technology—a dual-layer lens with a Porax Filter to keep the moisture out and prevent fogging DriWix dual-layer face foam features a soft backing layer at the frame and a supple fleece lining against your face to absorb sweat and wick it away Silicon backing keeps the adjustable, quick-clip strap stuck to your helmet or hat while the pivoting side clips allow the goggle to move with your face—no pressure points Includes two lenses and a microfiber goggle bag with a separate sleeve for a single lens What do you think of the Smith I/OS Interchangeable Goggle - Polarized? Share a... Write a review Ask a question Share a photo Share a video Hide detailed information YOUR COMMUNITY CONTRIBUTIONS Everything Reviews Photos Videos]

nolanlawson commented 10 years ago

I think the synonyms.bag option may help you out here, since there isn't the same combinatorial explosion of query clauses. Did you try that one?

lujameva commented 10 years ago

synonyms.bag will probably work, with the drawback that you lose context information. It would be nice if you could keep context information and set a limit on the number of alternate queries generated (on the assumption that after a certain number of alternate queries, you won't gain much). I've opened a merge request which adds a limit to the number of alternate queries created.
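One way to read "set a limit" is to keep the first N alternates instead of discarding the expansion entirely. The Python sketch below illustrates that idea only; it is not taken from the merge request, and the function name and data model are hypothetical.

```python
from itertools import islice, product


def first_alternatives(options_per_site, limit):
    """Return at most `limit` alternate queries, generated lazily.

    itertools.product yields combinations one at a time, so stopping
    early means the remaining combinations are never built.
    """
    combos = product(*options_per_site)
    return [" ".join(combo) for combo in islice(combos, limit)]


sites = [["smith"], ["io", "i o", "ios"], ["goggle", "goggles"]]
for query in first_alternatives(sites, limit=4):
    print(query)
```

Which alternates survive depends on generation order, so a real implementation would probably want the most valuable rewrites to come first.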