krisk / Fuse

Lightweight fuzzy-search, in JavaScript
https://fusejs.io/
Apache License 2.0

Version 2 only seems to match strings with words of the same length as pattern #77

Closed garth closed 8 years ago

krisk commented 8 years ago

Could you provide an example of this?

garth commented 8 years ago

I'm using fuse in this lib: https://github.com/cerebral/cerebral-module-fuse

I just noticed when I upgraded it to Fuse 2.0 that very few search results were being returned. Matching seems to be very sensitive to the length of the pattern being the same as the length of the word or string being searched.

For the time being I have just reverted back to 1.3.

I don't have specific example code for you, but I guess you'll see the issue when you upgrade your home page (http://kiro.me/projects/fuse.html) to use 2.0.

aaugustin commented 8 years ago

I think I'm seeing the same issue. Here's how to reproduce:

var dogPedigrees = [ "Affenpinscher", "Airedale Terrier", "Akita américain", "Akita Inu", "American Staffordshire Terrier", "Ancien chien d'arrêt danois", "Anglo-Français de petite vénerie", "Ariégeois", "Azawakh", "Barbet", "Barbu tchèque", "Barzoï", "Basenji", "Basset artésien normand", "Basset de Westphalie", "Basset des Alpes", "Basset fauve de Bretagne", "Basset Hound", "Beagle", "Beagle-Harrier", "Bearded Collie", "Beauceron", "Bedlington Terrier", "Berger allemand", "Berger australien", "Berger belge", "Berger blanc suisse", "Berger catalan", "Berger d'Anatolie", "Berger d'Asie centrale", "Berger d'Islande", "Berger de Bergame", "Berger de Brie", "Berger de l'Atlas", "Berger de Maremme et des Abruzzes", "Berger de Picardie", "Berger de Russie", "Berger des Pyrénées", "Berger des Shetland", "Berger du Caucase", "Berger du massif du Karst", "Berger finnois de Laponie", "Berger hollandais", "Berger polonais de plaine", "Berger polonais de Podhale", "Berger portugais", "Berger yougoslave", "Bichon à poil frisé", "Bichon bolonais", "Bichon havanais", "Bichon maltais", "Billy", "Black and tan Coonhound", "Bleu de Gascogne", "Bobtail", "Border Collie", "Border Terrier", "Bouledogue français", "Bouvier bernois", "Bouvier d'Appenzell", "Bouvier d'Australie", "Bouvier de l'Entlebuch", "Bouvier des Ardennes", "Bouvier des Flandres", "Boxer", "Brachet allemand", "Brachet autrichien noir et feu", "Brachet de Styrie à poil dur", "Brachet polonais", "Brachet tyrolien", "Braque allemand à poil court", "Braque allemand à poil dur", "Braque allemand à poil raide", "Braque d'Auvergne", "Braque de Burgos", "Braque de l'Ariège", "Braque de Weimar", "Braque du Bourbonnais", "Braque français", "Braque hongrois à poil court", "Braque hongrois à poil dur", "Braque italien", "Braque Saint-Germain", "Braque slovaque à poil dur", "Briquet Griffon vendéen", "Broholmer", "Buhund norvégien", "Bull Terrier", "Bulldog", "Bullmastiff", "Cairn Terrier", "Cane Corso", "Caniche",
"Cão de Castro Laboreiro", "Cão fila de São Miguel", "Carlin", "Cavalier King Charles Spaniel", "Chesapeake Bay Retriever", "Chien chinois à crête", "Chien courant d'Istrie à poil dur", "Chien courant d'Istrie à poil ras", "Chien courant de Bosnie à poil dur", "Chien courant de Halden", "Chien courant de Hygen", "Chien courant de Posavatz", "Chien courant de Transylvanie", "Chien courant espagnol", "Chien courant finnois", "Chien courant hellénique", "Chien courant italien", "Chien courant serbe", "Chien courant slovaque", "Chien courant suisse", "Chien courant yougoslave de montagne", "Chien courant yougoslave tricolore", "Chien d'arrêt allemand à poil long", "Chien d'arrêt portugais", "Chien d'Artois", "Chien d'eau américain", "Chien d'eau espagnol", "Chien d'eau frison", "Chien d'eau irlandais", "Chien d'eau portugais", "Chien d'eau romagnol", "Chien d'élan norvégien", "Chien d'élan suédois", "Chien d'ours de Carélie", "Chien d'Oysel", "Chien de berger de Croatie", "Chien de berger de Majorque", "Chien de Canaan", "Chien de montagne des Pyrénées", "Chien de montagne portugais", "Chien de Saint-Hubert", "Chien de Taïwan", "Chien du Groenland", "Chien du pharaon", "Chien finnois de Laponie", "Chien norvégien de Macareux", "Chien nu du Pérou", "Chien nu mexicain", "Chien rouge de Bavière", "Chien rouge de Hanovre", "Chien suédois de Laponie", "Chien thaïlandais", "Chien-loup de Saarloos", "Chien-loup tchèque", "Chihuahua", "Chow-Chow", "Cirneco de l'Etna", "Clumber Spaniel", "Cocker américain", "Cocker anglais", "Colley à poil court", "Colley à poil long", "Coton de Tuléar", "Curly-Coated Retriever", "Dalmatien", "Dandie Dinmont Terrier", "Dobermann", "Dogo canario", "Dogue allemand", "Dogue argentin", "Dogue de Bordeaux", "Dogue de Majorque", "Dogue du Tibet", "Drever", "Dunker", "Épagneul à perdrix de Drente", "Épagneul bleu de Picardie", "Épagneul breton", "Épagneul de Pont-Audemer", "Épagneul français", "Épagneul japonais", "Épagneul nain continental", "Épagneul picard",
"Épagneul tibétain", "Eurasier", "Field Spaniel", "Fila brasileiro", "Flat-Coated Retriever", "Fox-Terrier", "Foxhound américain", "Foxhound anglais", "Français blanc et noir", "Français blanc et orange", "Français tricolore", "Golden Retriever", "Grand Anglo-Français blanc et noir", "Grand Anglo-Français blanc et orange", "Grand Anglo-Français tricolore", "Grand Basset Griffon vendéen", "Grand bouvier suisse", "Grand Gascon saintongeois", "Grand Griffon vendéen", "Grand Münsterländer", "Greyhound", "Griffon à poil dur Korthals", "Griffon belge", "Griffon bruxellois", "Griffon fauve de Bretagne", "Griffon nivernais", "Hamilton stövare", "Harrier", "Hokkaido Ken", "Hovawart", "Husky sibérien", "Irish Glen of Imaal Terrier", "Irish Terrier", "Irish Terrier à poil doux", "Jack Russell Terrier", "Jagdterrier", "Kai", "Kelpie", "Kerry Blue Terrier", "King Charles Spaniel", "Kishu", "Komondor", "Korea Jindo dog", "Kromfohrländer", "Kuvasz", "Labrador Retriever", "Laika de Sibérie occidentale", "Laïka de Sibérie orientale", "Laïka russe européen", "Lakeland Terrier", "Landseer", "Leonberger", "Lévrier afghan", "Lévrier écossais", "Lévrier espagnol", "Lévrier hongrois", "Lévrier irlandais", "Lévrier polonais", "Lhassa Apso", "Malamute de l'Alaska", "Manchester Terrier", "Mastiff", "Mâtin des Pyrénées", "Mâtin espagnol", "Mâtin napolitain", "Mudi", "Norfolk Terrier et Norwich Terrier", "Otterhound", "Parson Russell Terrier", "Pékinois", "Petit Basset Griffon vendéen", "Petit brabançon", "Petit chien courant suisse", "Petit chien hollandais de chasse au gibier d'eau", "Petit chien lion", "Petit Gascon saintongeois", "Petit Lévrier italien", "Petit Münsterländer", "Pinscher", "Pinscher autrichien à poil court", "Podenco canario", "Podenco ibicenco", "Podenco portugais", "Pointer anglais", "Poitevin", "Porcelaine", "Pudelpointer", "Puli", "Pumi", "Rafeiro do Alentejo", "Retriever de la Nouvelle-Écosse", "Rhodesian Ridgeback", "Rottweiler", "Russkiy Toy", "Saint-Bernard", 
"Saluki", "Samoyède", "Schapendoes néerlandais", "Schiller stövare", "Schipperke", "Schnauzer", "Scottish Terrier", "Sealyham Terrier", "Setter anglais", "Setter Gordon", "Setter irlandais", "Setter irlandais rouge et blanc", "Shar Pei", "Shiba Inu", "Shih Tzu", "Shikoku", "Silky Terrier", "Skye Terrier", "Sloughi", "Slovensky cuvac", "Småland stövare", "Smous des Pays-Bas", "Spinone", "Spitz allemand", "Spitz de Norbotten", "Spitz des Wisigoths", "Spitz finlandais", "Spitz japonais", "Springer anglais", "Stabyhoun", "Staffordshire Bull Terrier", "Sussex Spaniel", "Teckel", "Terre-Neuve", "Terrier australien", "Terrier brésilien", "Terrier de Boston", "Terrier japonais", "Terrier noir russe", "Terrier tchèque", "Terrier tibétain", "Tosa", "Toy Terrier anglais noir et feu", "Volpino", "Welsh Corgi", "Welsh Springer Spaniel", "Welsh Terrier", "West Highland White Terrier", "Whippet", "Yorkshire Terrier"];

dogPedigrees = dogPedigrees.map(function (value) {
    return {'value': value};
});

var Fuse = require('fuse.js');

var options = {
    keys: ['value'],
    maxPatternLength: 16,
    threshold: 0.5
};

var fuse = new Fuse(dogPedigrees, options);

fuse.search("Rottweile");

// [ { value: 'Chien de berger de Croatie' } ]

fuse.search("Rottweiler");

// [ { value: 'Rottweiler' },
//   { value: 'Otterhound' },
//   { value: 'Golden Retriever' },
//   { value: 'Retriever de la Nouvelle-Écosse' },
//   { value: 'Curly-Coated Retriever' },
//   { value: 'Flat-Coated Retriever' },
//   { value: 'Labrador Retriever' },
//   { value: 'Chesapeake Bay Retriever' } ]

I would expect the first search to return at least as many results as the second one.

The more specific the request, the fewer results should be returned.
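The expectation stated above can be turned into a small check. This is only a sketch: `findPrefixRegressions` is a hypothetical helper, not part of Fuse, and it assumes that `search` is any function returning an array of results for a pattern. A fuzzy matcher is not strictly required to satisfy this property for every prefix, so a reported entry is a hint worth inspecting rather than proof of a bug.

```javascript
// Hypothetical helper (not part of Fuse): walks the prefixes of `pattern`
// and reports each case where a shorter prefix returns FEWER results than
// the prefix that is one character longer.
function findPrefixRegressions(search, pattern, minLength) {
  var start = minLength || 1;
  var regressions = [];
  for (var len = start; len < pattern.length; len++) {
    var shorter = pattern.slice(0, len);
    var longer = pattern.slice(0, len + 1);
    var shorterCount = search(shorter).length;
    var longerCount = search(longer).length;
    if (shorterCount < longerCount) {
      regressions.push({
        shorter: shorter,
        shorterCount: shorterCount,
        longer: longer,
        longerCount: longerCount
      });
    }
  }
  return regressions;
}

// Usage with the reproduction above (assumes the `fuse` instance from it):
// findPrefixRegressions(function (p) { return fuse.search(p); }, "Rottweiler", 3);
```

Passing the search as a function keeps the helper independent of the Fuse version, so the same check can be run against 1.3 and 2.0 and the outputs compared.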

sepiariver commented 8 years ago

Let me preface by saying this work is amazing, @krisk

I think I'm seeing this reported bug as well. Compare two identical implementations with different data sets:

  1. http://map.sepiariver.modxcloud.com/doctor-who-filming-locations/ Searching "B" yields null results (the sidebar updates live), whereas "BC" returns 4 results.
  2. http://map.sepiariver.modxcloud.com/modx-core-team-locations/ This one seems to behave consistently with the expectation that a longer search pattern returns a narrower result set, except when you get to the 4th character. Compare the searches "rya" and "ryan". I'm not sure whether that has to do with the threshold setting of 0.4, which is the only non-default config option passed to Fuse in this implementation.

Admittedly my datasets are limited. I didn't even think to report this as an issue, except for the other reports.

kjarrith commented 8 years ago

+1 on this one.

It works perfectly for strings nested in arrays, but it only finds exact matches when matching against a string nested in an object.

For example:

var array = [
    {
        name: {
            givenName: 'Kjartan'
        },
        phoneNumbers: [
            {
                type: 'mobile',
                value: '+354 6967602'
            }
        ]
    },
    {
        name: {
            givenName: 'Hörður'
        },
        phoneNumbers: [
            {
                type: 'mobile',
                value: '+354 5812345'
            }
        ]
    }
];

Fuse will match the phoneNumbers.value as expected but will only match name.givenName when it finds an exact match.

Hope this helps.

Options, if anyone is interested:

var options = {
    caseSensitive: false,
    includeScore: false,
    shouldSort: true,
    threshold: 0.6,
    location: 0,
    distance: 100,
    maxPatternLength: 32,
    keys: ["phoneNumbers.value", "name.givenName"]
};
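
To narrow down whether the difference comes from how the two key paths are resolved or from how the resolved strings are scored, it may help to look at what each path yields for one record. The sketch below is an assumption-laden illustration: `getValuesAtPath` is a hypothetical helper and is not how Fuse resolves keys internally. It only shows that "name.givenName" passes through a plain object while "phoneNumbers.value" passes through an array.

```javascript
// Hypothetical helper (NOT Fuse's implementation): collects every string
// found at a dotted key path, descending into arrays along the way.
function getValuesAtPath(item, path) {
  var current = [item];
  path.split('.').forEach(function (segment) {
    var next = [];
    current.forEach(function (node) {
      if (node === null || typeof node !== 'object') {
        return; // cannot descend into a primitive
      }
      var child = node[segment];
      if (Array.isArray(child)) {
        next = next.concat(child); // flatten one array level
      } else if (child !== undefined) {
        next.push(child);
      }
    });
    current = next;
  });
  return current.filter(function (value) {
    return typeof value === 'string';
  });
}

// One record shaped like the entries in the example above.
var contact = {
  name: { givenName: 'Kjartan' },
  phoneNumbers: [{ type: 'mobile', value: '+354 6967602' }]
};

getValuesAtPath(contact, 'name.givenName');     // ['Kjartan']
getValuesAtPath(contact, 'phoneNumbers.value'); // ['+354 6967602']
```

If both paths resolve to plain strings like this, the two keys differ only in the route taken to reach them, which would point at the object and array code paths being handled differently before scoring.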