lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License

Error with loadJSON method #107

Closed: cmcknight closed this issue 2 years ago

cmcknight commented 2 years ago

OS: macOS 11.6, Node: 15.11.0, MiniSearch: 3.1.0

====================================

Building index with the following code:

const miniSearch = require('minisearch')
const fs = require('fs');
const path = require("path");

// Recursively collect the paths of all files under dirPath
const getProductFiles = function(dirPath, arrayOfFiles) {
  const files = fs.readdirSync(dirPath);

  arrayOfFiles = arrayOfFiles || [];

  files.forEach(function(file) {
    const fn = path.join(dirPath, file);
    if (fs.statSync(fn).isDirectory()) {
      arrayOfFiles = getProductFiles(fn, arrayOfFiles);
    } else {
      arrayOfFiles.push(fn);
    }
  });

  return arrayOfFiles;
}

let arrayOfFiles;
const inputFiles = 
  getProductFiles(path.join('src', '_data'), arrayOfFiles)
      .filter(file => path.extname(file) === '.json');

let idCounter = 0

let ms = new miniSearch({
  fields: [ 'sku', 'category', 'type', 'subtype', 'name', 'description', 'cost',
            'mass', 'size', 'techLevel', 'qrebs', 'tags' ],
  storeFields: ['sku', 'name', 'description', 'cost']
});

inputFiles.forEach(file => {  

  // get the products from the file
  let products = JSON.parse(fs.readFileSync(file, 'utf8'));

  // build search index object and add to search index
  products.forEach(product => {
    product.id = idCounter++;
    ms.add(product);
  })
})

fs.writeFileSync('src/_data/searchindex.idx', JSON.stringify(ms))

let jsonIdx = fs.readFileSync('src/_data/searchindex.idx', 'utf8');

let ms2 = miniSearch.loadJSON(jsonIdx, {
  fields: [ 'sku', 'category', 'type', 'subtype', 'name', 'description', 'cost',
            'mass', 'size', 'techLevel', 'qrebs', 'tags' ],
  storeFields: ['sku', 'name', 'description', 'cost']
});

// console.log(`ms is ${(Array.isArray(ms)) ? "" : "not"})`)
let searchTerm = 'portal'
let options = (searchTerm.includes(' and ')) ? { combineWith: 'AND'} : {}
let res = ms2.search(searchTerm, options);
res.forEach(result => console.log(result));

The code above appears to work correctly and returns search results (using the attached searchindex.idx file).

In the code below, I may be doing something wrong with the fetch, but I'm not sure what it is.

  fetch(searchIndexLocation)
    .then((res) => res.json())
    .then((data) => {
      console.log(data);
      const jsonDocs = data;

// line 272 is the next line
      let miniSearch = new MiniSearch.loadJSON(jsonDocs, {
        fields: [ 'sku', 'category', 'type', 'subtype', 'name', 'description', 'cost',
                  'mass', 'size', 'techLevel', 'qrebs', 'tags' ],
        storeFields: ['sku', 'name', 'description', 'cost']
      });

    })
    .catch((err) => console.log(err));

because I am consistently getting the following error:

SyntaxError: Unexpected token o in JSON at position 1
    at JSON.parse (<anonymous>)
    at new t.loadJSON (index.js:1126)
    at scripts.js:272

I'm still a bit new to the Fetch API, but it looks like something is happening to the index file before it reaches the loadJSON call.

Any clues appreciated. :-/ (attachment: searchindex.idx.zip)

cmcknight commented 2 years ago

@lucaong

I ran the index (in searchindex.idx) through a JSON validator and it indicated that the object was valid, so I traced it with a debugger, and indeed, the call to JSON.parse in the library is failing. Is there a limit on how big a pre-compiled index can be?

I rewrote the index-building code to add all of the documents through addAll(), thinking that might be the issue. No joy there; the loadJSON method still failed. Oddly, when I just saved the array of documents and then loaded them with addAll() on the website, the index was built properly.

I feel like there is something subtle that I'm just missing... :-/

cmcknight commented 2 years ago

Just a comment: it feels like there is some subtle difference between the CDN version and the npm version. The loadJSON() method works fine from within a Node app but fails when used on a web page.

lucaong commented 2 years ago

Hi @cmcknight , thank you for your report!

I do not know what is causing your issue. If the CDN version causes the error but Node doesn’t, is it possible for you to test with a self-hosted version, to verify if the CDN somehow breaks it? You can create a minified build by cloning the repo and running yarn build-minified (the minified script will be created in the dist folder).

As a side note, I find that on most use-cases it makes more sense to rebuild the index every time rather than going through the effort of serializing a pre-built index and recreating it every time something changes. Without knowing your use case though I don’t know if that would be a good solution for you.

In many cases the cost of transferring and parsing a big JSON index is comparable to reindexing the full collection (especially in cases where the collection of documents has to be transferred anyway to the client).
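A minimal sketch of the rebuild-on-the-client approach, assuming the browser fetches the plain array of documents and that MiniSearch is loaded as the UMD global MiniSearch. buildSearchIndex, SEARCH_OPTIONS, and the /products.json URL are hypothetical names; the field lists are copied from the build script earlier in the thread. In this variant res.json() is the right call, because addAll takes an array of document objects rather than a JSON string.

```javascript
// Hypothetical options, copied from the reporter's build script
const SEARCH_OPTIONS = {
  fields: ['sku', 'category', 'type', 'subtype', 'name', 'description', 'cost',
           'mass', 'size', 'techLevel', 'qrebs', 'tags'],
  storeFields: ['sku', 'name', 'description', 'cost']
};

// Hypothetical helper: MiniSearchClass is the MiniSearch constructor
// (e.g. the UMD global), documents is the parsed JSON array of products.
function buildSearchIndex(MiniSearchClass, documents, options = SEARCH_OPTIONS) {
  const miniSearch = new MiniSearchClass(options);
  // MiniSearch needs a unique id per document (the `id` field by default), so
  // assign one from the array position, as the build script does with idCounter.
  // map() copies each document, leaving the input array untouched.
  miniSearch.addAll(documents.map((doc, i) => ({ ...doc, id: i })));
  return miniSearch;
}

// Browser usage, with a hypothetical URL for the documents file:
// fetch('/products.json')
//   .then((res) => res.json())   // a parsed array is exactly what addAll needs
//   .then((docs) => buildSearchIndex(MiniSearch, docs))
//   .then((miniSearch) => console.log(miniSearch.search('torch')))
//   .catch((err) => console.log(err));
```

The trade-off is the one described above: the client pays the indexing cost on every load, but nothing has to be kept in sync between a serialized index and the documents.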

cmcknight commented 2 years ago

Hi @lucaong,

I cloned the repo and tried to build it, but the build hangs when I run yarn build-minified. I tried yarn build to see if there was any difference, but it still hangs:

~/git-projects $ git clone https://github.com/lucaong/minisearch.git
Cloning into 'minisearch'...
remote: Enumerating objects: 3156, done.
remote: Counting objects: 100% (345/345), done.
remote: Compressing objects: 100% (222/222), done.
remote: Total 3156 (delta 195), reused 228 (delta 110), pack-reused 2811
Receiving objects: 100% (3156/3156), 4.16 MiB | 5.86 MiB/s, done.
Resolving deltas: 100% (2108/2108), done.
~/git-projects $ cd minisearch
~/git-projects/minisearch (master) $ yarn install
yarn install v1.22.15
warning ../package.json: No license field
[1/4] 🔍  Resolving packages...
[2/4] 🚚  Fetching packages...
[3/4] 🔗  Linking dependencies...
warning " > typedoc@0.19.2" has incorrect peer dependency "typescript@3.9.x || 4.0.x".
[4/4] 🔨  Building fresh packages...
✨  Done in 37.04s.
~/git-projects/minisearch (master) $ yarn test
yarn run v1.22.15
warning ../package.json: No license field
$ jest
 PASS  src/MiniSearch.test.js
 PASS  src/SearchableMap/SearchableMap.test.js

Test Suites: 2 passed, 2 total
Tests:       104 passed, 104 total
Snapshots:   0 total
Time:        2.146 s, estimated 5 s
Ran all test suites.
✨  Done in 3.18s.
~/git-projects/minisearch (master) $ yarn build
yarn run v1.22.15
warning ../package.json: No license field
$ yarn clean-build && NODE_ENV=production rollup -c
warning ../package.json: No license field
$ rm -rf dist

src/index.ts → dist/es...
created dist/es in 1.6s

src/index.ts → dist/es5m...
created dist/es5m in 949ms

src/index.ts → dist/umd...
created dist/umd in 874ms

src/index.ts → dist/types...
created dist/types in 722ms

src/SearchableMap/SearchableMap.ts → dist/es...
created dist/es in 732ms

src/SearchableMap/SearchableMap.ts → dist/es5m...
created dist/es5m in 656ms

src/SearchableMap/SearchableMap.ts → dist/umd...
created dist/umd in 734ms

Is there something I've missed?

My use case is a static site demonstrating one approach to a catalog. The content is updated weekly, and a service worker caches the index (in the site's application cache) to improve performance by avoiding retrieving it from the site every time. The index and the array of JSON objects are not significantly different in size, but I'd hoped to eliminate the parsing step by pre-generating the index.

lucaong commented 2 years ago

@cmcknight there was a problem in a development dependency causing the build to hang. It should be fixed on the latest master now.

cmcknight commented 2 years ago

@lucaong Cool, I just did a fresh clone from master and it built both the regular and minified builds. I'll test this and report if I'm still seeing the same issue.

cmcknight commented 2 years ago

@lucaong

Still getting the same error even when self-hosting: JSON.parse() somehow thinks the file is invalid. I'm assuming the umd folder is the one I need to link to locally, correct?

cmcknight commented 2 years ago
params: s=torch
scripts.js:253 searchParams: torch
scripts.js:261 (performSiteSearch) Params: torch
scripts.js:270 (1244) [{…}, {…}, {…}, …]
[0 … 99]
0: {sku: '010-000-00001', type: 'breathing apparatus', subtype: '', name: 'Air Tank (TL 5)', cost: 500, …}
1: {sku: '010-000-00002', type: 'breathing apparatus', subtype: '', name: 'Air Tank (TL 9)', cost: 500, …}
2: {sku: '010-000-00003', type: 'storage', subtype: '', name: 'Air Tank (TL B)', cost: 500, …}
3: {sku: '010-000-00004', type: 'breathing apparatus', subtype: '', name: 'Air Tank IN / 374', cost: 1600, …}
4: {sku: '010-000-00005', type: 'breathing apparatus', subtype: '', name: 'Atmospheric Sponge', cost: 500000, …}
5: {sku: '010-000-00006', type: 'breathing apparatus', subtype: '', name: 'Breather (TL 7)', cost: 200, …}
6: {sku: '010-000-00007', type: 'breathing apparatus', subtype: '', name: 'Breather (TL 8)', cost: 400, …}
7: {sku: '010-000-00008', type: 'breathing apparatus', subtype: '', name: 'Breather (TL A)', cost: 600, …}
8: {sku: '010-000-00009', type: 'breathing apparatus', subtype: '', name: 'Combination Mask (TL 5)', cost: 150, …}
9: {sku: '010-000-00010', type: 'breathing apparatus', subtype: '', name: 'Combination Mask (TL 8)', cost: 300, …}
10: {sku: '010-000-00011', type: 'breathing apparatus', subtype: '', name: 'Combination Mask (TL A)', cost: 500, …}
11: {sku: '010-000-00012', type: 'breathing apparatus', subtype: '', name: 'Filter Mask (TL 3)', cost: 10, …}
12: {sku: '010-000-00013', type: 'breathing apparatus', subtype: '', name: 'Filter Mask (TL 8)', cost: 40, …}
13: {sku: '010-000-00014', type: 'breathing apparatus', subtype: '', name: 'Filter (TL A)', cost: 80, …}
14: {sku: '010-000-00015', type: 'breathing apparatus', subtype: '', name: 'Gill', cost: 4000, …}
15: {sku: '010-000-00016', type: 'breathing apparatus', subtype: '', name: 'Rebreather', cost: 200, …}
16: {sku: '010-000-00017', type: 'breathing apparatus', subtype: '', name: 'Respirator (TL 5)', cost: 100, …}
17: {sku: '010-000-00018', type: 'breathing apparatus', subtype: '', name: 'Respirator (TL 8)', cost: 100, …}
18: {sku: '010-000-00019', type: 'breathing apparatus', subtype: '', name: 'Respirator (TL A)', cost: 100, …}
19: {sku: '010-000-00020', type: 'breathing apparatus', subtype: '', name: 'Surface Water Tank', cost: 1000, …}
20: {sku: '020-000-00001', type: 'comms', subtype: '', name: 'Communicator', mfr: '', …}
21: {sku: '020-000-00002', type: 'comms', subtype: '', name: 'Communicator (Modified)', mfr: '', …}
22: {sku: '020-000-00003', type: 'comms', subtype: '', name: 'Communicator (Advanced)', mfr: '', …}
23: {sku: '020-000-00004', type: 'comms', subtype: '', name: 'Communications Installation', mfr: '', …}
24: {sku: '020-000-00005', type: 'comms', subtype: '', name: 'Communicator (Long Range)', mfr: '', …}
25: {sku: '020-000-00006', type: 'comms', subtype: '', name: 'Communicator (Luxury)', mfr: '', …}
26: {sku: '020-000-00007', type: 'comms', subtype: '', name: 'Communicator (Ruggedized)', mfr: '', …}
27: {sku: '020-000-00008', type: 'comms', subtype: '', name: 'Communicator (Vehicle)', mfr: '', …}
28: {sku: '020-000-00009', type: 'comms', subtype: '', name: 'Radio', mfr: '', …}
29: {sku: '020-000-00010', type: 'comms', subtype: '', name: 'Radio (Experimental)', mfr: '', …}
30: {sku: '030-000-00001', type: 'computers', subtype: '', name: 'Databank', mfr: '', …}
31: {sku: '030-000-00002', type: 'computers', subtype: '', name: 'Portable Computer', mfr: '', …}
32: {sku: '030-000-00003', type: 'computers', subtype: '', name: 'Research Console', mfr: '', …}
33: {sku: '030-001-00001', type: 'computers', subtype: 'computer accessory', name: 'Data Display (TL A)', mfr: '', …}
34: {sku: '030-001-00002', type: 'computers', subtype: 'computer accessory', name: 'Data Display (TL D)', mfr: '', …}
35: {sku: '030-001-00003', type: 'computers', subtype: 'computer accessory', name: 'Data Recorder/Relay', mfr: '', …}
36: {sku: '030-001-00004', type: 'computers', subtype: 'computer accessory', name: 'Datalink', mfr: '', …}
37: {sku: '030-001-00005', type: 'computers', subtype: 'computer accessory', name: 'Imperial ID', mfr: '', …}
38: {sku: '030-001-00006', type: 'computers', subtype: 'computer accessory', name: 'Jump Tape', mfr: '', …}
39: {sku: '030-001-00007', type: 'computers', subtype: 'computer accessory', name: 'Map Box', mfr: '', …}
40: {sku: '030-001-00008', type: 'computers', subtype: 'computer accessory', name: 'Map Box Insert', mfr: '', …}
41: {sku: '030-001-00009', type: 'computers', subtype: 'computer accessory', name: 'Map Box Insert', mfr: '', …}
42: {sku: '030-001-00010', type: 'computers', subtype: 'computer accessory', name: 'Memclip (1 Language)', mfr: '', …}
43: {sku: '030-001-00011', type: 'computers', subtype: 'computer accessory', name: 'Starchart', mfr: '', …}
44: {sku: '030-001-00012', type: 'computers', subtype: 'computer accessory', name: 'Survey Data', mfr: '', …}
45: {sku: '030-001-00013', type: 'computers', subtype: 'computer accessory', name: 'Wafer Jack', mfr: '', …}
46: {sku: '030-001-00014', type: 'computers', subtype: 'computer accessory', name: 'Xmail Wafer', mfr: '', …}
47: {sku: '040-000-00001', type: 'construction', subtype: '', name: 'Cutting Torch', mfr: '', …}
48: {sku: '040-000-00002', type: 'construction', subtype: 'artifact', name: 'Eternity Circuit Module', mfr: '', …}
49: {sku: '040-000-00003', type: 'construction', subtype: '', name: 'Hatch', mfr: '', …}
50: {sku: '040-000-00004', type: 'construction', subtype: '', name: 'Hoist', mfr: '', …}
51: {sku: '040-000-00005', type: 'construction', subtype: '', name: 'Iris Valve', mfr: '', …}
52: {sku: '040-000-00006', type: 'construction', subtype: '', name: 'Lock (Collapsible Air Lock)', mfr: '', …}
53: {sku: '040-000-00007', type: 'construction', subtype: 'artifact', name: 'Matter Transporter', mfr: '', …}
54: {sku: '040-000-00008', type: 'construction', subtype: 'artifact', name: 'Molecular Disassmbler', mfr: '', …}
55: {sku: '040-000-00009', type: 'construction', subtype: 'artifact', name: 'Planetary Core Tap', mfr: '', …}
56: {sku: '040-000-00010', type: 'construction', subtype: 'artifact', name: 'Portal, Cargo', mfr: '', …}
57: {sku: '040-000-00011', type: 'construction', subtype: 'artifact', name: 'Portal, Personal', mfr: '', …}
58: {sku: '040-000-00012', type: 'construction', subtype: 'artifact', name: 'Portal, Ship', mfr: '', …}
59: {sku: '040-000-00013', type: 'construction', subtype: 'artifact', name: 'Portal Generator', mfr: '', …}
60: {sku: '040-000-00014', type: 'construction', subtype: 'artifact', name: 'Star Energy Tap', mfr: '', …}
61: {sku: '040-000-00015', type: 'construction', subtype: '', name: 'Tape (Slap Tape)', mfr: '', …}
62: {sku: '040-000-00016', type: 'construction', subtype: 'artifact', name: 'Teleport Platforms (Early)', mfr: '', …}
63: {sku: '040-000-00017', type: 'construction', subtype: 'artifact', name: 'Teleport Platforms', mfr: '', …}
64: {sku: '040-000-00018', type: 'construction', subtype: 'artifact', name: 'Teleport Platforms (Advanced)', mfr: '', …}
65: {sku: '040-000-00019', type: 'construction', subtype: '', name: 'Welding Torch (Gas-Powered)', mfr: '', …}
66: {sku: '040-000-00020', type: 'construction', subtype: '', name: 'Welding Torch (Laser)', mfr: '', …}
67: {sku: '040-000-00021', type: 'construction', subtype: '', name: 'Welding Torch (Plasma)', mfr: '', …}
68: {sku: '040-001-00001', type: 'construction', subtype: 'supplies', name: 'Gas Torch Tank Refill', mfr: '', …}
69: {sku: '040-001-00002', type: 'construction', subtype: 'construction materials', name: 'Leather', mfr: '', …}
70: {sku: '040-001-00003', type: 'construction', subtype: 'construction materials', name: 'Bones', mfr: '', …}
71: {sku: '040-001-00004', type: 'construction', subtype: 'construction materials', name: 'Wooden Beams', mfr: '', …}
72: {sku: '040-001-00005', type: 'construction', subtype: 'construction materials', name: 'Wooden Planks', mfr: '', …}
73: {sku: '040-001-00006', type: 'construction', subtype: 'construction materials', name: 'Wooden Sheets', mfr: '', …}
74: {sku: '040-001-00007', type: 'construction', subtype: 'construction materials', name: 'Fiberglass', mfr: '', …}
75: {sku: '040-001-00008', type: 'construction', subtype: 'construction materials', name: 'Blocks (Stone)', mfr: '', …}
76: {sku: '040-001-00009', type: 'construction', subtype: 'construction materials', name: 'Blocks (Ceramic)', mfr: '', …}
77: {sku: '040-001-00010', type: 'construction', subtype: 'construction materials', name: 'Aluminum (Sheets)', mfr: '', …}
78: {sku: '040-001-00011', type: 'construction', subtype: 'construction materials', name: 'Copper (Sheets)', mfr: '', …}
79: {sku: '040-001-00012', type: 'construction', subtype: 'construction materials', name: 'Bronze (Sheets)', mfr: '', …}
80: {sku: '040-001-00013', type: 'construction', subtype: 'construction materials', name: 'Iron (Sheets)', mfr: '', …}
81: {sku: '040-001-00014', type: 'construction', subtype: 'construction materials', name: 'Structural Polymer (self-healing)', mfr: '', …}
82: {sku: '040-001-00015', type: 'construction', subtype: 'construction materials', name: 'Structural metals', mfr: '', …}
83: {sku: '040-001-00016', type: 'construction', subtype: 'construction materials', name: 'Steel (Sheets)', mfr: '', …}
84: {sku: '040-001-00017', type: 'construction', subtype: 'construction materials', name: 'Titanium (Sheets)', mfr: '', …}
85: {sku: '190-000-00001', type: 'containers', subtype: '', name: 'Attaché Case', mfr: '', …}
86: {sku: '190-000-00002', type: 'containers', subtype: '', name: 'Backpack', mfr: '', …}
87: {sku: '190-000-00003', type: 'containers', subtype: '', name: 'Environmental Tank', mfr: '', …}
88: {sku: '190-000-00004', type: 'containers', subtype: '', name: 'Gravitic Tank', mfr: '', …}
89: {sku: '190-000-00005', type: 'containers', subtype: '', name: 'Safe', mfr: '', …}
90: {sku: '190-000-00006', type: 'containers', subtype: '', name: 'Storngbox', mfr: '', …}
91: {sku: '190-000-00007', type: 'containers', subtype: '', name: 'Toolbag', mfr: '', …}
92: {sku: '190-000-00008', type: 'containers', subtype: '', name: 'Toobox', mfr: '', …}
93: {sku: '190-000-00009', type: 'containers', subtype: '', name: 'Toolchest', mfr: '', …}
94: {sku: '190-000-00010', type: 'containers', subtype: '', name: 'Vault', mfr: '', …}
95: {sku: '190-000-00011', type: 'containers', subtype: '', name: 'Bottle', mfr: '', …}
96: {sku: '190-000-00012', type: 'containers', subtype: '', name: 'Canteen', mfr: '', …}
97: {sku: '190-000-00013', type: 'containers', subtype: '', name: 'Flask', mfr: '', …}
98: {sku: '190-000-00014', type: 'containers', subtype: '', name: 'Storage Tank', mfr: '', …}
99: {sku: '190-000-00015', type: 'containers', subtype: '', name: 'Water Purifier', mfr: '', …}
[100 … 199] [200 … 299] [300 … 399] [400 … 499] [500 … 599] [600 … 699] [700 … 799] [800 … 899] [900 … 999] [1000 … 1099] [1100 … 1199] [1200 … 1243]
length: 1244
scripts.js:346 SyntaxError: Unexpected token o in JSON at position 1
    at JSON.parse (<anonymous>)
    at new n.loadJSON (MiniSearch.ts:833)
    at scripts.js:290

I added the statement:

let t = JSON.parse(data)

in my code, and the error happens there, so there's something fishy about how MiniSearch is getting serialized with JSON.stringify.
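The error message is consistent with data already being a parsed value, as the console output above shows (an array of 1244 objects). JSON.parse converts a non-string argument to a string before parsing, and an array of plain objects stringifies to "[object Object],[object Object],…": the "[" is a valid start of a JSON array, but the "o" at position 1 is not. A self-contained reproduction, using two entries abbreviated from that log as sample data:

```javascript
// What res.json() resolves to: an already-parsed array (abbreviated sample data)
const data = [
  { sku: '040-000-00001', type: 'construction', name: 'Cutting Torch' },
  { sku: '040-000-00019', type: 'construction', name: 'Welding Torch (Gas-Powered)' }
];

// JSON.parse converts a non-string argument with String() before parsing it;
// an array of plain objects becomes "[object Object],[object Object]"
const coerced = String(data);
console.log(coerced.slice(0, 15)); // [object Object]

let parseError = null;
try {
  JSON.parse(data);
} catch (err) {
  parseError = err;
}
// A SyntaxError: older V8 (e.g. Node 15) reports
// "Unexpected token o in JSON at position 1"; newer engines word it differently
console.log(parseError instanceof SyntaxError); // true

// Parsing the raw JSON text instead succeeds
const jsonText = JSON.stringify(data);
console.log(JSON.parse(jsonText).length); // 2
```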

lucaong commented 2 years ago

thanks @cmcknight , I'll investigate further!

cmcknight commented 2 years ago

@lucaong

I'm not exactly positive yet, but does the MiniSearch object being stringified have functions? If it does, that might be the issue.
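Functions on the object would not by themselves produce invalid JSON: JSON.stringify omits function-valued properties, and when an object defines a toJSON() method, it serializes that method's return value instead. MiniSearch defines a toJSON() method for this purpose, which is why JSON.stringify(ms) in the build script produces a loadable index. The sketch below uses a hypothetical TinyIndex class, not MiniSearch's real internals, to show both behaviors:

```javascript
// TinyIndex is a hypothetical stand-in, not MiniSearch's actual implementation.
class TinyIndex {
  constructor() {
    this.documentCount = 0;
    this.terms = new Map(); // a Map would serialize as {} without toJSON()
  }

  add(id, text) {
    for (const term of text.toLowerCase().split(/\s+/)) {
      if (!this.terms.has(term)) this.terms.set(term, []);
      this.terms.get(term).push(id);
    }
    this.documentCount++;
  }

  // JSON.stringify calls toJSON() when it exists and serializes the return value
  toJSON() {
    return { documentCount: this.documentCount, terms: Object.fromEntries(this.terms) };
  }
}

const idx = new TinyIndex();
idx.add(1, 'Cutting Torch');
idx.add(2, 'Welding Torch');

// A plain object with a function-valued property: the function is omitted
const withFunction = { name: 'torch', search() { return []; } };

const serializedIndex = JSON.stringify(idx);
const serializedPlain = JSON.stringify(withFunction);
console.log(serializedIndex); // {"documentCount":2,"terms":{"cutting":[1],"torch":[1,2],"welding":[2]}}
console.log(serializedPlain); // {"name":"torch"}

// Both strings are valid JSON and parse back without error
console.log(JSON.parse(serializedIndex).terms.torch); // [ 1, 2 ]
```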

JustAnotherArchivist commented 2 years ago

FWIW, JSON.parse in Firefox on Debian Sid parses this index without any issues it seems.

JustAnotherArchivist commented 2 years ago

I think I found the issue: loadJSON expects a JSON string, but you're already converting the fetch response to a JS object with res.json(). So I think you just need to use res.text() there instead. (You could also use MiniSearch.loadJS with the JS object, which is what loadJSON calls internally, but that's undocumented API and therefore not a good idea.)

I didn't try it with the Fetch API, but loading works fine for me with this quick and dirty XHR example:

<html>
<head>
<script src="https://cdn.jsdelivr.net/npm/minisearch@3.1.0/dist/umd/index.min.js"></script>
<script>
const req = new XMLHttpRequest();
req.open('GET', '/searchindex.idx');
req.onload = function() {
    // responseText is the raw JSON string, which is what loadJSON expects
    const m = MiniSearch.loadJSON(req.responseText, {
        fields: [ 'sku', 'category', 'type', 'subtype', 'name', 'description', 'cost', 'mass', 'size', 'techLevel', 'qrebs', 'tags' ],
        storeFields: ['sku', 'name', 'description', 'cost']
    });
    console.log(m);
};
req.send();
</script>
</head>
<body></body>
</html>
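The same fix with the Fetch API, as a minimal sketch: read the response with res.text() and pass the resulting string to loadJSON. loadSearchIndex is a hypothetical helper name; the MiniSearch constructor and the fetch function are passed in as parameters (fetch defaults to the global one) only so the helper can be exercised without a browser or network.

```javascript
// Hypothetical helper (not part of MiniSearch): fetch a serialized index as text
// and pass the raw JSON string to loadJSON.
//   MiniSearchClass: the MiniSearch constructor, e.g. the UMD global `MiniSearch`
//   options: the same fields/storeFields used when the index was built
//   fetchFn: defaults to the global fetch; injectable for offline testing
async function loadSearchIndex(url, MiniSearchClass, options, fetchFn = globalThis.fetch) {
  const res = await fetchFn(url);
  if (!res.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${res.status}`);
  }
  // res.text() resolves to the JSON string that loadJSON expects;
  // res.json() would resolve to an already-parsed object instead
  const jsonText = await res.text();
  return MiniSearchClass.loadJSON(jsonText, options);
}

// Browser usage, with searchIndexLocation as in the original snippet:
// loadSearchIndex(searchIndexLocation, MiniSearch, {
//   fields: ['sku', 'category', 'type', 'subtype', 'name', 'description', 'cost',
//            'mass', 'size', 'techLevel', 'qrebs', 'tags'],
//   storeFields: ['sku', 'name', 'description', 'cost']
// })
//   .then((miniSearch) => console.log(miniSearch.search('portal')))
//   .catch((err) => console.log(err));
```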
cmcknight commented 2 years ago

Ah, that makes sense.


lucaong commented 2 years ago

You are absolutely right @JustAnotherArchivist .

I am closing the issue as it should be solved, but feel free to comment further if necessary.