nextapps-de / flexsearch

Next-Generation full text search library for Browser and Node.js
Apache License 2.0

Importing exported indexes doesn't populate store (#249) #384

Open zanzlender opened 1 year ago

zanzlender commented 1 year ago

I've been having some trouble implementing exporting and then importing those same indexes...

I found issue #249. Although it says this should have been fixed, I still have the same problem.

I followed this linked Stack Overflow question and exported my indexes like so:

// Write to FlexSearch index file
const flexIndex = new Document({
  tokenize: "forward",
  document: {
    id: "id",
    index: ["id", "url", "transcript", "timestamp"],
    store: true,
  },
  context: {
    resolution: 5,
    depth: 3,
  },
  cache: true,
});

/*
type Transcript = {
  id: string;
  url: string;
  transcript: Array<{
    timestamp: string;
    transcript: string;
  }>;
};
*/ 
transcriptsJson.forEach((_video) => {
  _video.transcript.forEach((_transcript, _index) => {
    flexIndex.add({
      id: `${_video.id}-${_index}`,
      url: _video.url,
      transcript: _transcript.transcript,
      timestamp: _transcript.timestamp,
    });
  });
});

const searchIndexPath2 = path.join(cwd(), "/src/content/flex-search/");

const res = await flexIndex.export(function (key, data) {
  fs.writeFileSync(
    `${searchIndexPath2}${key}.json`,
    data !== undefined ? (data as string) : ""
  );
});

And later I try to import them like so:

const keys = fs
  .readdirSync(searchIndexPath, { withFileTypes: true })
  .filter((item) => !item.isDirectory() && item.name.includes(".json"))
  .map((item) => item.name.slice(0, -5));

for (let i = 0, key; i < keys.length; i += 1) {
  key = keys[i];
  const data = fs.readFileSync(
    `${searchIndexPath}${key ?? ""}.json`,
    "utf8"
  );

  await flexIndex.import(key as string, data ?? null);
}
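
The export step writes each key to `${key}.json`, and the import step recovers the key by dropping the last five characters of the file name. Below is a minimal sketch of that round trip. The helper names (`fileNameForKey`, `keyFromFileName`, `isExportFile`) are hypothetical and not part of FlexSearch; they only mirror the naming used in the snippets above.

```typescript
const EXT = ".json";

// Hypothetical helper: mirrors the `${key}.json` naming of the export callback.
function fileNameForKey(key: string): string {
  return `${key}${EXT}`;
}

// Strips only a trailing ".json", so a dotted key such as
// "timestamp.store" keeps its inner dot.
function keyFromFileName(fileName: string): string {
  return fileName.endsWith(EXT) ? fileName.slice(0, -EXT.length) : fileName;
}

// Stricter than name.includes(".json"): accepts only a real ".json" suffix.
function isExportFile(fileName: string): boolean {
  return fileName.endsWith(EXT);
}
```

With these helpers, a key survives the trip through the file system unchanged, which suggests the file naming itself is probably not what loses the store.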

And finally I search like so:

const res = flexIndex.search("03", {
  index: ["transcript", "timestamp"],
  enrich: true,
});

const xy = res.find((x) => x.field === "timestamp")?.result;
console.log(xy);

Everything works fine up to this point and I get the results I wanted, but the doc object is undefined...


However, when I skip the export and import steps and only build the index as in the first code example, everything works as expected.

Does this mean #254 is not fixed yet or am I doing something wrong? Do I need to handle the data object while importing in a special way instead of just importing the whole data?

zanzlender commented 1 year ago

I've also noticed that, for some reason, one of the saved files is named timestamp.store.json. I don't know how the name is decided, but it seems unintuitive: most of my data is in the transcript property, yet it is not saved in a transcript.json or transcript.store.json.


But I don't know if this plays any role in my problem.

grimsteel commented 1 year ago

I'm also experiencing this.

JSFiddle Example: https://jsfiddle.net/tnx5qLzd/

bcspragu commented 1 year ago

Chiming in with the same issue. My setup looks something like:

// Exporting
import flexsearch from 'flexsearch'

const docIndex = new flexsearch.Document({
  document: {
    id: 'id',
    index: ['title', 'description', 'source', 'tags', 'body'],
    store: ['title', 'description', 'tags'],
  },
});

documents.forEach((doc) => {
  docIndex.add(doc.slug, {
    title: doc.title,
    description: doc.description,
    source: doc.source,
    tags: doc.tags,
    body: doc.body,
  })
})

docIndex.export((key, data) => {
  // Line-delimited JSON objects, plays nicely with the async-ish nature of export
  process.stdout.write(JSON.stringify({key, data}) + '\n')
})

// Importing
import { Document } from 'flexsearch'

const docIndex = new Document({
  document: {
    id: 'id',
    index: ['title', 'description', 'source', 'tags', 'body'],
    store: ['title', 'description', 'tags'],
  },
}) as Document<Post, string[]>;

await readByLines('/flexsearch.json', (line: string) => {
  const imp = JSON.parse(line)
  docIndex.import(imp.key, imp.data);
})

const searchResults = docIndex.search({
  query: 'the query',
  enrich: true,
});

// searchResults[].result[].doc is undefined
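
The export side above writes one `{key, data}` JSON object per line. As a rough sketch of reading that format back, the function below splits the text into records. `parseExportLines` and `ExportRecord` are hypothetical names introduced here for illustration; the `readByLines` helper used above is not shown in the thread, so this sketch does not depend on it.

```typescript
// Shape of one line written by the export callback above: {key, data}.
interface ExportRecord {
  key: string;
  data: unknown;
}

// Hypothetical helper: splits line-delimited JSON text into records,
// skipping blank lines such as the one after the trailing newline.
function parseExportLines(text: string): ExportRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ExportRecord);
}
```

Each record could then be passed to `docIndex.import(record.key, record.data)` as in the import snippet above.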

maxhoffmann commented 8 months ago

I’m running into the same bug. This remains a problem with version 0.7.34.

kgwosh commented 4 months ago

I'm running into the same bug, with code like this:

const keys = fs
  .readdirSync(searchIndexPath, { withFileTypes: true })
  .filter(item => !item.isDirectory())
  .map(item => item.name)

for (let i = 0, key; i < keys.length; i++) {
  key = keys[i];
  // console.log(key.slice(0, -5));
  const data = fs.readFileSync(`${searchIndexPath}${key}`, 'utf8')
  console.log(key.slice(0, -5), data);
  index.import(key.slice(0, -5), data);
}

But I found that it is fixed when I run it like this:

const keys = fs
  .readdirSync(searchIndexPath, { withFileTypes: true })
  .filter(item => !item.isDirectory())
  .map(item => item.name)

for (let i = 0, key; i < keys.length; i++) {
  key = keys[i];
  // console.log(key.slice(0, -5));
  const data = fs.readFileSync(`${searchIndexPath}${key}`, 'utf8')
  const parsedData = JSON.parse(data);
  console.log(key.slice(0, -5), parsedData);
  index.import(key.slice(0, -5), parsedData);
}

Adding JSON.parse(data) fixed it for me. Give it a try.
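
The parse-before-import idea can be sketched as a small helper. This is a sketch under assumptions, not a confirmed fix: `ImportableIndex` and `importParsed` are hypothetical names, and the only thing assumed about the index is that it has an `import(key, data)` method, as used throughout this thread. Empty contents are skipped because the first export snippet in this issue writes `""` when `data` is undefined, and `JSON.parse("")` throws.

```typescript
// Minimal structural type for anything with an import(key, data) method.
// Assumption: a FlexSearch Document satisfies it, as used in this thread.
interface ImportableIndex {
  import(key: string, data: unknown): unknown;
}

// Hypothetical helper: parses each raw JSON string before importing it
// and returns the keys that were actually imported.
function importParsed(
  index: ImportableIndex,
  entries: Iterable<[string, string]>
): string[] {
  const imported: string[] = [];
  for (const [key, raw] of entries) {
    // Skip empty files: JSON.parse("") would throw a SyntaxError.
    if (raw.trim() === "") continue;
    // import is called without awaiting, as in the loop above.
    index.import(key, JSON.parse(raw));
    imported.push(key);
  }
  return imported;
}
```

The entries would come from the same `readdirSync` and `readFileSync` calls shown above, as pairs of the key (the file name without `.json`) and the file contents.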