scambier / obsidian-omnisearch

A search engine that "just works" for Obsidian. Supports OCR and PDF indexing.
GNU General Public License v3.0
1.23k stars, 63 forks

[BUG] Freeze on first "Obsidian is indexing your vault" file #159

Closed tsbertalan closed 11 months ago

tsbertalan commented 1 year ago

Problem description:

If I enable the plugin, Obsidian tells me it must index again, but then freezes on 1/4700 or so. If I manually delete the "omnisearch" line from .obsidian/community-plugins.json, it indexes again, but quickly gets through all files.

My plugins/omnisearch/data.json looks like this:

{
  "respectExcluded": true,
  "ignoreDiacritics": true,
  "indexedFileTypes": [],
  "PDFIndexing": false,
  "imagesIndexing": false,
  "showShortName": false,
  "ribbonIcon": true,
  "showExcerpt": true,
  "renderLineReturnInExcerpts": true,
  "showCreateButton": false,
  "showPreviousQueryResults": true,
  "simpleSearch": false,
  "weightBasename": 2,
  "weightH1": 1.5,
  "weightH2": 1.3,
  "weightH3": 1.1,
  "welcomeMessage": "1.8.0-beta.3"
}

Despite that last line, I have omnisearch-1.9.0-beta.2.zip installed (the manifest.json in that zip still says "version": "1.8.1").
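
One way to check which version a downloaded zip actually declares is to read its manifest.json directly. The following is a minimal sketch using only Python's standard library; `manifest_versions` is a hypothetical helper name, and it assumes the manifest is valid JSON with a top-level "version" key, as in the value quoted above. It makes no assumption about where the manifest sits inside the archive.

```python
import json
import zipfile


def manifest_versions(zip_path):
    """Return {archive member name: version} for every manifest.json in the zip."""
    versions = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Match manifest.json at the archive root or inside any folder.
            if name.rsplit("/", 1)[-1] == "manifest.json":
                data = json.loads(archive.read(name).decode("utf-8"))
                versions[name] = data.get("version")  # None if the key is absent
    return versions
```

Calling `manifest_versions("omnisearch-1.9.0-beta.2.zip")` on the downloaded file would show whether the declared version matches the file name.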

Unfortunately, removing txt and py from indexedFileTypes didn't help as it did in #134 or #147, and I can't reproduce this on the obsidian-hub-0.2.1 vault, unlike #7.

I can open the dev console while the app is loading, and it seems to stay responsive itself (though entering anything at the prompt gets nothing back). It lists this output before the obsidian UI freezes:

loading wikilinks-to-mdlinks plugin...
plugin:obsidian-toggl-integration:25390 Loading obsidian-toggl-integration 0.9.0
plugin:dataview:20685 Dataview: version 0.5.47 (requires obsidian 0.13.11)
plugin:nldates-obsidian:9043 Loading natural language date parser plugin
plugin:obsidian-mind-map:32574 Loading Mind Map plugin
plugin:obsidian-file-link:419 loading plugin file-link
plugin:obsidian-link-archive:94 Loading Link Archive plugin...
plugin:omnisearch:45 Text Extract - Number of available workers: 5
plugin:dataview:13951 Dataview: all 4739 files have been indexed in 6.126s (4739 cached, 0 skipped).
plugin:omnisearch:52 Omnisearch - Loading index from cache: 5304.351318359375 ms
plugin:omnisearch:52 Omnisearch - Total number of files to add/update: 4739
plugin:omnisearch:52 Omnisearch - Total number of files to remove: 0

and the indexing toast shows:

Obsidian is indexing your vault... This should happen only once. Some functionality may not be available until this is complete. (1/4737)

Task manager reports about 20% CPU usage and 5300 MB RAM for Obsidian, almost all of the CPU for a sub-process just called "Obsidian". (There are three others named like that, and one labeled "2022-12-20 - Dropbox - Obsidianv1.1.8", which I guess is the main thread since it opened to 2022-12-20.md.)

I see these 5 entries in the console top> dropdown: image

I think that if I just leave it like this overnight, the app will eventually go all black. Windows remains otherwise responsive, and I can kill Obsidian by right click > Close window on the taskbar.

I'm unable to incrementally take files out of this "vault" since it's my whole 500 GB Dropbox, but I can try other troubleshooting steps you might suggest.

Your environment:

[
  "wikilinks-to-mdlinks-obsidian",
  "calendar",
  "obsidian-toggl-integration",
  "obsidian-dialogue-plugin",
  "heatmap-calendar",
  "dataview",
  "periodic-notes",
  "nldates-obsidian",
  "obsidian-mind-map",
  "google-calendar",
  "obsidian-excalidraw-plugin",
  "templater-obsidian",
  "obsidian-file-link",
  "obsidian-link-archive"
]

tsbertalan commented 1 year ago

Another thing I notice: even with Omnisearch disabled, Obsidian takes slightly longer to index that first file than all the others. I'm not sure if this is a real thing, or just that they start showing their progress indicator before they're actually ready to use it.

scambier commented 1 year ago

Could you first update Omnisearch to the latest 1.9.1? There have been a few bugfixes and performance improvements since 1.9.0-beta.2.

tsbertalan commented 1 year ago

After running some find | wc commands in wsl, I get

scambier commented 1 year ago

Have you updated Omnisearch to its latest version?

tsbertalan commented 1 year ago

I thought I was doing so with the zip download from github (despite the incorrect manifest content).

I updated in-app just now to 1.9.1, and it froze almost immediately when I toggled it back on in settings.

Then, when I restarted, it wasn’t enabled, and again froze when I turned it on.

I tried uninstalling with the X in the community plugins list, and reinstalling 1.9.1 from store, and it froze before the installation notification had fully slid into view.

However, the gif in the description is still animating:

tsbertalan commented 1 year ago

Two images that failed to attach:

On restart after that fresh install, it again was not enabled. I'm leaving it that way for now.

scambier commented 1 year ago

My best guess right now is that you have a note that's killing the indexing process of Omnisearch (or rather, the underlying minisearch instance).

5300 MB of RAM usage is huge. Like, freakingly huge. It's certainly abnormal if you're not indexing your PDFs and images in Omnisearch. Do you have some notes that are "particular" in one way or another? It could be notes with a size of multiple MBs, or with tons of small numbers or letters, anything that could be heavy to index for a search engine.

Could you also try to disable all your other community plugins, restart Obsidian, and then try to activate Omnisearch? I doubt it will change anything but it's worth a try.
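
Notes that are heavy in the sense described above (multiple MBs, or tons of small numbers or letters) can be looked for outside Obsidian. The following is a rough sketch, not part of Omnisearch: `heavy_files` is a hypothetical helper, and counting distinct `\w+` tokens is only an assumed proxy for indexing cost, not how minisearch actually tokenizes. It reads each matching file fully into memory, so it is meant for text files only.

```python
import os
import re

TOKEN_RE = re.compile(r"\w+")


def heavy_files(root, extensions=(".md",), top=20):
    """Return the `top` largest matching files as (size_bytes, distinct_tokens, path)."""
    suffixes = tuple(ext.lower() for ext in extensions)
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            try:
                size = os.path.getsize(path)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    distinct = len(set(TOKEN_RE.findall(handle.read())))
            except OSError:
                continue  # unreadable file, e.g. one that is not synced locally
            rows.append((size, distinct, path))
    rows.sort(reverse=True)  # largest size first, ties broken by token count
    return rows[:top]
```

Running `heavy_files(vault_root, extensions=(".md", ".txt", ".csv"))` and looking at the first few rows would surface candidates worth moving out of the vault temporarily.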

scambier commented 1 year ago

Could you try this build and report back please? omnisearch 182 fix 1.zip

tsbertalan commented 1 year ago

The taskbar icon starts flashing its orange background like the window is ready, but I still see the (frozen) cache loading throbber.

scambier commented 1 year ago

I still think there's a note (or several) that freezes the indexing process. If you happen to pinpoint which one (and if it's not private), I'll take a look at it.

tsbertalan commented 1 year ago

Could well be. Do you have suggestions on how to locate it? I suppose I could do a branch-by-branch search of my file tree by removing one (sub(sub(sub)))directory at a time, but this would be pretty tedious. Maybe I could add some print statements to the .js ?
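
The branch-by-branch search mentioned above can be made less tedious by halving the candidate set each round instead of removing one directory at a time. The following is a minimal sketch of that bisection, under two loud assumptions: exactly one file causes the freeze, and the check is repeatable. `find_culprit` and `triggers_freeze` are hypothetical names; in practice `triggers_freeze` would be a manual step (put only the given files in a test vault, enable the plugin, and see whether indexing hangs).

```python
def find_culprit(candidates, triggers_freeze):
    """Narrow a list of paths down to the one for which triggers_freeze is true.

    Assumes exactly one culprit and a deterministic check (both are assumptions).
    """
    remaining = list(candidates)
    while len(remaining) > 1:
        half = remaining[: len(remaining) // 2]
        # If the first half reproduces the freeze, the culprit is in it;
        # otherwise it must be in the second half.
        remaining = half if triggers_freeze(half) else remaining[len(half):]
    return remaining[0] if remaining else None
```

Because the candidate set halves every round, a list of about 4,700 notes needs on the order of a dozen checks rather than thousands. If several files are problematic, the assumption breaks and the result identifies at most one of them.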

tsbertalan commented 1 year ago

That is, I'm pretty confident I can get the console up and displaying with ctrl+shift+I as soon as Obsidian starts.

scambier commented 1 year ago

I have https://github.com/scambier/obsidian-omnisearch/issues/184 planned to add verbose logs to help with issues like yours. I'll try to implement it asap, as it'd be a valuable addition overall

tsbertalan commented 1 year ago

Ok, thanks, that sounds like the easiest approach. I'll watch this space.

scambier commented 1 year ago

I've added a verbose logging option in the latest beta

scambier commented 11 months ago

Cannot reproduce

tsbertalan commented 10 months ago

Ok to keep the issue closed (I also am no longer getting the freeze), but just thought I'd give my own closing report.

I didn't have time to debug at the time, so I just disabled the plugin and used other non-Obsidian tools to satisfy my search needs in the meantime. I recently re-enabled Omnisearch (and installed Text Extractor and enabled its additional toggles), and it seems usable now.

Startup is still somewhat slow, but doesn't interrupt work in the same way that Obsidian's own slow startup does.

Obsidian Developer Console
plugin:wikilinks-to-mdlinks-obsidian:41 loading wikilinks-to-mdlinks plugin...
plugin:heatmap-calendar:340 heyoh null
plugin:dataview:20020 Dataview: version 0.5.64 (requires obsidian 0.13.11)
plugin:obsidian-mind-map:32574 Loading Mind Map plugin
plugin:obsidian-file-link:419 loading plugin file-link
plugin:obsidian-link-archive:94 Loading Link Archive plugin...
plugin:obsidian-toggl-integration:26196 Loading obsidian-toggl-integration 0.11.0
plugin:duplicate-line:6034 tuttut
plugin:periodic-notes:9 [Periodic Notes] initializing cache
plugin:omnisearch:50 Omnisearch - 4868 files total
plugin:omnisearch:50 Omnisearch - Cache is enabled
plugin:obsidian-excalidraw-plugin:92 Initialized Excalidraw Image Cache
plugin:dataview:12759 Dataview: all 4867 files have been indexed in 7.24s (5 cached, 4860 skipped).
plugin:omnisearch:41 Omnisearch - No cache found
plugin:omnisearch:50 Omnisearch - Total number of files to add/update: 4868
plugin:omnisearch:38 Omnisearch - Search cache written
plugin:omnisearch:50 Omnisearch - Indexing total time: 51901.391845703125 ms
moment.min.js:2 Deprecation warning: use moment.updateLocale(localeName, config) to change an existing locale. moment.defineLocale(localeName, config) should only be used for creating a new locale See http://momentjs.com/guides/#/warnings/define-locale/ for more info.

Then on a second startup (after enabling Text Extractor):

loading wikilinks-to-mdlinks plugin...
plugin:heatmap-calendar:340 heyoh null
plugin:dataview:20020 Dataview: version 0.5.64 (requires obsidian 0.13.11)
plugin:obsidian-mind-map:32574 Loading Mind Map plugin
plugin:obsidian-file-link:419 loading plugin file-link
plugin:obsidian-link-archive:94 Loading Link Archive plugin...
plugin:obsidian-toggl-integration:26196 Loading obsidian-toggl-integration 0.11.0
plugin:duplicate-line:6034 tuttut
Text Extractor - Number of available workers: 1 for PDFs, 2 for OCR
plugin:periodic-notes:9 [Periodic Notes] initializing cache
plugin:omnisearch:50 Omnisearch - 165680 files total
plugin:omnisearch:50 Omnisearch - Cache is enabled
plugin:obsidian-excalidraw-plugin:92 Initialized Excalidraw Image Cache
plugin:omnisearch:41 Omnisearch - No cache found
plugin:omnisearch:50 Omnisearch - Total number of files to add/update: 165680
plugin:dataview:12759 Dataview: all 4867 files have been indexed in 25.52s (4867 cached, 0 skipped).

Re: Freakingly huge memory usage

After maybe five minutes of running, Obsidian's memory usage is 4966 MB in Task Manager (but not growing from there) ...

image

... but this isn't reported in the main-thread breakdown in the Obsidian console.

image

This is on my Thinkpad "Gungnir" with 16 GB of installed RAM, and Windows reports 342,162 undifferentiated "files" in the folder used as the vault root. WSL says these include 21620 PDFs and 5013 MDs (find . -iname "*.pdf" -type f | wc).

(There are 28403 PDFs and 5558 MDs in the full Vault, but some are excluded on the laptop via Dropbox.)
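
For anyone without WSL, the same per-type counts can be gathered in one pass. The following is a small sketch using only Python's standard library; `count_by_extension` is a hypothetical helper name. Like the `-iname` pattern above, it ignores case in the extension. Unlike `find -type f`, `os.walk` also lists symlinks that point to files, so the totals may differ slightly on trees containing symlinks.

```python
import os
from collections import Counter


def count_by_extension(root):
    """Count files under `root` by lowercased extension (e.g. '.pdf', '.md')."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            extension = os.path.splitext(filename)[1].lower()
            counts[extension] += 1  # files without an extension land under ''
    return counts
```

`count_by_extension(vault_root).most_common(10)` lists the ten most frequent file types, which gives a quick picture of how much of the vault Omnisearch and Text Extractor would have to process.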

scambier commented 10 months ago

The super high memory usage is expected with that many files. Text Extractor consumes a lot when working with PDFs, and omnisearch itself needs a large cache. RAM consumption peaks during load but should settle down once Omnisearch is done doing active work.

paulpall commented 5 months ago

I was running into the same issue earlier today and figured out that the .csv log files that I had in a notebook were the root cause. Hope this helps someone!