themoeway / kaikki-to-yomitan

Yomitan-compatible dictionaries from Wiktionary data
https://github.com/themoeway/kaikki-to-yomitan/releases

Script freezes at "Processed 112431 lines" with Finnish-to-English #23

Closed: DefiantCatgirl closed this issue 2 months ago

DefiantCatgirl commented 3 months ago

Trying to convert the Finnish-to-English Wiktionary dump, the script freezes at "Processed 112431 lines" (I changed the log step to 1) and just keeps running forever, eating up CPU but seemingly doing nothing.

Node v20, Windows 10 x64 (running via mingw64), same result on Ubuntu 20.04

(thank you very much for making this tool, this is exactly what I wanted for years)

$ ./auto.sh Finnish English k
[S] source_all: false
[T] target_all: false
[d] redownload: false
[F] force: false
[t] force_tidy: false
[y] force_ymt: false
[k] keep_files: true

up to date, audited 326 packages in 1s

35 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities
------------------------------- Finnish -> English -------------------------------
Kaikki dict already exists. Skipping download.
3-tidy-up.js: Processed 112431 lines...

The last processed word appears to have been palvelusaika, but deleting it or any of the words around it still causes the script to freeze within a few lines of that point.

StefanVukovic99 commented 3 months ago

Good news is this reproduces. Bad news is Finnish has a lot of forms and it's hitting performance limits. Changing an object to a Map gets it as far as line 187513, but now it's running into the maximum size of a Map...
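For anyone hitting the same Map limit: V8 is commonly reported to cap a single Map at roughly 2^24 (about 16.7 million) entries, after which inserts throw a RangeError. One generic workaround is to spread keys over several smaller Maps. The sketch below is only an illustration of that idea, not what the repository or #24 actually does; `ShardedMap`, its shard count, and the hash function are all hypothetical names and choices.

```javascript
// Hypothetical helper: a Map-like container that spreads string keys across
// several Map instances so that no single Map has to hold every entry.
class ShardedMap {
  constructor(shardCount = 16) {
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }

  // Pick a shard with a cheap djb2-style string hash (an arbitrary choice).
  shardFor(key) {
    let h = 5381;
    for (let i = 0; i < key.length; i++) {
      h = ((h * 33) ^ key.charCodeAt(i)) >>> 0; // keep h an unsigned 32-bit int
    }
    return this.shards[h % this.shards.length];
  }

  set(key, value) {
    this.shardFor(key).set(key, value);
    return this;
  }

  get(key) {
    return this.shardFor(key).get(key);
  }

  has(key) {
    return this.shardFor(key).has(key);
  }

  // Total number of entries across all shards.
  get size() {
    return this.shards.reduce((total, shard) => total + shard.size, 0);
  }
}

// Usage: collect inflected forms per lemma.
const forms = new ShardedMap(4);
forms.set("palvelusaika", ["palvelusajan", "palvelusaikaa"]);
console.log(forms.has("palvelusaika"), forms.size);
```

Sharding only raises the per-Map ceiling; total memory use stays the same, so the heap limit can still become the bottleneck.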

StefanVukovic99 commented 3 months ago

With the changes in #24 it seems to stall on processing forms in the next script. Though maybe it would have gone through eventually, or maybe it will work with even more memory - I ran it with MAX_MEMORY_MB=16384 in .env. I'll investigate further when I can.
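For context on the memory setting: Node.js accepts a `--max-old-space-size=<MB>` flag that raises the V8 heap limit. I'm assuming here that MAX_MEMORY_MB from .env ends up in that flag; the helper below is a hypothetical sketch of such a mapping, and the `heapFlagFromEnv` name and the 8192 fallback are made up for illustration rather than taken from the project's scripts.

```javascript
// Hypothetical helper: translate a MAX_MEMORY_MB setting into the Node.js
// heap-size flag. The 8192 MB fallback is an assumption, not a project default.
function heapFlagFromEnv(env, fallbackMb = 8192) {
  const parsed = Number.parseInt(env.MAX_MEMORY_MB ?? "", 10);
  // Fall back when the value is missing, non-numeric, or not positive.
  const mb = Number.isInteger(parsed) && parsed > 0 ? parsed : fallbackMb;
  return `--max-old-space-size=${mb}`;
}

console.log(heapFlagFromEnv({ MAX_MEMORY_MB: "16384" }));
// prints "--max-old-space-size=16384"
```

The resulting string could be passed on the command line, for example `node --max-old-space-size=16384 3-tidy-up.js`, or placed in the NODE_OPTIONS environment variable.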

DefiantCatgirl commented 3 months ago

Using that branch and MAX_MEMORY_MB=24576 the script actually succeeded within ~15 minutes! Thank you so much!

...though now the problem is that when trying to import it into Yomitan, the extension itself crashes at ~60% with "out of memory" in Edge and other Chromium-based browsers, and gets stuck at 30% in Firefox. But I guess this is already a known issue with large dictionaries: https://github.com/themoeway/yomitan/issues/381