cyanic-selkie closed 1 year ago
Thank you! I'll take a look at fixing these runtime errors tomorrow. Will release a fix for them asap. Cheers
hey @cyanic-selkie - both errors should be fixed now in 10.1.6. Let me know if you see any others.
Yeah - the es memory issue looks like a memleak in dumpster-dive - Can you help me reproduce it? I haven't seen it before.
cheers
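For anyone trying to reproduce a memory issue like this, a periodic heap log can help show whether usage climbs steadily during parsing. The following is only a sketch: `heapReport` and `startHeapLogger` are hypothetical helper names, not part of dumpster-dip, and they rely solely on Node's built-in `process.memoryUsage()`.

```javascript
// Hypothetical helpers (not part of dumpster-dip): report Node memory usage in MB.
function heapReport() {
  const { heapUsed, heapTotal, rss } = process.memoryUsage()
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024)
  return {
    heapUsedMB: toMB(heapUsed), // JS objects currently allocated in this thread
    heapTotalMB: toMB(heapTotal), // heap reserved by V8 for this thread
    rssMB: toMB(rss) // resident memory of the whole process
  }
}

// Log a report every `intervalMs` milliseconds.
// unref() keeps the timer from holding the process open once parsing is done.
function startHeapLogger(intervalMs = 10000) {
  const timer = setInterval(() => console.log('[mem]', heapReport()), intervalMs)
  timer.unref()
  return timer
}
```

Calling `startHeapLogger()` just before starting the parse would print a line every ten seconds. Note that when worker threads are involved, the heap figures only describe the thread that calls the function, while `rss` covers the whole process.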
I just reran it for id, zh, and ja and it works without any errors.
The es issue remains. I am using node==20.5.1 on a server with 64 threads and 128 GB of RAM. The code is here. I tried it with 64 and 8 workers; the error happens in both cases after a few minutes of parsing. Do you need any additional information to help you reproduce it?
On a side note, I'd like to suggest using ^10 or similar for the wtf_wikipedia dependency version if you're using SemVer, since I had to clone the repository in order to update to the new version.
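To illustrate what a caret range such as ^10 would accept, here is a simplified sketch. `satisfiesCaret` is a hypothetical helper written for this example, not npm's actual resolver; it only covers plain `major.minor.patch` versions with a major of 1 or more, and ignores prerelease tags and the special 0.x rules.

```javascript
// Split "10.1.6" into [10, 1, 6]; missing parts are treated as 0 by the caller.
function parseVersion(v) {
  return v.split('.').map(Number)
}

// Simplified caret check (assumes major >= 1, no prerelease tags):
// same major version, and not older than the base version.
function satisfiesCaret(version, base) {
  const [major, minor = 0, patch = 0] = parseVersion(version)
  const [baseMajor, baseMinor = 0, basePatch = 0] = parseVersion(base)
  if (major !== baseMajor) return false
  if (minor !== baseMinor) return minor > baseMinor
  return patch >= basePatch
}

console.log(satisfiesCaret('10.1.6', '10')) // true: a newer patch release is picked up
console.log(satisfiesCaret('11.0.0', '10')) // false: a new major version is excluded
```

With a range like this, a fix released as a new patch or minor version can be installed without editing the dependent package.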
thanks - that's a real doozy. Wonder why it's only spanish?? I looked at the script, and you haven't declared a few of those variables, which may do it.
i just ran es on my mac and it ran smoothly:
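import path from 'path'
import dip from 'dumpster-dip'

// `dir` and `lang` are assumed to be declared earlier in the script
const opts = {
  input: path.join(dir, `/${lang}wiki-latest-pages-articles.xml`),
  outputMode: "ndjson",
  outputDir: path.join(dir, lang),
  // called once per article; the returned value is what gets written out
  parse: function (doc) {
    return doc.json()
  }
}

dip(opts).then(() => {
  console.log('done!')
})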
will you try that, on your machine? cheers
good idea using ^10. Will add that to the next release.
@spencermountain I just fixed the variable declarations and it works perfectly. I'm not used to JS, so thanks!
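As a rough illustration of why undeclared variables can matter here: in non-strict JavaScript, assigning to a name that was never declared silently creates a property on the global object, so the data stays reachable after the function returns. This sketch is not taken from the script in this thread; `leakedBuffer` and `strictBuffer` are made-up names, and `new Function` is used only so the function bodies run in a known mode regardless of how the file is loaded.

```javascript
// Body runs in sloppy (non-strict) mode: the assignment creates a global.
const sloppy = new Function('value', 'leakedBuffer = value')

// Same assignment in strict mode: the engine refuses to create the global.
const strict = new Function('value', '"use strict"; strictBuffer = value')

sloppy([1, 2, 3])
// The array is now attached to the global object and outlives the call.
console.log('leaked to global:', Array.isArray(globalThis.leakedBuffer))

let threw = false
try {
  strict([1, 2, 3])
} catch (err) {
  // Strict mode reports the missing declaration instead of leaking.
  threw = err instanceof ReferenceError
}
console.log('strict mode threw ReferenceError:', threw)
```

Declaring variables with `const` or `let` keeps them scoped to the function or block, so the data can be garbage-collected once it is no longer used.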
Hi,
Thank you for the awesome library!
I am currently using dumpster-dip to generate a dataset from all Wikipedia languages. It ran fine for all languages except ja, zh, id.
Specifically, for ja and zh I got the following error:
For id, I got:
It is also worth noting that for es to complete successfully, I had to set --max-old-space-size to 20000, which seems excessive, especially since no other language requires changing the default. If I left it at default (or even set to 10000), I got the following error: