pse-group2 / medminersolr


How long does seeding the database take? Does it end successfully for you? #36

Open panmari opened 10 years ago

panmari commented 10 years ago

Hey everyone

When I first tried seeding the database from home (about two weeks ago), it aborted partway through. Could it be that I was banned by the Wikipedia server? Or did you run into this issue too and fix it since? Now it seems stuck at 27%. Did you ever try running it with

    Thread.abort_on_exception=true # lets you see exceptions caused in threads

to see if some threads just died along the way?
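To illustrate what I mean: with `Thread.abort_on_exception = true`, an unhandled exception in any thread is re-raised in the main thread, so the whole process stops. A gentler alternative is to rescue inside each worker and report the failures at the end. This is only a minimal sketch; `run_and_collect_failures` and the sample jobs are made-up names, not code from the seeder.

```ruby
require 'socket' # defines SocketError

# Hypothetical helper (not from the seeder): runs each job in its own thread
# and returns the exceptions that would otherwise have killed a thread quietly.
def run_and_collect_failures(jobs)
  failures = []
  lock = Mutex.new
  threads = jobs.map do |job|
    Thread.new do
      begin
        job.call
      rescue StandardError => e
        # Record the error instead of letting the thread die unnoticed.
        lock.synchronize { failures << e }
      end
    end
  end
  threads.each(&:join) # wait for every worker before reporting
  failures
end

# Illustrative jobs: one succeeds, one fails the way a DNS lookup might.
jobs = [
  -> { :ok },
  -> { raise SocketError, 'getaddrinfo: Name or service not known' }
]
run_and_collect_failures(jobs).each { |e| puts "#{e.class}: #{e.message}" }
```

That way the seeding keeps going for the healthy threads, and you still get a list of what went wrong.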

panmari commented 10 years ago

Yep, as suspected it died with

    SocketError: getaddrinfo: Name or service not known

I presume Wikipedia doesn't really like crawlers ;)

What did you do when seeding? Just start the process multiple times until it finishes successfully?
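Instead of restarting the whole process by hand, each request could be retried a few times with a growing pause in between. A rough sketch, assuming the failure is a transient `SocketError`; `with_retries` and its parameters are hypothetical names, and I have not checked how the seeder actually fetches pages.

```ruby
require 'socket' # defines SocketError

# Hypothetical helper: re-runs the block when it raises SocketError, waiting
# base_delay, 2 * base_delay, 4 * base_delay, ... seconds between attempts.
# The sleeper argument exists only so the waiting can be swapped out.
def with_retries(max_attempts: 5, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue SocketError
    raise if attempt >= max_attempts # give up and let the error surface
    sleeper.call(base_delay * 2**(attempt - 1))
    retry
  end
end
```

Usage would look something like `with_retries { fetch_page(title) }`, where `fetch_page` stands in for whatever the seeder calls to download an article. Only `SocketError` is retried here, so any other exception still surfaces immediately.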

panmari commented 10 years ago

I made a dirty little patch that makes it run successfully for me. Sorry, too lazy to fork and do a pull request. Does anyone feel like turning this into production-level code?

https://gist.github.com/panmari/fbf657b85fccce60708c

awaelchli commented 10 years ago

Hello, I assume you're Stefan. Thanks for the tips about the threads. I had also noticed that the threads simply die off, but since my Mac-using friends didn't see that, I assumed it was down to thread management on Ubuntu.

We'll take a look at your solution, so thanks again.

Oh, and by the way, I cleaned up the Gemfile a bit. Let me know if a gem doesn't work for you or if we should replace any.