wwyaiykycnf / e621dl

The automated e621.net downloader

Crashes with lot of results #35

Open aleksbrgt opened 7 years ago

aleksbrgt commented 7 years ago

The way the results are fetched causes the program to run more and more slowly, because it tries to store all the posts in a single array. It will eventually run out of memory.

One way to fix it would be to delegate the task to a database engine. The array method fails, for me, at about 30 to 40k posts.

Edit: Happy New Year! 🎉

Wulfre commented 7 years ago

The simplest way to do this would probably be to use the pickle module again and pickle the array every 1000 posts or so. To get even simpler, you could pickle every fetch from e621 without appending it to the array at all, but that might end up being slower simply due to the number of files that would need to be unpickled when downloading a large batch.
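A minimal sketch of the "pickle every 1000 posts" idea, assuming posts arrive as plain picklable values (dicts here). The function names, the file naming scheme, and the `batch_size` parameter are made up for illustration and are not taken from e621dl:

```python
import pickle
import tempfile
from pathlib import Path


def _dump(batch, out_dir, index):
    """Pickle one batch to a numbered file and return its path."""
    path = Path(out_dir) / f"posts_{index:05d}.pickle"
    with open(path, "wb") as fh:
        pickle.dump(batch, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def spill_in_batches(posts, out_dir, batch_size=1000):
    """Write posts to disk batch_size at a time instead of keeping them all in memory."""
    paths = []
    batch = []
    for post in posts:
        batch.append(post)
        if len(batch) >= batch_size:
            paths.append(_dump(batch, out_dir, len(paths)))
            batch = []  # start a fresh list so the old batch can be freed
    if batch:  # flush the final, partially filled batch
        paths.append(_dump(batch, out_dir, len(paths)))
    return paths


def iter_posts(paths):
    """Yield posts back one at a time, loading only one batch file at a time."""
    for path in paths:
        with open(path, "rb") as fh:
            yield from pickle.load(fh)


if __name__ == "__main__":
    # Hypothetical stand-in for the results fetched from e621.
    fake_posts = ({"id": i, "md5": f"{i:032x}"} for i in range(2500))
    with tempfile.TemporaryDirectory() as tmp:
        files = spill_in_batches(fake_posts, tmp)
        print(len(files), "files,", sum(1 for _ in iter_posts(files)), "posts")
```

With batches, memory use is bounded by `batch_size` rather than by the total number of results, at the cost of one file read per batch when downloading.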

Happy New Year ! 🎉

EDIT: It might not be as simple to use pickle as I thought, since posts are currently handled as nested tuples and the reference for the object is nested within lib.api, which pickle does not like.
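One possible workaround, sketched under the assumption that the posts are namedtuple-like objects: flatten them into plain dicts and lists before pickling, so pickle never has to look up the original type by name. The `Post` and `File` types below are hypothetical stand-ins, not the actual objects from lib.api:

```python
import pickle
from collections import namedtuple


def to_plain(obj):
    """Recursively convert namedtuples to dicts and other sequences to lists."""
    if isinstance(obj, tuple) and hasattr(obj, "_asdict"):  # namedtuple
        return {key: to_plain(value) for key, value in obj._asdict().items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    return obj


def make_post():
    # Hypothetical types created inside a function: pickle cannot find them
    # as module-level names, so pickling the raw object fails.
    File = namedtuple("File", "md5 ext")
    Post = namedtuple("Post", "id file tags")
    return Post(1, File("abc", "png"), ("canine", "solo"))


if __name__ == "__main__":
    post = make_post()
    try:
        pickle.dumps(post)
    except (pickle.PicklingError, AttributeError) as exc:
        print("raw object not picklable:", exc)
    restored = pickle.loads(pickle.dumps(to_plain(post)))
    print(restored)
```

The trade-off is that attribute access (`post.file.md5`) becomes key access (`post["file"]["md5"]`) after the round trip.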

EDIT 2: I cannot test this well enough to give more input, as I just searched for 100k posts and it didn't crash. I have 16 GB of memory and use my own fork, which is quite similar to wwyaiykycnf's, just rewritten in my own style. I can tell how slow and inefficient it is to use one huge array, though.

aleksbrgt commented 7 years ago

Using a DB engine like SQLite would be a great solution: it can be kept in RAM, it is quick, and it is much easier to handle than pickling.
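A rough sketch of what that could look like with Python's built-in `sqlite3` module. The table layout, the function names, and the assumption that each post is a dict with `id` and `md5` keys are illustrative only and do not come from e621dl:

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a post store; ':memory:' keeps it in RAM, a file path puts it on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " id INTEGER PRIMARY KEY,"
        " md5 TEXT NOT NULL,"
        " data TEXT NOT NULL)"
    )
    return conn


def save_page(conn, page):
    """Insert one page of results; posts already stored are skipped by id."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO posts (id, md5, data) VALUES (?, ?, ?)",
            [(p["id"], p["md5"], json.dumps(p)) for p in page],
        )


def iter_stored_posts(conn):
    """Yield stored posts one row at a time instead of building a big list."""
    for (data,) in conn.execute("SELECT data FROM posts ORDER BY id"):
        yield json.loads(data)


if __name__ == "__main__":
    conn = open_store()
    save_page(conn, [{"id": 2, "md5": "bb"}, {"id": 1, "md5": "aa"}])
    save_page(conn, [{"id": 1, "md5": "aa"}])  # duplicate, ignored
    for post in iter_stored_posts(conn):
        print(post)
```

Passing a file path instead of `":memory:"` would trade some speed for a store that does not grow with RAM use, which may matter on a small machine.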

I tried again on my PC (I was using a Raspberry Pi before 🤣), so yes, it doesn't crash, it just gets awfully slow!

I am working on another project and will make it public once it starts to work! I have also started writing Python bindings for the e621 API in a separate project, to make something cleaner and easier to maintain.

Wulfre commented 7 years ago

I'll be watching for it then! I forked this project and made a lot of hacky additions so it doesn't run too smoothly or look very nice in the source code. I might steal your API bindings at the very least.