lurkbbs / e621dl

The automated download script for e621.net. Originally by @wwyaiykycnf.

no cache and cross file download #7

Closed huskelord closed 5 years ago

huskelord commented 5 years ago

Hey, long time no see. So I came back to your program after a while, since I wanted to redownload e621, and noticed some differences/maybe problems that I wanted to check.

And finally, I would like to request a status update showing not just what file group it is working on or how many posts it has downloaded, but also how many posts it has skipped because they are already downloaded or not requested. Edit: (I don't know if "posts so far" on the status page is this or how many have been downloaded; if it isn't, that would be nice; if it is, my bad.)

All in all, it's still a great program, and I hope you keep up the good work.

lurkbbs commented 5 years ago

Hello. It feels good to see someone actually uses my fork besides myself 😄

The cache folder should be there, but it's switched off by default. Same for the database, and same for hardlinks and md5s. Just uncomment and change the settings from

;[Settings]
;include_md5 = false
;make_hardlinks = false
;make_cache = false
;db = false

to

[Settings]
;include_md5 = false
make_hardlinks = true 
make_cache = true
db = true

For make_hardlinks you should either enable Development mode on Windows 10, or just run as admin on every Windows since 7, including Win 10.

I'm not sure about images not downloading in many folders. By the way, you can ask me about any specific tag and folder structure via lurkbbs AT gmail DOT com. Throwaway mail or not. Same for your current config.ini. It's easier to test this way, obviously. It's a shame there are no private messages on GitHub.

the program will slow down and completely stop over time, which makes downloading overnight impossible

That's extremely strange. It definitely works for me. It doesn't mean I don't believe you, just that I'm not sure how to catch this bug.

i have to close and reopen it to get it to start up again, while having to wait for the program to catch up to where it left off

That's also strange. I took great effort to save the state of the queue and continue where it left off. There should be a download_queue.pickle file in the folder with e621dl.py. Tell me if it isn't there.
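The resume behavior described above can be sketched roughly like this (a minimal illustration, assuming a simple list-shaped queue; only the download_queue.pickle file name comes from the comment above, everything else is hypothetical):

```python
import pickle
from pathlib import Path

# Hypothetical state file mirroring the download_queue.pickle mentioned above.
QUEUE_FILE = Path("download_queue.pickle")

def save_queue(queue):
    # Persist the pending download queue so a restart can resume where it left off.
    with QUEUE_FILE.open("wb") as f:
        pickle.dump(queue, f)

def load_queue():
    # Restore the queue if a previous run left one behind; otherwise start fresh.
    if QUEUE_FILE.exists():
        with QUEUE_FILE.open("rb") as f:
            return pickle.load(f)
    return []
```

If the pickle file is missing, a fresh run simply starts from an empty queue, which matches the "tell me if it isn't there" check above.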

status update on ... how many posts it has skipped since they are already downloaded or not requested

I'll try to add this, most definitely before November 25, but maybe even as soon as this Sunday evening (UTC+3).

Now, about bugs, some questions:

  1. Do you use the source version from the repo, or the executable from the download page?
  2. Can you give me your config.ini for overnight downloads? Via my email, of course. Or at least a config.ini that stops overnight, so I can test it.
  3. How do you close e621dl when it hangs? Is there a download_queue.pickle in the folder with the app?
  4. What OS do you use? What version?
huskelord commented 5 years ago

I use the executable.

Serch Data is basically a backup of my config file: Serch Data.txt

The download pickle works; it was a problem with the old version I was using, I guess.

I don't have code or anything special that makes it run/stop overnight. I just let it run, but at some point it stops or just closes.

I click the X button and reopen it via Explorer.

Windows 10, and I usually run as administrator.

Also, I think the ReadMe says cache and some other stuff are turned on by default.

Can you describe what md5 and db are? Also, turning on hardlink or cache doesn't fix the multiple files download.

Also, the old program I was running that had the multiple file download was 5.2.1, and here is my old Serch Data (config file) for that old program: Serch Data.txt

lurkbbs commented 5 years ago

i don't have code or anything special that makes it run/stop overnight, i just let it run, but at some point it stops or just closes

Maybe your router or ISP forces a reconnection somewhere at night? I know mine does that. Not sure how to recover the connection in this case, but I'll try to think of something.

windows 10 and usually run as administrator

Right click -> Run as Admin or regular?

also i think the ReadMe says cache and some other stuff are turned on by default

Really? Let's see

You can cache all files downloaded before to cache folder. This is default behavior, actually. You can store all posts info from API to local database. This is also default behavior.

... Well, crap. Thank you, I'll change it today around night.

Can you describe what a md5 and db are?

md5 is more for backward compatibility with Wulfre's version. It just makes filenames like <id>.<md5>.<id>.<extension>

db just stores all the info about all posts, like the download link, tags, artist, and so on. With ;post_from = db in [Defaults], or on a per-section basis, you can recreate your folders with a different structure, with fewer folders, or with stricter filters, without iterating over all posts from the e621 API (much faster). It still needs to check all the tags, though.

I'm still not sure what you mean by multiple download. Can you give more details and examples?

also turning on hardlink or cache doesn't fix the multiple files download

It shouldn't anyway. Hardlinks are for saving space in case of lots of copies. Cache is mostly for restoring folders really fast on a change of folder structure/filter restrictions.
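The space saving that hardlinks provide can be illustrated with a small sketch (a hedged illustration, not e621dl's actual code; the function name and copy fallback are assumptions):

```python
import os
import shutil

def place_copy(cache_path, dest_path):
    # A hardlink is a second directory entry for the same file data, so the
    # "copy" occupies roughly zero extra disk space. If linking fails (for
    # example, across drives), fall back to a real copy.
    try:
        os.link(cache_path, dest_path)
    except OSError:
        shutil.copy2(cache_path, dest_path)
```

On Windows, creating hardlinks may require elevated privileges depending on the setup, which is why the thread keeps coming back to "run as admin" and Developer mode.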

huskelord commented 5 years ago

I don't really understand md5 or database very well, but that's OK.

What I mean is: let's say, over a 20-minute period in the new program, I have a bunch of images downloaded into the Everything folder. But with the old program, any image that was downloaded into one folder and would also be downloaded into another would be put into that other folder immediately, not when that folder started to be downloaded.

This is what it looks like with the new update: e621 net Backup 11_16_2019 4_16_49 AM

This is what it looks like with the old update (this one was also running for less time): e621 net Backup 11_16_2019 4_16_33 AM

Also, were the database and md5 turned on by default in the old versions (the ReadMe still says they are), like cache and hardlink? That might be what is different, if they were.

Edit: I reread the ReadMe, and this sounds like what the problem is, but I noticed it said that I won't be able to use metatags with the database on. What are the metatags, and will that mess with the subfolder system?

lurkbbs commented 5 years ago

OK, try disabling hardlinks, or run as admin with right click --> Run As Administrator, or enable Developer mode:

https://www.howtogeek.com/292914/what-is-developer-mode-in-windows-10/

I'll try to test this myself, but my developer mode is already enabled

Edit: Also, yes, sections are now filled one by one. Otherwise max_downloads just wouldn't work. Just to be clear, they were always processed one by one; processing time shouldn't change much.
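The one-section-at-a-time behavior can be sketched like this (a hypothetical illustration; the section and filter representations are assumed, not e621dl's internals):

```python
def fill_sections(sections, posts):
    # Process each section in order, collecting all of its matches before
    # moving on. A per-section max_downloads cap only makes sense this way:
    # if posts were scattered into all folders as they arrived, no section
    # could know when its own limit was reached.
    results = {}
    for name, cfg in sections.items():
        matched = [p for p in posts if cfg["filter"](p)]
        limit = cfg.get("max_downloads")
        results[name] = matched if limit is None else matched[:limit]
    return results
```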

Also, in new config, change

[Everything]
tag = ... -animated
subfolders = Images

[Everything]
tag = ... animated
subfolders = Animated

to

[Everything]
tag = ... 
subfolders = Images Animated

If it worked without crashing with a config like this, it's also a bug. I'll go check.

Edit2: Your config has no metatags. You can safely change [Everything] to [Prefilter]. It can speed things up, but there would be no Everything folder. You can work around that with:

[Everything]
tag = * 
subfolders = Images Animated

Edit3: The config does indeed have the mistake. Changing it now; I guess you already did, and just somehow copied the wrong version.

huskelord commented 5 years ago

The database worked to get the multiple files at once. Also, I will use the prefilter, but the main reason I had Everything separated the way it was was to make sure images were downloaded before animations. Would there be a way to make this possible beforehand, like with the prefilter?

Edit: I used the prefilter and got rid of the Everything folder, but when downloading it says it is downloading the prefilter section and downloads images along with animations. Is there a way to download just images before animations?

lurkbbs commented 5 years ago

I think I found the reason. The section [Everything] corresponds to no less than a third of e621 and will take about four hours just to iterate over, much more if you need to download everything there is. Before max_downloads, this was not a problem, as everything was being placed into all the folders at the same time. Now it's not.

[Prefilter] is the solution I personally use. Yes, I can't use metatags like width:100 or order:random. If that's a problem, I will come up with a solution, but so far I don't feel the need to. Metatags like rating or score are within standard features. Most can be emulated, like type:jpg. order metatags are impossible to combine with the [Prefilter] option anyway.
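Emulating a metatag like type:jpg on the client side could look roughly like this (a hedged sketch, not e621dl's actual code; the file_ext key is an assumption for illustration):

```python
def filter_by_type(posts, ext):
    # Instead of sending a type:<ext> metatag to the API, filter already
    # downloaded post records by their file extension locally.
    return [p for p in posts if p.get("file_ext") == ext]
```

order:... metatags cannot be emulated this way, since they depend on the server's sort order, which matches the caveat above.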

As I said, your config has no metatags anyway, so this should not be a problem. If it becomes a problem, let me know.


So, here's config that should be OK: SearchData.txt

It uses hardlinks, so please enable Developer mode. The easiest way is to start PowerShell or the command line as Admin (you can right-click the Start menu button and select it there) and copy this:

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\AppModelUnlock" /t REG_DWORD /f /v "AllowDevelopmentWithoutDevLicense" /d "1"

and press Enter, and then reboot. This way, copies will occupy about zero bytes on the hard drive.

lurkbbs commented 5 years ago

Ninja'd.

No, there is no way for now to download images before animations. Essentially, it would need multiple prefilters. If you need it, I'll try to think of something.

Edit: Quick solution should be ready in two hours

lurkbbs commented 5 years ago

The quickfix is here; config.ini is here.

huskelord commented 5 years ago

So is the only difference the extra prefilter in the config.ini, or does the quickfix make the prefilter have an order?

Also, the reason I wanted images downloaded first is that way, if I run out of disk space, it's done with images first, and I can delete repeats or old versions of the animations.

Edit: So I tried it, but it wouldn't work; it just kept closing after verifying all the tags and saying it was in section: prefilter_2.

Edit2: I ended up going back to the previous version and back to one prefilter, to which I added -animated to do pictures first. That worked after I got rid of post_from = db, which I don't even know what it does. Edit3: I don't know if it will work, but I also added a max_download = 21000000 to try to get it not to stop overnight. Edit4: The max_download thing didn't work.

lurkbbs commented 5 years ago

First of all, the Quickfix is a must; Prefilter_1 and Prefilter_2 won't work otherwise. max_download just limits the maximum number of files in a section. No need for that now.

About post_from = db. My mistake. Yes, you are right, just delete this.

I checked twice; this config should work. days = 30, so you can quickly check it for yourself. But please, please tell me if you have enabled Development mode. This is important. It is needed for make_hardlinks, and make_hardlinks will save space on your disk. Here, please try it again; everything is in this archive. The app and the config are tested and should work: e621dl.zip. If not, run with e621_noclose.bat and post the error here, please.

Not a single version of e621dl works when the connection suddenly changes, not mine, not Wulfre's. That is most likely what is going on. You can test it yourself: just unplug the WAN cable from your router and plug it in again. If e621dl hangs, that is the case.

huskelord commented 5 years ago

I went into the command thing like you said and copied what you wrote, so I should have it. I also changed the properties so that it would always run as administrator. Edit1: Everything seems to be working perfectly. I don't know for sure if hardlink is working or not, since I don't know how to check it, but like I said above, I copied what you told me into the Windows command thing, and I also set it up to always run as administrator.

lurkbbs commented 5 years ago

Yes! Cool! Hallelujah! Checking is easy: if make_hardlinks = true and e621dl.exe is not crashing suddenly, then it works 😄

This sudden-freeze-at-night problem: give me a week, I'll try to solve it.

huskelord commented 5 years ago

Just a reminder: it doesn't necessarily happen just at night, and it doesn't close, it just stops. Don't know if that helps or not.

lurkbbs commented 5 years ago

So, the possible reason for all the hangs is the connection suddenly stalling while downloading. By default, there is no timeout. Here is a version that should crash if the connection is taking too long to continue. "Too long" is 15.5 seconds.

e621dl.zip

Run with e621_noclose.bat. I'm sorry you have to be a tester, but "it works on my machine", so I have to test it on yours.

Copy/screenshot commandline of an error, if it happens at all.
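The timeout mechanism described above can be illustrated with a minimal sketch (not the actual e621dl code; it just shows that a socket with settimeout() raises instead of blocking forever when the other side stops sending, with the 15.5 s default taken from the comment above):

```python
import socket

def read_with_timeout(sock, nbytes, timeout=15.5):
    # Without a timeout, recv() on a stalled connection blocks indefinitely,
    # which looks exactly like the overnight "freeze" described in this thread.
    # With settimeout(), the read raises socket.timeout instead of hanging.
    sock.settimeout(timeout)
    return sock.recv(nbytes)
```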

huskelord commented 5 years ago

It seems to work, but e621_noclose.bat only opens the dl, it doesn't run it itself. Found this when I came back:

status: Downloading files
checked tag: all tags are valid
posts so far: 960
last file downloaded: downloads/general/solo/images/1109496.jpg
current section: prefilter_1
last warning: None so far

Exception in api iterator:
Traceback (most recent call last):
  File "site-packages\urllib3\response.py", line 360, in _error_catcher
  File "site-packages\urllib3\response.py", line 666, in read_chunked
  File "site-packages\urllib3\response.py", line 598, in _update_chunk_length
  File "socket.py", line 586, in readinto
  File "ssl.py", line 1012, in recv_into
  File "ssl.py", line 874, in read
  File "ssl.py", line 631, in read
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "site-packages\requests\models.py", line 750, in generate
  File "site-packages\urllib3\response.py", line 490, in stream
  File "site-packages\urllib3\response.py", line 694, in read_chunked
  File "contextlib.py", line 99, in __exit__
  File "site-packages\urllib3\response.py", line 365, in _error_catcher
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='e621.net', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e621dl.py", line 131, in prefilter_build_index
  File "e621dl_lib\remote.py", line 195, in get_posts
  File "site-packages\requests\sessions.py", line 581, in post
  File "site-packages\requests\sessions.py", line 533, in request
  File "site-packages\requests\sessions.py", line 686, in send
  File "site-packages\requests\models.py", line 828, in content
  File "site-packages\requests\models.py", line 757, in generate
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='e621.net', port=443): Read timed out.

lurkbbs commented 5 years ago

it seems to work, but e621_noclose.bat only opens the dl, it doesn't run it itself

Not sure what that means. e621_noclose.bat just opens e621dl.exe and makes the console window stay open after the executable finishes/crashes.

The error means, as I expected, either a bad connection in general, a faulty router, or an ISP that needs to reconnect every day, which they usually do around 1 or 2 a.m. local time.

With that, welcome to the wonderfully long and amazingly frustrating world of debugging. I'll make a version that retries requests on timeout up to 100 times before finally giving up. Not sure if it will help, and if not, solving this can take a bit more than a week.
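The retry idea could be sketched like this (a hypothetical helper, not the actual implementation; only the 100-try figure comes from the comment above):

```python
import time

def with_retries(func, max_tries=100, delay=0.0):
    # Call func; on a timeout or connection error, wait briefly and try
    # again, giving up for good only after max_tries attempts.
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except (TimeoutError, ConnectionError):
            if attempt == max_tries:
                raise
            time.sleep(delay)
```

Combined with a read timeout, this turns a silent overnight hang into a bounded series of visible, recoverable failures.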

lurkbbs commented 5 years ago

Here, let's try out this:

e621dl.zip

I tested this version by unplugging and replugging the network cable, and it all worked. So, fingers crossed and wood knocked.

huskelord commented 5 years ago

Everything seems to be working well, but I won't be able to test it very well, since the old version started working for some reason; it was probably the internet or one of the other things above. Besides, it won't be a problem once I have downloaded the initial amount, since keeping up with it won't take long, or I'll have filled up all the storage space on my computer.

huskelord commented 5 years ago

I figured out what the problem was. I have a laptop, so it has some settings that a desktop doesn't (I assume you're using a desktop), and I had to change some of them to let me do background things, like activating hybrid sleep and extending my hibernate limit.

lurkbbs commented 5 years ago

Ah. Well, mine has these settings too. I just changed it to never sleep a long time ago and forgot about it around that time. The good thing is those freezes are most likely cured anyway.

lurkbbs commented 5 years ago

New release: https://github.com/lurkbbs/e621dl/releases/tag/v5.6.0

Updated config: config.ini.txt

I basically renamed the prefilters to "\<Images>" and "\<Animated>". They are a bit more improved as well.