lurkbbs / e621dl

The automated download script for e621.net. Originally by @wwyaiykycnf.

Potential Bug with Pools #36

Closed huskelord closed 4 years ago

huskelord commented 4 years ago

OK, so I have 4 configs at the moment.

The 0th config is a blank one, used to empty the to_blacklist folder and to pre-verify some tags.

The 1st config downloads the images that I want downloaded (these should be the only images downloaded, except for missing images from the pools that some of those original images are in).

The 2nd config sorts the downloaded images into sections that I go to depending on my mood, without downloading any new images. (This config has a blacklist section that I use to separate images that I may want to blacklist from images that I don't; pools are enabled in this section to help with that.)

The 3rd config limits whatever part of the blacklist section I have fully gone through to only the most recent posts (i.e. the last 60 days), so that I don't need to worry about old posts I already gave the OK to (pools are enabled here as well).

The problem is that a lot of the pools generated in the blacklist section of the 2nd config don't have any images that were downloaded in the 1st config and thus shouldn't have been downloaded at all.

There are also a couple of pools in the 3rd config that should not have been made, since they are older than the days limit, but this might be because I didn't delete pools.generated and it just created them from that.

lurkbbs commented 4 years ago

There are also a couple of pools in the 3rd config that should not have been made, since they are older than the days limit, but this might be because I didn't delete pools.generated and it just created them from that.

That's not it. pools.generated is recreated on each run; whether you delete it or not changes nothing. days_ago, on the other hand, is broken: it doesn't use the current date, just the number attached to the post entry in the database. So unless the entry is updated, the post will be stuck in the last days forever. That's kinda useful in offline mode, so I postponed fixing it. Looks like it's time.
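To illustrate the days_ago problem, here is a minimal Python sketch. It is not e621dl's actual code: the function names are hypothetical, and it assumes the "number attached to the post entry" is an age in days that was computed when the entry was last updated.

```python
from datetime import date

def passes_days_ago_stale(stored_age_days: int, limit: int) -> bool:
    # Hypothetical model of the broken behaviour: the age was frozen when
    # the db entry was last updated, so it never grows as time passes.
    return stored_age_days <= limit

def passes_days_ago_fresh(created: date, limit: int, today: date) -> bool:
    # Hypothetical fix: recompute the age from the current date on every run.
    return (today - created).days <= limit

created = date(2020, 1, 1)
stored_age_days = 10          # assumed: computed when the entry was last updated
today = date(2020, 6, 1)      # the run happens months later

print(passes_days_ago_stale(stored_age_days, 60))    # True: still looks recent
print(passes_days_ago_fresh(created, 60, today))     # False: actually 152 days old
```

With the stale check, an old post keeps passing a 60-day limit until its db entry happens to be refreshed.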

The problem is that a lot of the pools generated in the blacklist section of the 2nd config don't have any images that were downloaded in the 1st config and thus shouldn't have been downloaded at all.

Not sure I understand. So, in the first config, you download everything you want except images in pools. In the second, you want to sort what you downloaded with the first, probably offline. But some images from the first that by all accounts should be in the second are not there, correct? That would be strange (read: a bug).

If it's the opposite, images in the second that are not in the first, and you have them in your cache or the config is not offline but just db, and the files are not in the global id blacklist, then they will be redownloaded. If it is offline or the files are in the global id blacklist, and they're still there, it's also a bug.

huskelord commented 4 years ago

Configs.zip These are the configs I have now, if it helps.

The second config has images that are in the first config, and this is fine. The problem comes when I have images that are in the second config but are not in the first config. Luckily there are no problems with the images themselves; they're working fine.

Here is an example of the problem

If pools 1, 2, and 3 have images that would fall into the blacklist section under translated (it's in the random folder of those specified for blacklisting), since each has an image in it that was translated, they go to that folder and get the full pool downloaded. This works completely fine.

The problem is that every single pool that has the translated tag on any of its posts gets downloaded, no matter whether the posts were downloaded in the first config or not. Thus pools that have no posts I wanted, and so are pools I don't want, will be downloaded anyway.

lurkbbs commented 4 years ago

the problem comes when I have images that are in the second config but are not in the first config.

Just to make sure, are these images not ones that were removed from e621 and now exist only in your database? I have some of these; they're only available in db/offline mode.

The problem is that every single pool that has the translated tag on any of its posts gets downloaded, no matter whether the posts were downloaded in the first config or not. Thus pools that have no posts I wanted, and so are pools I don't want, will be downloaded anyway.

OK, too hard for me. Can you PM me on e621 and tell me the id of at least one misplaced post, where it's placed now, and where it's supposed to be (or not supposed to be)? By the way, that's the perfect format for the future: post the problem here, PM me an example like that.

huskelord commented 4 years ago

will do

sent

lurkbbs commented 4 years ago

Thanks. I got it. Tomorrow I'll get to the root cause; this feels like the most serious bug.

lurkbbs commented 4 years ago

Ah. I finally understand it.

So, you never downloaded any of the posts, and you never had any image from there (well, I guess you do now), and in offline mode you shouldn't have any posts, but somehow pools were generated. Sadly, that's how it's supposed to be.

See, the decision to make a new pool happens before attempting to download anything. It's based on the post info from the db, which is updated from the API based on prefilters, and the pool you sent me passes your prefilter.

More than that, any folders to download posts to are made in advance, before e621dl even tries to get any files. And naturally, the decision to generate pool folders happens before any actual downloads.

So, here's what happens:

  1. At some point in config one, info on some posts in the pool is placed in the local db.
  2. At some point in config two, a post is processed and:
     2.1. New folders are created for the pool.
     2.2. Those folders are added to the list of pools to download.
     2.3. Since there is no image for the post and it's offline, it's not downloaded at all, but it's not removed from the pools to download from the generated config.
  3. Naturally, these files will be downloaded later on with config.generated.
  4. Finally, all empty folders are removed.
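The order of operations in the steps above can be modelled with a small Python sketch. This is a toy simulation, not e621dl's real code: run_section, the post dictionaries, and the variable names are all hypothetical, chosen only to show why a pool ends up queued for generation even though nothing was downloaded.

```python
def run_section(posts, cached_files, offline):
    """Toy model of one search section; posts are dicts with 'id' and 'pool_id'."""
    pool_folders = set()        # folders created in advance
    pools_to_generate = set()   # pools queued for the generated config
    downloaded = []

    for post in posts:
        # The decision is made from db info, before any download attempt.
        pool_folders.add(post["pool_id"])
        pools_to_generate.add(post["pool_id"])

        if post["id"] in cached_files or not offline:
            downloaded.append(post["id"])
        # Offline with no cached file: the post is skipped, but its pool
        # stays in pools_to_generate.

    # Final cleanup: drop folders that received no files.
    non_empty = {p["pool_id"] for p in posts if p["id"] in downloaded}
    pool_folders &= non_empty
    return pool_folders, pools_to_generate, downloaded

folders, queued, got = run_section(
    posts=[{"id": 1, "pool_id": 100}], cached_files=set(), offline=True
)
print(folders, queued, got)  # set() {100} []
```

The empty folder is cleaned up, yet pool 100 remains queued, so a later run with the generated config would download it.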

So, everything is as it should be, but granted, not as it's expected to be.

Maybe I could do something about it, but if it's enough for you, you can just exclude the folder with the existing pool_post_strategy = None. Or sometime during the next week I can move pool_download_generate from the global settings to a per-section setting + defaults; that would be preferable.

huskelord commented 4 years ago

Got it. Well, if moving them to per-section settings helps with the problem, I would appreciate it; otherwise I'll just work around it.

lurkbbs commented 4 years ago

Should help, I'll try to do it tomorrow then

lurkbbs commented 4 years ago

OK, this should work: e621dl.zip

Move pool_download_generate = true from [settings] to [defaults] and add pool_download_generate = false wherever you don't want a folder to be in the generated config but still want to sort images into pools. As always, subfolders are not considered proper folders.

huskelord commented 4 years ago

Quick question: would putting pool_download_generate = false in [defaults] keep all pools from generating, while pools would still be made, just with no new images added to them?

huskelord commented 4 years ago

It worked perfectly, by the way. Feel free to close the issue after replying to the question above.

lurkbbs commented 4 years ago

Quick question: would putting pool_download_generate = false in [defaults] keep all pools from generating, while pools would still be made, just with no new images added to them?

Yes. Generation is a separate step from sorting into pools. Also, you can just not put pool_download_generate in [defaults] at all; it's false by default anyway. And you can add pool_download_generate = true only to the search sections you want to be in the generated config.
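As a rough illustration of the per-section lookup described here, the following Python sketch resolves pool_download_generate for a section by checking the section itself, then [defaults], then falling back to false. It is only a sketch under assumptions: the section names wanted and to_blacklist, the tags lines, and the helper pool_generate_enabled are hypothetical, and e621dl's own config loader may work differently.

```python
import configparser

# Hypothetical config: only [wanted] opts in to the generated config.
CONFIG_TEXT = """
[defaults]
pool_download_generate = false

[wanted]
tags = example_tag
pool_download_generate = true

[to_blacklist]
tags = translated
"""

def pool_generate_enabled(parser: configparser.ConfigParser, section: str) -> bool:
    # Section value wins, then [defaults]; otherwise the setting is off.
    for name in (section, "defaults"):
        if parser.has_option(name, "pool_download_generate"):
            return parser.getboolean(name, "pool_download_generate")
    return False

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

print(pool_generate_enabled(parser, "wanted"))        # True
print(pool_generate_enabled(parser, "to_blacklist"))  # False
```

In this layout the to_blacklist section can still sort images into pools, while only wanted contributes pools to the generated config.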