starik222 / BooruDatasetTagManager

Add the python image interrogator code (only) #77

Closed: fake-name closed this 1 year ago

fake-name commented 1 year ago

Ok, here is a PR of just the python interrogator codebase from my other branch.

I have added the change where now you can specify a set of interrogators to use.

This still has the "makes internet requests on use" issue. I'm not entirely sure why that request isn't cached (it caches the actual network, but not the labels?). In any event, that can be fixed separately.
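For illustration only, here is a minimal sketch of how a label list could be cached on disk so that the network is only touched on first use. It is not the code from this PR: `load_labels`, the cache path, and the `fetch` callable are hypothetical names, and the real interrogators may obtain their labels differently.

```python
import json
from pathlib import Path
from typing import Callable, List


def load_labels(cache_file: Path, fetch: Callable[[], List[str]]) -> List[str]:
    """Return tag labels, calling fetch() only when no cached copy exists."""
    if cache_file.exists():
        # Cache hit: read the labels from disk, no network access needed.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # Cache miss: fetch once (hypothetically a download) and store the result.
    labels = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(labels), encoding="utf-8")
    return labels
```

With this shape, an "offline" option would amount to refusing to call `fetch` when the cache file is missing.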

fake-name commented 1 year ago

Whoops, didn't realize you'd cherry-picked commits.

fake-name commented 1 year ago

Note that this will require C# changes, since you can now specify more than one interrogator to use per RPC call.

I'm also no longer keeping only one network loaded in GPU RAM, since for multi-network requests the network churn winds up being quite slow. This may cause issues if someone tries to use all networks at once on a machine without much VRAM. Ideally, I'd like to try keeping everything in VRAM and set a runtime flag if that fails, at which point things would be loaded sequentially.
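The "keep everything resident, fall back to sequential loading on failure" idea could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's implementation: `ModelPool`, `loaders`, and `oom_error` are hypothetical names, and the exception type that signals out-of-memory depends on the framework in use. Clearing the dictionary only drops Python references; actually releasing GPU memory is framework-specific and, as noted later in this thread, recovery after an OOM may not work reliably.

```python
from typing import Callable, Dict, Type


class ModelPool:
    """Keeps models resident until a load fails, then loads one at a time."""

    def __init__(self, loaders: Dict[str, Callable[[], object]],
                 oom_error: Type[BaseException] = MemoryError):
        self.loaders = loaders        # name -> function that loads that model
        self.oom_error = oom_error    # exception type treated as out-of-memory
        self.resident: Dict[str, object] = {}
        self.sequential = False       # runtime flag, set after the first OOM

    def get(self, name: str) -> object:
        if name in self.resident:
            return self.resident[name]
        if self.sequential:
            # Sequential mode: evict everything before loading the next model.
            self.resident.clear()
        try:
            model = self.loaders[name]()
        except self.oom_error:
            # First failure: remember it, drop resident models, retry once.
            self.sequential = True
            self.resident.clear()
            model = self.loaders[name]()
        self.resident[name] = model
        return model
```

If the retry also fails, the exception propagates to the caller, which seems preferable to silently returning nothing.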

starik222 commented 1 year ago

Thanks for your work! I modified the code to support multiple model selection. Also fixed cache ignoring by WD tagger. As far as I understand, when multiple models are selected, the results are “added” and duplicates are removed. When you wrote about multiple selection of models, I saw it as “multiplication”, that is, displaying only those tags that are present in each model. If you want to improve interrogator, it would be good to implement both options, which would give a clearer understanding of how multiple model selection works and more flexibility.

fake-name commented 1 year ago

Oooh, that's an interesting idea.

I definitely played with the idea of just returning the results for each model separately. The implementation as it stands is primarily how I wanted to use the thing.

> As far as I understand, when multiple models are selected, the results are “added” and duplicates are removed. When you wrote about multiple selection of models, I saw it as “multiplication”, that is, displaying only those tags that are present in each model. If you want to improve interrogator, it would be good to implement both options, which would give a clearer understanding of how multiple model selection works and more flexibility.

I think that's a case of the wording being valid in multiple ways. What you're describing is effectively a set intersection, whereas what I implemented is a set union.

I'm happy to implement it either by returning the result from each network as a separate array, or with some sort of flag to dictate which combination mechanism to use. Let me know what you'd prefer me to do.
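To make the two interpretations concrete, here is a small sketch that takes per-model results and combines them either way. The function name `combine_tags`, the `mode` flag, and the model names in the usage example are hypothetical; this is not the PR's code or its RPC schema.

```python
from typing import Dict, List, Set


def combine_tags(per_model: Dict[str, List[str]], mode: str = "union") -> List[str]:
    """Merge per-model tag lists.

    'union' keeps a tag if any model produced it (duplicates removed);
    'intersection' keeps only tags that every model produced.
    """
    tag_sets: List[Set[str]] = [set(tags) for tags in per_model.values()]
    if not tag_sets:
        return []
    if mode == "union":
        merged = set().union(*tag_sets)
    elif mode == "intersection":
        merged = set.intersection(*tag_sets)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(merged)


results = {
    "model_a": ["1girl", "solo", "hat"],
    "model_b": ["1girl", "hat", "outdoors"],
}
print(combine_tags(results, "union"))         # ['1girl', 'hat', 'outdoors', 'solo']
print(combine_tags(results, "intersection"))  # ['1girl', 'hat']
```

Returning the per-model dictionary over the RPC interface and doing the combination on the client side would allow both behaviours without an extra flag.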

I've also been working on adding additional parameters, like the ability to avoid network requests (see https://github.com/fake-name/BooruDatasetTagManager/commits/interrogator-only), as well as a pyinstaller script to generate distributable binaries. A lot of it isn't finished, as I've been mostly thinking about what the gRPC API structure should look like.

I also added a flag to the API to control how the VRAM allocation is managed. The python stuff tries to recover from OOM, but it doesn't seem to work reliably. I think a lot of CUDA stuff basically just gives up if it hits a single OOM, and you have to restart the application entirely to clear the issue.

starik222 commented 1 year ago

If interrogator returned an array with the results of all models, it would be much more convenient. Then it would be possible to operate on the results before outputting them. I'm not good at Python, so I don't know the disadvantages of binary assembly, but I had another idea. I was thinking of creating two .bat files with commands to start the service, one for pure Python, the other for Anaconda, and in the BDTM itself in the settings, make it possible to select one of these batch files to automatically start the service if it was not possible to connect to it. As a result, the user only needs to configure the service once by installing dependencies, after which everything will be done almost automatically...
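The "start the service automatically if the connection fails" flow could be sketched as below. In BDTM itself this logic would live in the C# client; Python is used here purely for illustration. All names (`port_is_open`, `ensure_service`, `launch_batch`) are hypothetical, and the host, port, and script name in the usage comment are placeholders rather than the project's actual values.

```python
import socket
import subprocess
import time
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def launch_batch(script: str) -> subprocess.Popen:
    """Start a Windows batch file; .bat files are run through cmd.exe."""
    return subprocess.Popen(["cmd", "/c", script])


def ensure_service(probe: Callable[[], bool], launch: Callable[[], object],
                   attempts: int = 30, delay: float = 1.0) -> bool:
    """Launch the service if the probe fails, then poll until it responds."""
    if probe():
        return True          # already running, nothing to start
    launch()
    for _ in range(attempts):
        time.sleep(delay)
        if probe():
            return True
    return False             # never came up within attempts * delay seconds


# Hypothetical usage; the address and script name are placeholders:
# ensure_service(lambda: port_is_open("127.0.0.1", 50051),
#                lambda: launch_batch("start_service.bat"))
```

Polling after launch matters because loading the models can take a while before the service starts accepting connections.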

fake-name commented 1 year ago

> If interrogator returned an array with the results of all models, it would be much more convenient. Then it would be possible to operate on the results before outputting them.

That's an easy change to make. The only reason I was putting everything together before returning it over the RPC interface was that I know Python better than C#, so it was easier for me to do it there.

> I'm not good at Python, so I don't know the disadvantages of binary assembly, but I had another idea. I was thinking of creating two .bat files with commands to start the service, one for pure Python, the other for Anaconda, and in the BDTM itself in the settings, make it possible to select one of these batch files to automatically start the service if it was not possible to connect to it. As a result, the user only needs to configure the service once by installing dependencies, after which everything will be done almost automatically...

When I say pyinstaller/binary, I mean a redistributable folder containing an executable and all of the dependencies needed to run the interrogator. You'd just run the exe and the interrogator service would start. No installation of anything needed.

> PyInstaller bundles a Python application and all its dependencies into a single package. The user can run the packaged app without installing a Python interpreter or any modules.

It mostly works now, albeit with some DLL issues I've not finished tracking down.

The major issue currently is that pyinstaller tends to be overeager about which DLLs to bundle, and the CUDA dlls are ENORMOUS. The resulting output is currently ~8GB! The python tensorflow package alone is 1 GB, and pytorch is 5GB. Both are bundled, though I may be able to trim the dependencies with some work.

Anyway, I should be able to get at least the API changes done this weekend.

starik222 commented 1 year ago

> The major issue currently is that pyinstaller tends to be overeager about which DLLs to bundle, and the CUDA dlls are ENORMOUS. The resulting output is currently ~8GB! The python tensorflow package alone is 1 GB, and pytorch is 5GB. Both are bundled, though I may be able to trim the dependencies with some work.

I think there is no point in such a “package”. At a minimum, it is so large that there is simply nowhere to host it, and it would also be difficult to update. It would be best to limit ourselves to creating batch files for installation and launch.