solarorb93 opened 2 years ago
Torch is awesome because it is a one-click CUDA install from the pytorch website https://pytorch.org/get-started/locally/. However, the NudeNet model is only distributed as ONNX. I tried several methods to convert the model from ONNX to PyTorch (onnx-pytorch, onnx2pytorch, onnx2keras) and none of them worked. The NudeNet ONNX model seems to be very poorly behaved; it has node names and types that the converters don't support.
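For reference, the onnx2pytorch attempt looked roughly like this (a minimal sketch; the filename is a placeholder for the NudeNet detector model):

```python
# Rough sketch of the onnx2pytorch attempt; "detector.onnx" is a
# placeholder for the NudeNet detector model file.
import onnx
from onnx2pytorch import ConvertModel

onnx_model = onnx.load("detector.onnx")

# ConvertModel walks the ONNX graph and builds an equivalent nn.Module.
# This is where it fails on NudeNet: the graph contains node names/types
# that onnx2pytorch has no mapping for.
pytorch_model = ConvertModel(onnx_model)
pytorch_model.eval()
```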
TensorRT - again, the model could not be loaded into TensorRT (using the onnxruntime TensorRT execution provider; I have not tried TensorRT natively).
After I found your project I was inspired to do something similar to BetaVision. Instead of the output in a separate window, I wanted to embed a new window inside a target window and render over it completely with the censored output. It's possible under X11 on Linux, I'm pretty sure it's possible on Windows.
However I wanted it to be cross-vendor (I have AMD), and the only runtime I'm aware of that would work for basically all GPUs is NCNN, since it has a Vulkan backend. onnxruntime may have a ROCm EP, but it's not built by default, and AMD hardly supports ROCm anyway. There's a merge request for an OpenCL EP here, but it's been dormant for a few months and is pretty incomplete.
The progression would have been ONNX > PyTorch > TorchScript > PNNX > NCNN. Although I was able to get onnx2pytorch to import the ONNX model after adding the Size operator and commenting something out, I never got any further. onnx2pytorch isn't compatible with TorchScript as it stands; you'd have to rewrite the whole thing, and I didn't want to. I couldn't get PyTorch to save the imported model either.
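For the record, the rest of the chain would have looked something like this, assuming onnx2pytorch's output were actually traceable (it isn't as-is):

```python
# Continuing the chain: PyTorch -> TorchScript -> PNNX -> NCNN.
# "detector.onnx" is the same placeholder as above; in practice
# torch.jit.trace is where this falls over, because the module
# onnx2pytorch builds isn't traceable as-is.
import onnx
import torch
from onnx2pytorch import ConvertModel

pytorch_model = ConvertModel(onnx.load("detector.onnx")).eval()

example = torch.rand(1, 3, 224, 224)  # assumed input shape
traced = torch.jit.trace(pytorch_model, example)
traced.save("detector.pt")

# The remaining hops happen outside Python, via the PNNX tool:
#   pnnx detector.pt inputshape=[1,3,224,224]
# which emits detector.ncnn.param / detector.ncnn.bin for NCNN.
```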
Someone's recently started working on a Vulkan EP for onnxruntime here, but it's in the very early stages.
I don't know, I just wanted to share since I had similar struggles. I don't really know what to do from here aside from creating my own version of the model, and I've no idea what goes into that.
Thanks for the ideas! It definitely sounds like you're a bit more familiar with a lot of neural net stuff than me.
One of the clumsiest things about BetaVision is the need for the uncensored content to be exposed, so that I can grab it with mss and send it to the net. I've done some basic investigation into how that might be avoided but I wasn't able to come up with anything that did exactly what I wanted. I didn't even find a way to grab the contents of a specific window.
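For reference, the capture path is essentially just a region grab, something like this (a minimal sketch with placeholder coordinates, not the actual BetaVision code):

```python
# Minimal sketch of grabbing a screen region with mss; the region values
# are placeholders, not BetaVision's actual configuration.
import mss
import numpy as np

region = {"left": 0, "top": 0, "width": 1280, "height": 720}

with mss.mss() as sct:
    shot = sct.grab(region)           # raw BGRA pixels for the region
    frame = np.array(shot)[:, :, :3]  # drop alpha -> BGR array for the net
```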
I know it must be possible, because OBS does it. But I haven't found any way for someone like me, who is a very very amateur coder, to recreate it.
It definitely sounds like you're a bit more familiar with a lot of neural net stuff than me.
Oh god no, I have absolutely no idea what I'm doing. This is the first time I've ever touched anything neural-net related. That was all from trying to turn that ONNX model into an NCNN model over the past ~10 days. I tried converting the TensorFlow model too and had even less luck there; I couldn't even get it to load properly in the version it's supposed to be supported under. OpenCV didn't want to read either of the models.
I haven't looked at the code yet; does BetaVision currently just copy a region of the screen to process and then display it in another window?
Unfortunately I feel the only way forward for user experience would be recreating the model for NCNN or something else that isn't vendor-locked. Perhaps multiple models, each responsible for detecting a specific feature? It feels counterintuitive to me, lumping all of them together like in NudeNet.
Maybe you'd be interested in working on that together? Again, no idea what that entails; it could be a big dead-end for all I know. It's such a novel thing, I want it to work!
Yes, that's exactly how BetaVision works.
As far as I know, the training code for NudeNet is open source, but the actual training data is not.
After sifting through the issues in the NudeNet repository I found:
I did some experimenting with converting the ResNet50 model from Torchvision to ONNX and then to NCNN, and it was successful, so it's a starting point at least.
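Roughly what the experiment looked like (a sketch; the export settings and filenames are just what I happened to use, and onnx2ncnn is NCNN's stock converter):

```python
# Sketch of the ResNet50 -> ONNX experiment; nothing here is
# NudeNet-specific, it's just a known-good model to test the pipeline.
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).eval()
dummy = torch.rand(1, 3, 224, 224)

torch.onnx.export(
    model, dummy, "resnet50.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,
)

# Then, outside Python, convert the ONNX file to NCNN's format:
#   onnx2ncnn resnet50.onnx resnet50.param resnet50.bin
```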
As far as you can tell, does that work with dynamic size inputs like NudeNet currently supports?
I've done a lot of reading today and to be honest I don't know. I think it should, but I can't say for sure. It looks like it expects 224x224 RGB inputs, although the AdaptiveAvgPool2d at the end should allow any size.
I could be totally wrong, I'm still very much fumbling around in the dark.
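A quick way to sanity-check the AdaptiveAvgPool2d part in PyTorch, at least (this says nothing about whether the exported ONNX/NCNN versions keep dynamic shapes):

```python
# ResNet50 pools the final feature map down to 1x1 before the classifier,
# so different input resolutions should all run and produce the same
# output shape. A sanity test only, not proof the converted model agrees.
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).eval()

with torch.no_grad():
    for size in (224, 320, 512):
        out = model(torch.rand(1, 3, size, size))
        print(size, tuple(out.shape))  # (1, 1000) every time
```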
I can say for certain that inference time is roughly linear in the total number of pixels, and that it's not just a matter of NudeNet scaling the image down: the inference is materially different if I send in a big or small version of the same picture.
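Something like this rough timing loop shows it (model path, input layout, and sizes are all assumptions on my end):

```python
# Rough timing sketch to illustrate the pixel-count scaling claim;
# "detector.onnx" is a placeholder and the NHWC layout is assumed.
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("detector.onnx")
name = sess.get_inputs()[0].name

for side in (256, 512, 1024):
    img = np.random.rand(1, side, side, 3).astype(np.float32)
    t0 = time.perf_counter()
    sess.run(None, {name: img})
    print(side * side, "px:", time.perf_counter() - t0, "s")
```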
I noticed that too while playing around with that other project, the browser extension that connects to a server you run. Smaller images were much faster but were also usually unaffected.
I'm getting closer to a plan for building a new detector. I'm not excited about creating training data for it. I was looking into automatic options but there's no way to avoid having to create an initial dataset to start from.
Sounds exciting! Let me know if I can help. If a new detector project gets going in earnest I'll work on making the detection code into a more abstract class so it can be switched out.
Onnxruntime is horrible for user experience. When CUDA is not configured correctly, it gives no useful output to help figure out what is wrong.
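A sketch of one way to at least make the silent CPU fallback visible: request the CUDA provider explicitly, then check what the session actually got (the model path is a placeholder):

```python
# If CUDA/cuDNN is misconfigured, onnxruntime quietly falls back to CPU.
# Comparing requested vs. actual providers makes the fallback visible.
import onnxruntime as ort

print(ort.get_available_providers())  # is CUDAExecutionProvider even built in?

sess = ort.InferenceSession(
    "detector.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# If this prints only CPUExecutionProvider, the CUDA setup is broken.
print(sess.get_providers())
```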