muslll / neosr

neosr is a framework for training real-world single-image super-resolution networks.
https://github.com/muslll/neosr
Apache License 2.0
136 stars 28 forks

Modifications for CRAFT, SwinIR, and DCTLAS are not detectable #64

Closed RunDevelopment closed 4 months ago

RunDevelopment commented 4 months ago

Hey musl. I finally found some time to make spandrel compatible with your modified architectures, and it's not looking good. ATD was easy to detect and support because of the no_norm trick, but the others are hopeless. The state dicts for CRAFT and SwinIR are the same whether flash attention is used or not, and the dropout you added to DCTLAS also can't be detected.
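To illustrate why detection fails (a rough sketch with illustrative names, not the actual CRAFT/SwinIR code): flash attention only changes how attention is computed in the forward pass, while the learned weights stay identical, so both variants serialize to the exact same state dict keys.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Simplified attention block; `use_flash` only switches the math path."""
    def __init__(self, dim: int, use_flash: bool = False):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)  # learned weights are the same either way
        self.use_flash = use_flash          # plain attribute, never serialized

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if self.use_flash:
            return F.scaled_dot_product_attention(q, k, v)
        attn = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
        return attn.softmax(dim=-1) @ v

# Both variants save identical state dicts, so a loader reading the .pth
# file alone has no way to tell which forward path was used in training.
keys_flash = Attention(64, use_flash=True).state_dict().keys()
keys_plain = Attention(64, use_flash=False).state_dict().keys()
print(keys_flash == keys_plain)  # True
```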

This unfortunately means that the pth files of all CRAFT and SwinIR models trained with flash attention are essentially useless. The issue is that spandrel will see them as regular CRAFT and SwinIR models and silently use the wrong inference code, which leads to decreased performance. So the models will work, but they will produce worse outputs. It's the worst kind of unsupported model.

I'm not sure whether the modifications to DCTLAS are a problem. You just added a nn.Dropout2d(p=0.5) here. I don't know much about these things, so I'm not sure whether this will even affect the inference result.
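For context (a quick standalone check, not DCTLAS-specific code): nn.Dropout2d holds no parameters and no buffers, so adding it leaves the state dict unchanged and there is nothing for spandrel to key on.

```python
from torch import nn

# Dropout2d contributes nothing to the saved state dict,
# which is why its presence cannot be detected from the .pth file.
print(dict(nn.Dropout2d(p=0.5).state_dict()))  # {}
```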

What to do now

Going forward, I think the best option would be to use the no_norm trick for flash attention in CRAFT and SwinIR. If we add a flash_attention tensor when flash attention is enabled, spandrel can detect this and then use the correct inference code.
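A minimal sketch of that marker idea (the block class and the `flash_attention` name are illustrative, not the actual neosr code): registering a dummy buffer makes the flag show up as a state-dict key, which is all a detector needs.

```python
import torch
from torch import nn

class CRAFTBlock(nn.Module):
    def __init__(self, dim: int, flash_attn: bool = False):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        if flash_attn:
            # Dummy buffer: carries no learned weights, but it is saved in the
            # state dict, so loaders can detect the flash-attention variant.
            self.register_buffer("flash_attention", torch.zeros(1))

# Detection side: just look for the marker key in the saved state dict.
sd = CRAFTBlock(64, flash_attn=True).state_dict()
uses_flash = any(key.endswith("flash_attention") for key in sd)
print(uses_flash)  # True
```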

About DCTLAS: I'm not sure whether we even need to do anything, so I would like to hear your thoughts on the matter. If the dropout does affect inference results, then we need to use the same trick again.


Related to #61 and https://github.com/chaiNNer-org/spandrel/issues/230

muslll commented 4 months ago

I see. On newer networks where I can implement flash attention, I'll see about adding the tensor to make detection easier.

models trained with flash attention are essentially useless

Well, not true. It's not useful in the chainner/spandrel context, but people can still use it inside neosr, with the test.py script.

About DCTLAS: I'm not sure whether we even need to do anything

I haven't tested, but technically, as long as dropout is 0 at inference time, it shouldn't affect the results. All it does is drop some channels during training; it's a common technique to increase generalization, but it shouldn't affect inference if it is set to 0.

RunDevelopment commented 4 months ago

On newer networks where I can implement flash attention, I'll see about adding the tensor to make detection easier.

Thanks musl!

It's not useful in the chainner/spandrel context, but people can still use it inside neosr, with the test.py script.

Yes, but that's not the whole picture. These models can only be used by neosr. Any program/library that implements the official arch code cannot use them (correctly). So I meant to say that they are not fit for general use.

I haven't tested, but technically, as long as dropout is 0 at inference time, it shouldn't affect the results. All it does is drop some channels during training; it's a common technique to increase generalization, but it shouldn't affect inference if it is set to 0.

Thanks for the info! I just looked at how nn.Dropout2d is implemented, and it's a noop/identity function during evaluation. So neosr DCTLAS models are perfectly compatible with official arch code.
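For anyone reading along, this is easy to verify (a quick standalone check, not neosr code): the same Dropout2d module changes its input in train mode but passes it through untouched in eval mode.

```python
import torch
from torch import nn

drop = nn.Dropout2d(p=0.5)
x = torch.randn(1, 8, 16, 16)

drop.train()
print(torch.equal(drop(x), x))  # False: channels are zeroed, the rest rescaled by 1/(1-p)

drop.eval()
print(torch.equal(drop(x), x))  # True: identity at inference time
```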

muslll commented 4 months ago

I just looked at how nn.Dropout2d is implemented, and it's a noop/identity function during evaluation. So neosr DCTLAS models are perfectly compatible with official arch code.

Nice! Thanks for looking into it.

RunDevelopment commented 4 months ago

Please reopen this issue @muslll. There is still no way to detect whether CRAFT and SwinIR use flash attention.

muslll commented 4 months ago

I'll fix CRAFT soon. As for SwinIR, I'll just leave it as it is, as it was more like an experiment to test flash attention.

RunDevelopment commented 4 months ago

I'll fix CRAFT soon.

👍 Thanks!

As for SwinIR, I'll just leave it as it is, as it was more like an experiment to test flash attention.

Then I would suggest removing the commented-out options for flash attention in the configs of SwinIR (1, 2, 3). If the feature is experimental, then this option (even if disabled) shouldn't be included in user-facing configs IMO.

RunDevelopment commented 4 months ago

Any updates on this?

muslll commented 4 months ago

Hi, not yet. Soon.