xcompact3d / x3d2

https://xcompact3d.github.io/x3d2
BSD 3-Clause "New" or "Revised" License

Fix a bug in OpenMP distributed tridiagonal solver #114

Closed semi-h closed 1 month ago

semi-h commented 1 month ago

Last week, when implementing Thomas on CUDA, I realised we use this trick often, but it's not really suitable for the OpenMP backend. I thought all instances had been removed when I quickly checked last week, but I hadn't noticed this one!

Nanoseb commented 1 month ago

lgtm, but why is it not suitable? I'm not sure I understand what was going wrong. Also, if the results produced were wrong, it may be a sign that we need better tests.

pbartholomew08 commented 1 month ago

temp_du needs to be an array for SIMD access. In the past, I've found compiling with gfortran debug flags enabled catches this.
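
To illustrate the point about `temp_du`, here is a minimal sketch in C rather than the project's Fortran. It assumes a batched Thomas solve in which the inner loop runs over `SZ` independent systems (the SIMD lanes). The function name `thomas_batched`, the sizes `SZ` and `N`, and the array layout are hypothetical and are not taken from x3d2. The per-lane temporary `temp` plays the role `temp_du` is described as needing: it is an array with one slot per lane, so no two lanes write to the same scalar inside the `omp simd` loop.

```c
#include <stddef.h>

#define SZ 4 /* hypothetical SIMD block width (number of independent systems) */
#define N  4 /* hypothetical number of unknowns per system */

/* Solve SZ independent tridiagonal systems in place with the Thomas algorithm.
 * a: sub-diagonal, b: diagonal, c: super-diagonal.
 * d: right-hand side on entry, solution on exit. b is overwritten. */
void thomas_batched(double a[N][SZ], double b[N][SZ],
                    double c[N][SZ], double d[N][SZ])
{
    /* One temporary per lane. A single shared scalar written by every
     * lane of the simd loop would be a data race. */
    double temp[SZ];

    /* Forward elimination, vectorised across the SZ systems. */
    for (int j = 1; j < N; j++) {
        #pragma omp simd
        for (int i = 0; i < SZ; i++) {
            temp[i] = a[j][i] / b[j - 1][i];
            b[j][i] -= temp[i] * c[j - 1][i];
            d[j][i] -= temp[i] * d[j - 1][i];
        }
    }

    /* Back substitution, again vectorised across the SZ systems. */
    #pragma omp simd
    for (int i = 0; i < SZ; i++) {
        d[N - 1][i] /= b[N - 1][i];
    }
    for (int j = N - 2; j >= 0; j--) {
        #pragma omp simd
        for (int i = 0; i < SZ; i++) {
            d[j][i] = (d[j][i] - c[j][i] * d[j + 1][i]) / b[j][i];
        }
    }
}
```

In C the temporary could also be declared inside the loop body, which makes it private to each iteration. Fortran has no equivalent block-scoped declaration in a `do` loop, so there the usual options are an array of length `SZ` or a `private` clause on the directive.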

Nanoseb commented 1 month ago

ah yes of course

pbartholomew08 commented 1 month ago

I agree though, it indicates the tests aren't catching something.

Nanoseb commented 1 month ago

Maybe this is just a performance issue. Could the compiler be detecting that it can't vectorise the loop, and so leaving it scalar?

pbartholomew08 commented 1 month ago

Confirming that would require checking the assembly or vectorisation reports. I think the pragma means "vectorise this, ignoring reasons not to", but with strict debug flags enabled it is flagged as a warning and/or raises an error.