Closed LaurentPlagne closed 3 years ago
Hi Laurent,
Congrats and many thanks for your super impressive and useful package!
Thanks for your enthusiastic feedback 😄
Regarding your questions:
Are there docs, videos, or forum threads explaining the internal design choices?
Not yet. However, some early design stages were presented at JuliaCon 2019, and the package's main current features were presented at JuliaCon 2020.
[...] about the internal data layout of fields: are they organized to adapt to the cache hierarchy and SIMD width of computing targets? Do you think it would make sense?
Currently, ParallelStencil.jl relies on the standard Julia Array and CuArray data layouts. We plan to implement several advanced optimisations, some of which will also change the data layout.
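For readers less familiar with that default layout: Julia arrays are stored in column-major order, so the first index varies fastest in memory. The sketch below only illustrates that indexing rule; it is written in Python with zero-based indices for brevity, and the helper name `linear_index` is hypothetical, not part of ParallelStencil.jl.

```python
def linear_index(i, j, k, nx, ny):
    """Memory offset of element (i, j, k) in a column-major nx x ny x nz array.

    Zero-based indices are used here; Julia itself is one-based.
    """
    return i + nx * (j + ny * k)


nx, ny = 8, 4
# Stride between neighbours along each dimension of the stencil.
stride_x = linear_index(1, 0, 0, nx, ny) - linear_index(0, 0, 0, nx, ny)
stride_y = linear_index(0, 1, 0, nx, ny) - linear_index(0, 0, 0, nx, ny)
stride_z = linear_index(0, 0, 1, nx, ny) - linear_index(0, 0, 0, nx, ny)
```

Neighbours along the first dimension are contiguous, while neighbours along the third are `nx * ny` elements apart, which is why stencil loops usually keep the first index innermost.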
The README states that time blocking should not be interesting for real applications. Could you elaborate? I thought that small blocks with halo may perform several time-steps before communication and may help to reduce the memory bandwidth pressure. Do you think it would make sense?
We are not aware of time blocking implementations in complex applications that deliver a significant speedup. That does not rule out some benefit from time blocking if you are tuning a specific application to its limits.
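To make the idea in the question concrete, here is a minimal sketch of time blocking on a 1D explicit diffusion stencil, assuming fixed end values. It is not ParallelStencil.jl code: the function names (`step`, `run_plain`, `run_time_blocked`) and parameters (`block`, `k`) are hypothetical and chosen only for illustration. Each block is loaded together with a halo of width `k`, advanced `k` time steps locally, and only the block's own cells are written back.

```python
def step(u, c):
    """One explicit diffusion step; the two end values are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def run_plain(u, c, nt):
    """Reference: sweep the whole array once per time step."""
    for _ in range(nt):
        u = step(u, c)
    return u


def run_time_blocked(u, c, nt, block, k):
    """Advance k time steps per sweep, one block (plus halo) at a time."""
    assert nt % k == 0, "this sketch assumes nt is a multiple of k"
    n = len(u)
    for _sweep in range(nt // k):
        out = u[:]
        for s in range(0, n, block):
            e = min(s + block, n)
            # Halo of width k on each side, clamped at the domain ends.
            lo, hi = max(0, s - k), min(n, e + k)
            local = u[lo:hi]
            for _t in range(k):
                local = step(local, c)
            # Cells closer than k to an interior cut are stale; keep only
            # the block's own cells, which lie at least k cells from any cut.
            out[s:e] = local[s - lo:e - lo]
        u = out
    return u
```

The trade-off is visible in the sketch: each block reads its data from the full array once per `k` steps instead of once per step, but recomputes up to `2 * k` halo cells at every local step. Whether that pays off depends on the application and hardware.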
We are closing this issue for offline discussion.
Congrats and many thanks for your super impressive and useful package!
I have a few questions:
Are there docs, videos, or forum threads explaining the internal design choices?
I was wondering about the internal data layout of fields: are they organized to adapt to the cache hierarchy and SIMD width of computing targets? Do you think it would make sense?
The README states that time blocking should not be interesting for real applications. Could you elaborate? I thought that small blocks with halo may perform several time-steps before communication and may help to reduce the memory bandwidth pressure. Do you think it would make sense?
Best,
Laurent