pharmaverse / blog

Blogging on the latest, greatest and most spectacular stuff happening around the pharmaverse
https://pharmaverse.github.io/blog/
Apache License 2.0

Blog Post: The Tension of High-Performance Computing: Reproducibility vs. Parallelization #195

Open AleKoure opened 1 month ago

AleKoure commented 1 month ago

Blog Post

Balancing reproducibility and parallelization in data science involves a trade-off. Parallel computing optimizes for performance and speed, often at the expense of reproducibility: keeping RNG states consistent and synchronizing processes across parallel tasks is hard, and getting either wrong produces results that differ from run to run. The need for reliable, reproducible results across different environments complicates the balance further, so these challenges have to be addressed to achieve high-performance computing without compromising data integrity. Tools like mirai in R can help by facilitating asynchronous computation, which may reduce the tension between high-performance computing and reproducibility.

bms63 commented 1 month ago

Hi @AleKoure - if this is written in a way that discusses the challenges biostatisticians and statistical programmers face when working with large data, conducting simulations, modeling, etc., then it sounds like a really interesting topic. Also, I'm assuming you are thinking about writing this post yourself?

What do you think, @manciniedoardo @StefanThoma @gigikenneth @kaz462?

manciniedoardo commented 1 month ago

Absolutely! We would also want to tie in discussions around specific pharmaverse packages.

AleKoure commented 1 month ago

Great! I had this discussion with @StefanThoma at useR! and he suggested opening an issue here. We could start with some simple examples and then reframe them using e.g. admiral, or any other package you suggest.

StefanThoma commented 2 weeks ago

Hi @AleKoure, when do you think you'll have a first draft ready?

AleKoure commented 2 weeks ago

Hi, I'll have a draft ready by the beginning of September, and then we can sync. I'll use a simple simulation of drug efficacy, maybe with some RNG, and build the discussion around {mirai}. If you have any additional input, suggestions, or specific points, please feel free to share them :)

On Mon, 19 Aug 2024 at 21:36, StefanThoma @.***> wrote:

Hi @AleKoure, when do you think you'll have a first draft ready?
