pharmaverse / blog

Blogging on the latest, greatest and most spectacular stuff happening around the pharmaverse

Blog Post: De-Mystifying R Programming in Clinical Trials #143

Closed NavitasLifeSciences closed 2 months ago

NavitasLifeSciences commented 3 months ago

Blog Post


De-Mystifying R Programming in Clinical Trials

Venkatesan Balu, Associate Director, Global Data Sciences, Navitas Life Sciences

Despite significant growth in recent years, the adoption of R programming in clinical trials is not as widespread as anticipated. Practical implementation faces obstacles, including occasional misunderstandings, particularly around validation, and a notable lack of awareness of R's capabilities. Nevertheless, R is steadily establishing a growing niche within the pharmaceutical industry.

Opportunities for R Programming in Clinical Trials


Although R is versatile and applicable in various settings, it is commonly associated with scientific computing and statistics. In the context of clinical trials, where researchers aim to understand and enhance drug development and testing processes, R has become a prominent tool for analyzing the collected data. While SAS has been a longstanding programming language for clinical trials, its high cost prompts the industry to explore better alternatives. Therefore, there is a quest for sustainable technology and tools that can effectively address industry challenges.

To drive innovation, there is a need to move away from traditional, inefficient processes and tools toward solutions that are efficient, simple, easy to implement, reliable, and cost-effective. Collaboration among industry stakeholders is crucial to develop a robust technology ecosystem and establish consensus on validation and regulatory benchmarks. Equally vital is preparing the workforce with the necessary skillsets to meet future demands.

Current Usage Trends of R


An analysis of current trends in the pharmaceutical industry shows that R is still underused in activities related to regulatory submissions. However, R finds extensive use in public health projects, healthcare economics, exploratory and scientific analysis, trend identification, generating plots and graphs, specific statistical analyses, and machine learning. R continues to advance steadily in clinical trials but still lacks widespread usage within the clinical space. A notable difference between SAS and R is that SAS is proprietary software, whereas R is an open-source programming language.

SAS or R Programming: Which Is Better?


The ongoing debate in the programming community revolves around whether to replace SAS with R, use both, or explore other alternatives like Python. Instead of adopting an either-or scenario, leveraging the strengths of each programming language for specific Data Science problems is recommended, recognizing that one size does not fit all. Early adopters of R have faced challenges, with regulatory compliance for R packages being a common issue. For R to be considered for tasks related to regulatory submission, a rigorous risk assessment of R packages, feasibility analysis, and the establishment of processes for R usage through pilot projects with necessary documentation become imperative.

Benefits of Using R Programming


R, as a language and environment for statistical computing and graphics, possesses characteristics that make it a potentially powerful tool for Data Analysis. With approximately 2 million users worldwide and three decades of legacy, R stands out as open-source software receiving substantial support from the community. Its availability under the GNU General Public License and extensive documentation contribute to its strength. R is versatile, running on various platforms, offering a wide array of statistical and graphical techniques, and its ease of producing publication-quality plots enhances its appeal.

The pharmaceutical industry has witnessed the emergence of various R packages tailored for clinical trial design, monitoring, and analysis. Examples include atable for creating tables for reporting clinical trials, compareODM for comparing medical forms in CDISC ODM format, and CRTSize for sample size estimation in cluster randomized trials, among others. These packages cater to different aspects of clinical trial data analysis, showcasing the versatility of R in this domain.

This article discusses the use of R in clinical trials and how the industry can take advantage of its open-source nature. The FDA emphasizes the need for fully documenting software packages used for statistical analysis in submissions. The use of R poses specific challenges related to validation, given its free and open-source nature. To address this, the R Foundation has released guidance documents focusing on regulatory compliance, validation issues, and the software development life cycle. These challenges, however, need not prevent R from being used to validate deliverables or from being integrated with other statistical analysis tools.

Implementing Dual Programming


As a procedural measure, we can implement dual programming, where primary programmers produce the deliverables in SAS while validators independently reproduce them in an alternative language such as R. As part of demographic table validation, for example, we generated the table using both SAS and R.

Table 01 --
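To make the dual-programming idea concrete, here is a minimal sketch of what the validator's side of a demographic table check might look like in base R. It is not the code used for the table above: the dataset `adsl`, the variable names `TRT01P` and `AGE`, the helper `summarise_age()`, and every number in it are hypothetical and chosen only for illustration. The SAS results are assumed to have been imported into a data frame already.

```r
# Hypothetical subject-level data standing in for an ADSL-style dataset.
adsl <- data.frame(
  USUBJID = sprintf("SUBJ-%02d", 1:8),
  TRT01P  = rep(c("Placebo", "Active"), each = 4),
  AGE     = c(34, 45, 52, 61, 29, 48, 55, 66),
  stringsAsFactors = FALSE
)

# Validator side: independently derive n, mean and SD of age per treatment arm.
summarise_age <- function(data) {
  by_arm <- split(data$AGE, data$TRT01P)  # arms are returned in alphabetical order
  data.frame(
    TRT01P = names(by_arm),
    N      = unname(vapply(by_arm, length, integer(1))),
    MEAN   = unname(round(vapply(by_arm, mean, numeric(1)), 2)),
    SD     = unname(round(vapply(by_arm, sd, numeric(1)), 2)),
    stringsAsFactors = FALSE
  )
}

# Primary side: summary values as they might be imported from the SAS output
# (hypothetical numbers, typed in here to keep the sketch self-contained).
sas_table <- data.frame(
  TRT01P = c("Active", "Placebo"),
  N      = c(4L, 4L),
  MEAN   = c(49.50, 48.00),
  SD     = c(15.55, 11.40),
  stringsAsFactors = FALSE
)

r_table <- summarise_age(adsl)

# all.equal() returns TRUE on a match, otherwise a description of the differences.
comparison <- all.equal(sas_table, r_table, tolerance = 1e-8)
if (isTRUE(comparison)) {
  message("Validation passed: SAS and R demographic summaries agree.")
} else {
  print(comparison)
}
```

In practice the comparison would cover every cell of the table, and any reported difference would be investigated and documented by both programmers rather than silently corrected.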


Benefits of Using R Programming


Because R packages are free of charge, R can also serve as a tool for API integration. In signal detection, for instance, R packages are valuable because of the intricate derivation of the Empirical Bayes Geometric Mean (EBGM) in the Bayesian approach, which aims to reduce false positive signals resulting from multiple comparisons. The computation adjusts the observed-to-expected reporting ratio for temporal trends and confounding variables such as age and sex. While both SAS and R can estimate this, the accessibility of R as free software enables easy integration into any system as an API, or for macro estimation purposes, without copyright issues.
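For readers unfamiliar with EBGM, the quantities mentioned above can be written out as follows. This is a sketch of the standard gamma-Poisson shrinker formulation (DuMouchel's MGPS), on the assumption that this is the Bayesian approach meant here. The stratum index $s$ (for example age group, sex, and reporting year) and the prior parameters $P, \alpha_1, \beta_1, \alpha_2, \beta_2$ are notation introduced for illustration, with the gamma distributions in shape-rate form.

$$
E_{ij} = \sum_{s} \frac{N_{i\cdot s}\, N_{\cdot j s}}{N_{\cdot\cdot s}},
\qquad
RR_{ij} = \frac{N_{ij}}{E_{ij}}
$$

$$
N_{ij} \mid \lambda_{ij} \sim \mathrm{Poisson}(\lambda_{ij} E_{ij}),
\qquad
\lambda_{ij} \sim P\,\mathrm{Gamma}(\alpha_1, \beta_1) + (1 - P)\,\mathrm{Gamma}(\alpha_2, \beta_2)
$$

$$
\mathrm{EBGM}_{ij} = \exp\!\left( \mathbb{E}\left[ \log \lambda_{ij} \mid N_{ij} = n \right] \right),
\qquad
\mathbb{E}\left[ \log \lambda_{ij} \mid N_{ij} = n \right]
= Q_n \left[ \psi(\alpha_1 + n) - \log(\beta_1 + E_{ij}) \right]
+ (1 - Q_n) \left[ \psi(\alpha_2 + n) - \log(\beta_2 + E_{ij}) \right]
$$

Here $N_{ij}$ is the number of reports mentioning drug $i$ and event $j$, $E_{ij}$ is the count expected if drug and event were independent within each stratum, $\psi$ is the digamma function, and $Q_n$ is the posterior weight of the first mixture component. Ratios based on small counts are shrunk toward the prior, which is how the approach tempers spurious signals.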

Identifying the Limitations of Using R Programming


It is important to note that software cost matters to any company, including pharma and biotech ones. While R and RStudio® are free and SAS® requires an annual license, using R instead of SAS may not always lower costs. The cost of software is only one part of the equation. For software to be used in a highly regulated industry such as pharmaceuticals, validation, maintenance, and support are critical, and their costs need to be considered. Although R is free and open source, it comes with a steep learning curve, lacks direct vendor support, and faces a shortage of programmers compared to those familiar with SAS®.

Leveraging the Right Tools to Capture Value


Capturing the value of R programming starts with a clear vision for its use and a systematic approach to identifying and prioritizing the needs in the industry. Clinical Data Science is evolving rapidly, and the industry actively seeks alternative solutions to unlock valuable insights from diverse datasets. Recognizing the need for innovation, collaboration, and efficient tools is crucial. Rather than viewing SAS, R, and Python as mutually exclusive, leveraging the strengths of each for appropriate Data Science problems provides a nuanced and effective approach.

Ensuring data quality, scientific integrity, and regulatory compliance through risk assessment frameworks, validation, and documentation is imperative in this dynamic landscape. The pharmaceutical industry's journey toward embracing R reflects the broader trend of industries recognizing the value and potential of open-source tools in addressing complex challenges.



Venkatesan Balu is the Associate Director, Global Data Sciences, Navitas Life Sciences, with 15+ years of experience in the Biostatistics domain and in Phase I to Phase IV clinical trials across various therapeutic areas, including BABE and PK studies. He has extensive expertise in providing inputs to study design, sample size, SAP, outlier evaluation, interim analysis, complex statistical evaluation and model selection, and regulatory requirements. He is a technical leader in drug development strategy, adaptive design, portfolio optimization, and decision-making in clinical trials.


rossfarrugia commented 3 months ago

As discussed with authors over mail, I will convert this to quarto, add a sentence on pharmaverse and make a PR.