ropensci-archive / cleanEHR

:warning: ARCHIVED :warning: Essential tools and utility functions to facilitate the data processing pipeline, data cleaning and data analysing of clinical data from CC-HIC
GNU General Public License v3.0

Variables MAP, SBP and DBP have unequal counts #117

Closed abhishekdxt closed 7 years ago

abhishekdxt commented 7 years ago

Hi,

I have noticed that the variables such as MAP, Systolic and Diastolic from the Critical Care Datathon have mismatching frequencies.

The R code below shows the count of MAP, Systolic and Diastolic for each episode.

# One row per episode; columns hold the number of recorded time points
# for items 0110, 0113 and 0114, in that order.
t(
  data.frame(
    sapply(as.integer(ccd@infotb$episode_id), function(x) c(
      length(ccd@episodes[[x]]@data[['NIHR_HIC_ICU_0110']]$time),
      length(ccd@episodes[[x]]@data[['NIHR_HIC_ICU_0113']]$time),
      length(ccd@episodes[[x]]@data[['NIHR_HIC_ICU_0114']]$time)
    )),
    check.names = FALSE
  )
)

365 episodes have no Systolic measurements, whereas MAP and Diastolic are each missing for 73 episodes.
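The per-item tally of episodes with no measurements can be sketched as follows. This is a minimal illustration on toy data, not the cleanEHR API: the plain nested list `episodes` is a hypothetical stand-in for `ccd@episodes[[x]]@data`, and `count_items` is a helper name introduced only for this example.

```r
# Toy stand-in (assumption) for the episode data: each episode is a list whose
# `data` element maps item codes to data frames with a `time` column.
episodes <- list(
  list(data = list(
    NIHR_HIC_ICU_0110 = data.frame(time = c(0, 1, 2)),
    NIHR_HIC_ICU_0113 = data.frame(time = c(0, 2)),
    NIHR_HIC_ICU_0114 = data.frame(time = c(0, 1, 2))
  )),
  list(data = list(
    NIHR_HIC_ICU_0110 = data.frame(time = c(0, 1)),
    NIHR_HIC_ICU_0114 = data.frame(time = c(0, 1))
  )),
  list(data = list())
)

items <- c("NIHR_HIC_ICU_0110", "NIHR_HIC_ICU_0113", "NIHR_HIC_ICU_0114")

# Hypothetical helper: episodes-by-items matrix of measurement counts.
# An item that is absent from an episode yields NULL, whose length is 0.
count_items <- function(episodes, items) {
  counts <- matrix(0L, nrow = length(episodes), ncol = length(items),
                   dimnames = list(NULL, items))
  for (i in seq_along(episodes)) {
    for (it in items) {
      counts[i, it] <- length(episodes[[i]]$data[[it]]$time)
    }
  }
  counts
}

counts <- count_items(episodes, items)

# Number of episodes with no measurement at all, per item.
missing_per_item <- colSums(counts == 0)
print(missing_per_item)
```

Keeping the counts in a matrix with named columns makes it easy to see which item is responsible for a gap before comparing totals across items.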

sinanshi commented 7 years ago

Hi, the data in the ccRecord objects is the raw data taken directly from the machine measurements, so it is not necessarily consistent in frequency. If you want to align MAP, systolic and diastolic in a table with the same length, you could try:

# Create a table at a 1-hour cadence. The conf syntax is a bit weird since it was designed for YAML.
create.cctable(ccd, conf = list(NIHR_HIC_ICU_0110 = list(), NIHR_HIC_ICU_0113 = list()), freq = 1)

Did I answer your question?
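To illustrate what putting irregular series onto a common 1-hour cadence means, here is a rough base-R sketch on made-up values. It is not how `create.cctable` is implemented; `map_raw`, `sbp_raw` and `align_to_grid` are hypothetical names, and the sketch only picks up readings that fall exactly on a grid point.

```r
# Made-up raw series for one episode; `time` is hours since admission.
map_raw <- data.frame(time = c(0, 1, 2, 3), val = c(80, 82, 79, 85))
sbp_raw <- data.frame(time = c(0, 2), val = c(120, 118))

# Hypothetical helper: place a series on a regular grid.
# Grid hours with no reading at exactly that time become NA.
align_to_grid <- function(series, grid) {
  series$val[match(grid, series$time)]
}

grid <- seq(0, 3, by = 1)

aligned <- data.frame(
  time = grid,
  map  = align_to_grid(map_raw, grid),
  sbp  = align_to_grid(sbp_raw, grid)
)
print(aligned)
```

Both columns now have the same length, and the hours where one variable was not recorded show up explicitly as `NA` rather than as a shorter vector.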

abhishekdxt commented 7 years ago

Hi Sinanshi,

Sure, I used your code to align the sequences, and it aligns the variables well by their capture time. But the point I am trying to make here is about the original, untransformed dataset. I would expect related variables like Systolic, Diastolic and MAP to have approximately the same frequency, but that is not the case here (e.g. 365 episodes have no Systolic measurements, while Diastolic is absent for only 73 episodes).

Thanks

sinanshi commented 7 years ago

Case solved.