rstudio / pointblank

Data quality assessment and metadata reporting for data frames and database tables
https://rstudio.github.io/pointblank/

`data.table` syntax in preconditions #303

Open randrescastaneda opened 3 years ago

randrescastaneda commented 3 years ago

Proposal

Dear all,

Is it possible to use data.table syntax in preconditions? I have two reasons for asking. First, I had already written some code to check my data before learning about pointblank, and all of those tests are written in data.table syntax. With the preconditions argument, it would be extremely easy to incorporate them into the whole data validation process. Second, when checking big data, being able to use data.table would greatly speed up the process.

I tried something like the code below, but it failed with the following message: `Error in .(un = unique(survey_comparability)) : could not find function "."`

    col_vals_equal(
      vars(diff),
      value = 1,
      preconditions = ~ .[,
        .("un" = unique(survey_comparability)),
        by = .(country_code, survey_coverage)
      ][
        order(country_code, un)
      ][,
        diff := un - shift(un),
        by = .(country_code)
      ][
        !is.na(diff)
      ]
    )

Thank you so much. Best,

rich-iannone commented 3 years ago

This might not work currently but I’m interested in testing this out with data.table objects and its syntax in preconditions.
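
One way to start such a test is sketched below. It is an untested assumption that the error arises because the formula is evaluated from inside pointblank's namespace, which may not be data.table-aware, so that `[` falls back to data frame semantics and `.()` is looked up as an ordinary function. The sketch wraps the chain from the original post in a plain function, `prep_diff()` (a hypothetical name), and runs it on made-up toy data. pointblank's documentation describes `preconditions` as accepting a function as well as a formula, but whether passing `prep_diff` there avoids the error has not been verified here, so that call is left commented out.

```r
library(data.table)

# Made-up toy data; column names follow the snippet in the original post
dt <- data.table(
  country_code         = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  survey_coverage      = c("national", "national", "national", "urban", "urban"),
  survey_comparability = c(0L, 1L, 1L, 0L, 0L)
)

# Hypothetical helper: the data.table chain from the post as a regular function
prep_diff <- function(x) {
  x <- as.data.table(x)  # ensure data.table methods apply, whatever class arrives
  x[,
    .(un = unique(survey_comparability)),
    by = .(country_code, survey_coverage)
  ][
    order(country_code, un)
  ][,
    diff := un - shift(un),  # change relative to the previous row within a country
    by = .(country_code)
  ][
    !is.na(diff)             # drop the first row of each country
  ]
}

res <- prep_diff(dt)
print(res)

# Assumption, not verified: supplying the function instead of a formula
# pointblank::col_vals_equal(dt, columns = vars(diff), value = 1,
#                            preconditions = prep_diff)
```

Defining the helper at the top level means the `[` calls are made from user code rather than from inside another package, which is the condition data.table checks before enabling its own syntax. If the table has already been converted to another class by the time the precondition runs, the `as.data.table()` call converts it back.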