rstudio / pointblank

Data quality assessment and metadata reporting for data frames and database tables
https://rstudio.github.io/pointblank/

scan_data fails with spark table #483

Open joscani opened 1 year ago

joscani commented 1 year ago

Hi. First of all, thanks for your fantastic package. It lets me validate my Spark data frames and create beautiful reports.

I'm trying to use the scan_data function on a Spark data frame, but I get an error.

Here is an example with the iris dataset copied to Spark:


> iris_tbl <- sc |> copy_to("iris_spa", df = iris)
> iris_tbl
# Source: spark<iris_spa> [?? x 5]
   Sepal_Length Sepal_Width Petal_Length Petal_Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# ℹ more rows
# ℹ Use `print(n = ...)` to see more rows
> iris_tbl |> scan_data(sections = "OV")

── Data Scan started. Processing 2 sections. ───────────────────────────────
ℹ Starting assembly of 'Overview' section...
✔ ...Finished!  (1.4 s)
ℹ Starting assembly of 'Variables' section...
Error in `summarise()`:
ℹ In argument: `p05 = (structure(function (..., .x = ..1, .y = ..2,
  . = ..1) ...`
Caused by error:
! object 'Sepal_Length' not found
Run `rlang::last_trace()` to see where the error occurred.

Any idea? Thanks.

rich-iannone commented 1 year ago

Thank you for filing this report. I will look into this!
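
In the meantime, one possible workaround is to pull a bounded sample of the Spark table into local memory and scan that instead. This is only a sketch and not a confirmed fix: it assumes a local Spark connection created with `spark_connect(master = "local")` (the report does not show how `sc` was set up), the 1000-row cap is an arbitrary choice, and it sidesteps the Spark code path in scan_data rather than repairing it.

```r
library(sparklyr)
library(dplyr)
library(pointblank)

# Assumption: a local Spark installation; replace with your own connection.
sc <- spark_connect(master = "local")

# Copy iris to Spark as in the report
# (column names arrive with underscores, e.g. Sepal_Length).
iris_tbl <- copy_to(sc, iris, name = "iris_spa", overwrite = TRUE)

# Bring at most 1000 rows back as a local tibble (the cap is arbitrary).
iris_local <- iris_tbl |>
  head(1000) |>
  collect()

# Scan the local copy; "OV" requests the Overview and Variables
# sections, the same two sections as in the report.
report <- scan_data(iris_local, sections = "OV")
report

spark_disconnect(sc)
```

For tables too large to collect, the scan then describes only the sampled rows, so the summary statistics should be read as estimates for the full table.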