hrbrmstr / sergeant

:guardsman: Tools to Transform and Query Data with 'Apache' 'Drill'
https://hrbrmstr.github.io/sergeant/
Other
126 stars 13 forks source link

SQL Statement error #14

Closed hermandr closed 6 years ago

hermandr commented 6 years ago

Hi, In going through your code in https://rud.is/rpubs/yelp.html#

I am unable to run this code:

tbl(db, "(SELECT b.name AS name, b.hours[5] AS sat_hrs FROM dfs.d.`/yelp/yelp_academic_dataset_business.json` b)") %>%
  filter(!is.na(sat_hrs)) %>%
  filter(grepl("Saturday", sat_hrs)) %>%
  select(name, sat_hrs) %>%
  collect() %>%
  mutate(sat_hrs = gsub("Saturday ", "", sat_hrs)) %>%
  tidyr::separate(sat_hrs, c("open", "close"), "-") %>%
  filter(open != "0:0" & close != "0:0")

tbl(db, "(SELECT b.name AS name, b.hours[5] AS sat_hrs FROM dfs.d.`/yelp/yelp_academic_dataset_business.json` b)")
returns an error
Error: is.data.frame(df) is not TRUE

tbl does not work with a SQL statement.

Herman

hrbrmstr commented 6 years ago

It sure does :-) I use it every day that way.

Can you:

I'll be able to triage a bit better with that info. -thx

hermandr commented 6 years ago

Here is my sessionInfo

`R version 3.4.3 (2017-11-30) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale: [1] LC_COLLATE=English_Singapore.1252 LC_CTYPE=English_Singapore.1252
[3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C
[5] LC_TIME=English_Singapore.1252

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] bindrcpp_0.2 forcats_0.3.0 stringr_1.3.0 purrr_0.2.4
[5] readr_1.1.1 tidyr_0.8.0 tibble_1.4.2 ggplot2_2.2.1
[9] tidyverse_1.2.1 sergeant_0.5.2 dbplyr_1.2.1 dplyr_0.7.4
[13] RJDBC_0.2-7 DBI_0.8 rJava_0.9-9

loaded via a namespace (and not attached): [1] reshape2_1.4.3 haven_1.1.1 lattice_0.20-35 colorspace_1.3-2 [5] htmltools_0.3.6 yaml_2.1.17 rlang_0.2.0 pillar_1.2.1
[9] foreign_0.8-69 glue_1.2.0 modelr_0.1.1 readxl_1.0.0
[13] bindr_0.1 plyr_1.8.4 munsell_0.4.3 gtable_0.2.0
[17] cellranger_1.1.0 rvest_0.3.2 psych_1.7.8 knitr_1.20
[21] curl_3.1 parallel_3.4.3 broom_0.4.3 Rcpp_0.12.16
[25] scales_0.5.0 jsonlite_1.5 mnormt_1.5-5 hms_0.4.1
[29] digest_0.6.15 stringi_1.1.6 grid_3.4.3 cli_1.0.0
[33] tools_3.4.3 magrittr_1.5 lazyeval_0.2.1 crayon_1.3.4
[37] pkgconfig_2.0.1 xml2_1.2.0 lubridate_1.7.2 assertthat_0.2.0 [41] httr_1.3.1 rstudioapi_0.7 R6_2.2.2 nlme_3.1-131
[45] compiler_3.4.3
`

hermandr commented 6 years ago

This is the R code:

yelp_biz_hours_tbl <- tbl(db, "(SELECT b.name AS name, b.hours[5] AS sat_hrs FROM dfs.data.`/yelp/business.json` b)")

In drill web UI, query and planning / query tab:

SELECT * FROM  (SELECT  b.name  AS  name,  b.hours[5]  AS  sat_hrs  FROM  dfs.data.`/yelp/business.json`  b)  LIMIT 1

In drill web UI, query and planning / errors tab:

SYSTEM ERROR: Drill Remote Exception
Failure node: LAPTOP-VESBNEG3:31010

Error ID: dacc70ea-35c5-458b-9471-f1d596cb7c4e
hermandr commented 6 years ago

From sqlline.log in drill/log

2018-03-30 01:16:49,969 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id 2542e07e-2127-a42f-7304-cc365604dcef: SELECT * FROM  (SELECT  b.name  AS  name,  b.hours[5]  AS  sat_hrs  FROM  dfs.data.`/yelp/business.json`  b)  LIMIT 1
2018-03-30 01:16:49,973 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-03-30 01:16:49,973 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-03-30 01:16:49,973 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-03-30 01:16:49,973 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-03-30 01:16:49,973 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-03-30 01:16:49,973 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-03-30 01:16:49,973 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-03-30 01:16:49,988 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-03-30 01:16:49,988 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Time: 0ms total, 0.500011ms avg, 0ms max.
2018-03-30 01:16:49,988 [2542e07e-2127-a42f-7304-cc365604dcef:foreman] INFO  o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Earliest start: 1.058000 ?s, Latest start: 1.058000 ?s, Average start: 1.058000 ?s .
2018-03-30 01:16:50,004 [2542e07e-2127-a42f-7304-cc365604dcef:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2542e07e-2127-a42f-7304-cc365604dcef:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
2018-03-30 01:16:50,004 [2542e07e-2127-a42f-7304-cc365604dcef:frag:0:0] INFO  o.a.d.e.w.f.FragmentStatusReporter - 2542e07e-2127-a42f-7304-cc365604dcef:0:0: State to report: RUNNING
2018-03-30 01:16:50,062 [2542e07e-2127-a42f-7304-cc365604dcef:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2542e07e-2127-a42f-7304-cc365604dcef:0:0: State change requested RUNNING --> FAILED
2018-03-30 01:16:50,062 [2542e07e-2127-a42f-7304-cc365604dcef:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2542e07e-2127-a42f-7304-cc365604dcef:0:0: State change requested FAILED --> FINISHED
2018-03-30 01:16:50,065 [2542e07e-2127-a42f-7304-cc365604dcef:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information.

Fragment 0:0

[Error Id: 3c133a00-1e01-4a58-8472-9af37033b518 on LAPTOP-VESBNEG3:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information.

Fragment 0:0

[Error Id: 3c133a00-1e01-4a58-8472-9af37033b518 on LAPTOP-VESBNEG3:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) ~[drill-common-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:298) [drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267) [drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.12.0.jar:1.12.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: java.lang.IllegalStateException: The current reader doesn't support getting next information.
    at org.apache.drill.exec.vector.complex.impl.AbstractBaseReader.next(AbstractBaseReader.java:64) ~[vector-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.vector.complex.impl.SingleMapReaderImpl.next(SingleMapReaderImpl.java:34) ~[vector-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.test.generated.ProjectorGen319.doEval(ProjectorTemplate.java:35) ~[na:na]
    at org.apache.drill.exec.test.generated.ProjectorGen319.projectRecords(ProjectorTemplate.java:67) ~[na:na]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:198) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:97) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:115) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:79) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227) ~[drill-java-exec-1.12.0.jar:1.12.0]
    at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_161]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_161]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227) [drill-java-exec-1.12.0.jar:1.12.0]
    ... 4 common frames omitted
2018-03-30 01:16:50,068 [BitServer-1] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2542e07e-2127-a42f-7304-cc365604dcef:0:0: State change requested FAILED --> CANCELLATION_REQUESTED
2018-03-30 01:16:50,068 [BitServer-1] WARN  o.a.d.e.w.fragment.FragmentExecutor - 2542e07e-2127-a42f-7304-cc365604dcef:0:0: Ignoring unexpected state transition FAILED --> CANCELLATION_REQUESTED
2018-03-30 01:16:50,173 [2542e07e-2127-a42f-7304-cc365604dcef:frag:0:0] WARN  o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
****
hermandr commented 6 years ago
  1. Change the sql query to:

    yelp_biz_hours_tbl <- tbl(db, "(SELECT b.name AS name FROM dfs.data.`/yelp/business.json` b)")

    Works Completes with no errors.

  2. Change the sql query to:

    yelp_biz_hours_tbl <- tbl(db, "(SELECT b.name AS name, b.hours FROM dfs.data.`/yelp/business.json` b)")

    Works Completes with no errors.

hermandr commented 6 years ago

Hi Bob,

The error is from drill SQL engine. IS NULL and IS NOT NULL not supported. bug report #5382

Herman

hermandr commented 6 years ago

The error in the SQL statement is due to the fact that the business.json format has changed.

No issues.