neo4j-rstats / neo4r

A Modern and Flexible Neo4J Driver
https://neo4j-rstats.github.io/user-guide/

FR add option call_neo4j(output = "raw") #85

Closed marciz closed 2 years ago

marciz commented 3 years ago

Please add an option to return just the POST result from call_neo4j, without converting it to JSON. For large queries, users might want to parse it with their own custom solution.

ColinFay commented 3 years ago

Hey,

Right now you can do that with call_neo4j(output = "json"), which gives you a JSON string of the result that you can parse however you want.

Would returning the raw POST result itself really be worth it?

Let me know,

Colin

marciz commented 3 years ago

Hi, Colin!

The problem with JSON is the processing time for large queries.

library(neo4r)

# Copy of call_neo4j() that skips the parsing step and returns the raw httr response
call_neo4j2 <- function(query, con, type = c("row", "graph"), output = c("r", "json"),
                        include_stats = FALSE, include_meta = FALSE) {
  stop_if_not(con, ~"Neo4JAPI" %in% class(.x), "Please use a Neo4JAPI object.")
  output <- match.arg(output)
  type <- match.arg(type)
  query_clean <- clean_query(query)
  query_jsonised <- to_json_neo(query_clean, include_stats, include_meta, type)
  body <- glue("{\"statements\" : [ %query_jsonised% ]}", .open = "%", .close = "%")
  res <- POST(
    url = glue("{con$url}/db/data/transaction/commit?includeStats=true"),
    add_headers(.headers = c(
      `Content-Type` = "application/json",
      accept = "application/json",
      Authorization = paste0("Basic ", con$auth)
    )),
    body = body
  )
  stop_if_not(status_code(res), ~.x == 200, "API error")

  # Return the unparsed response instead of converting it
  res
  # if (output == "json") {
  #   toJSON(lapply(content(res)$results, function(x) x$data), 
  #          pretty = TRUE)
  # }
  # else {
  #   parse_api_results(res = res, type = type, format = format, 
  #                     include_stats = include_stats, meta = include_meta)
  # }
}
# Evaluate inside neo4r's namespace so the unexported helpers
# (clean_query, to_json_neo, parse_api_results, ...) and imports resolve
environment(call_neo4j2) <- environment(call_neo4j)
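The `environment(call_neo4j2) <- environment(call_neo4j)` line is what lets the copied function call neo4r's unexported helpers: R looks up free variables in a function's enclosing environment. A minimal base-R sketch of the same mechanism, where `make_pkg_like()`, `hidden_helper()` and `my_copy()` are hypothetical names standing in for the package namespace, an internal helper and the copied function:

```r
# A closure's environment plays the role of a package namespace here
make_pkg_like <- function() {
  # Not visible from the global environment, like an unexported helper
  hidden_helper <- function(x) paste0("helper saw: ", x)
  public_fun <- function(x) hidden_helper(x)
  public_fun
}
public_fun <- make_pkg_like()

# Defined at top level, so on its own it cannot find hidden_helper()
my_copy <- function(x) hidden_helper(toupper(x))

# Re-point my_copy's enclosing environment to public_fun's,
# just as call_neo4j2 is re-pointed to neo4r's namespace
environment(my_copy) <- environment(public_fun)

my_copy("abc")  # returns "helper saw: ABC"
```

The catch is that such a copy relies on internal helpers that may change between neo4r versions, which is one argument for a supported option instead.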

query_neo4j <- paste0(
    "
        MATCH (c:Claim)--()--(x:Claim)
        RETURN DISTINCT c.claimid, x.claimid
        LIMIT 50000
        ;
  "
) %>%
    call_neo4j2(con = con_neo)

Parsing the raw response myself takes:

system.time({
  x1 <- httr::content(query_neo4j)
  x2 <- x1$results[[1]]$data
  x3 <- x1$results[[1]]$columns
  x4 <- lapply(x2, function(x) setNames(x$row, x3)) %>% dplyr::bind_rows()
})

>    user  system elapsed 
>    1.39    0.02    1.42 

Using the JSON conversion from call_neo4j (output = "json"):

system.time({
  y1 <- jsonlite::toJSON(lapply(httr::content(query_neo4j)$results, function(x) x$data),
                         pretty = TRUE)
})

>   user  system elapsed 
>   10.66    0.14   10.80 

Or with output = "r" (the default):

system.time({
  z1 <- neo4r:::parse_api_results(res = query_neo4j, type = "row", format = format,
                      include_stats = FALSE, meta = FALSE)
})

>    user  system elapsed 
>   29.11    0.28   29.39 

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=Latvian_Latvia.1257 LC_CTYPE=Latvian_Latvia.1257
[3] LC_MONETARY=Latvian_Latvia.1257 LC_NUMERIC=C
[5] LC_TIME=Latvian_Latvia.1257

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] neo4r_0.1.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 compiler_4.0.2 pillar_1.4.6 later_1.1.0.1
[5] tools_4.0.2 digest_0.6.25 jsonlite_1.7.1 lifecycle_0.2.0
[9] tibble_3.0.3 pkgconfig_2.0.3 rlang_0.4.10 shiny_1.5.0
[13] cli_2.2.0 rstudioapi_0.11 curl_4.3 xfun_0.17
[17] fastmap_1.0.1 httr_1.4.2 dplyr_1.0.2 generics_0.0.2
[21] vctrs_0.3.4 attempt_0.3.1 tidyselect_1.1.0 glue_1.4.2
[25] data.table_1.13.0 R6_2.4.1 fansi_0.4.1 purrr_0.3.4
[29] tidyr_1.1.2 magrittr_1.5 promises_1.1.1 ellipsis_0.3.1
[33] htmltools_0.5.0 assertthat_0.2.1 mime_0.9 xtable_1.8-4
[37] httpuv_1.5.4 tinytex_0.26 crayon_1.3.4
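The fastest option timed above (x1 to x4) still builds one small named list per row and binds 50000 of them. A hedged alternative sketch extracts each column once from `httr::content(res)` instead. It assumes `type = "row"`, scalar values in every column, and the response shape of Neo4j's transactional HTTP endpoint (`results[[i]]$columns` plus `results[[i]]$data[[k]]$row`). `rows_to_df()` is a hypothetical helper, not part of neo4r, and it has not been benchmarked against the timings in this thread.

```r
# Hypothetical helper (not part of neo4r): convert the parsed body of a
# Neo4j transactional-endpoint response, i.e. httr::content(res), into a
# data.frame by pulling out one column at a time instead of binding rows.
# Assumes type = "row" and that every value is a scalar (no nodes/lists).
rows_to_df <- function(parsed, statement = 1) {
  result <- parsed$results[[statement]]
  columns <- unlist(result$columns)
  rows <- lapply(result$data, `[[`, "row")
  cols <- lapply(seq_along(columns), function(j) {
    values <- lapply(rows, `[[`, j)
    # JSON null arrives as NULL; swap in NA so unlist() keeps every position
    values[vapply(values, is.null, logical(1))] <- NA
    v <- unlist(values, use.names = FALSE)
    if (is.null(v)) logical(0) else v  # zero rows: keep an empty column
  })
  names(cols) <- columns
  out <- as.data.frame(cols, stringsAsFactors = FALSE)
  names(out) <- columns  # keep Cypher column names such as "c.claimid" verbatim
  out
}
```

Usage would be `rows_to_df(httr::content(query_neo4j))`. Whether it beats the 1.42 s measured above would have to be checked on the real data.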

davidlrosenblum commented 3 years ago

Hi Mārcis,

Does the query time out? Can you describe the size of the result set returned? I think you need to pick up neo4r from GitHub (the CRAN version is old) and retry.

Thank you,

David Rosenblum

(Sent by email in reply to Mārcis Bratka's comment of Jan 14, 2021, 4:03 AM.)

Hi, Colin!

The problem with json is the timing for large queries.

copy of call_neo4j without parsing

call_neo4j2 <- function (query, con, type = c("row", "graph"), output = c("r", "json"), include_stats = FALSE, include_meta = FALSE) { stop_if_not(con, ~"Neo4JAPI" %in% class(.x), "Please use a Neo4JAPI object.") output <- match.arg(output) type <- match.arg(type) query_clean <- clean_query(query) query_jsonised <- to_json_neo(query_clean, include_stats, include_meta, type) body <- glue("{\"statements\" : [ %query_jsonised% ]}", .open = "%", .close = "%") res <- POST(url = glue("{con$url}/db/data/transaction/commit?includeStats=true"), add_headers(.headers = c(Content-Type = "application/json", accept = "application/json", Authorization = paste0("Basic ", con$auth))), body = body) stop_if_not(status_code(res), ~.x == 200, "API error")

res

if (output == "json") {

toJSON(lapply(content(res)$results, function(x) x$data),

pretty = TRUE)

}

else {

parse_api_results(res = res, type = type, format = format,

include_stats = include_stats, meta = include_meta)

}

} environment(call_neo4j2) <- environment(call_neo4j)

query_neo4j <- paste0( " MATCH (c:Claim)--()--(x:Claim) RETURN DISTINCT c.claimid, x.claimid LIMIT 50000 ; " ) %>% call_neo4j2(con = con_neo) When parsing results I need

system.time({ x1 <- httr::content(query_neo4j) x2 <- x1$results[[1]]$data x3 <- x1$results[[1]]$columns x4 <- lapply(x2, function(x) setNames(x$row, x3)) %>% dplyr::bind_rows() })

user system elapsed 1.39 0.02 1.42 Using JSON

system.time({ y1 <- jsonlite::toJSON(lapply(httr::content(query_neo4j)$results, function(x) x$data), pretty = TRUE) })

user system elapsed 10.66 0.14 10.80 R version 4.0.2 (2020-06-22) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale: [1] LC_COLLATE=Latvian_Latvia.1257 LC_CTYPE=Latvian_Latvia.1257 [3] LC_MONETARY=Latvian_Latvia.1257 LC_NUMERIC=C [5] LC_TIME=Latvian_Latvia.1257

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] neo4r_0.1.1

loaded via a namespace (and not attached): [1] Rcpp_1.0.5 compiler_4.0.2 pillar_1.4.6 later_1.1.0.1 [5] tools_4.0.2 digest_0.6.25 jsonlite_1.7.1 lifecycle_0.2.0 [9] tibble_3.0.3 pkgconfig_2.0.3 rlang_0.4.10 shiny_1.5.0 [13] cli_2.2.0 rstudioapi_0.11 curl_4.3 xfun_0.17 [17] fastmap_1.0.1 httr_1.4.2 dplyr_1.0.2 generics_0.0.2 [21] vctrs_0.3.4 attempt_0.3.1 tidyselect_1.1.0 glue_1.4.2 [25] data.table_1.13.0 R6_2.4.1 fansi_0.4.1 purrr_0.3.4 [29] tidyr_1.1.2 magrittr_1.5 promises_1.1.1 ellipsis_0.3.1 [33] htmltools_0.5.0 assertthat_0.2.1 mime_0.9 xtable_1.8-4 [37] httpuv_1.5.4 tinytex_0.26 crayon_1.3.4

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/neo4j-rstats/neo4r/issues/85#issuecomment-760046645, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFBXAGAZBUOOOVUK5LSZMNDSZ2XNJANCNFSM4WA44TSA.

marciz commented 3 years ago

Hi, @davidlrosenblum !

The problem is not the Neo4j query; it returns the expected results. The result is a list of length 50000 (as in the query's LIMIT 50000). The issue is the processing time for parsing it, so I would like to parse it with my own custom function.

I tried the GitHub version, but there was no difference from CRAN in my case. That is probably expected, because the slow part is the line toJSON(lapply(content(res)$results, function(x) x$data), pretty = TRUE).

Another solution would be to modify neo4r:::parse_api_results for faster processing, but I am not sure that is possible for all kinds of queries/results.

M

ColinFay commented 3 years ago

Hi @marciz

I suppose it would make sense to return content(res) then (I'm not sure it's worth returning the full httr response). Maybe with a parameter type = "raw"?

Would you be willing to make a PR to implement that?

marciz commented 3 years ago

Yes, sure, I can make a PR.

Do you mean type = "raw"? Or output = "raw"?

ColinFay commented 3 years ago

output = "raw", yes (so we have output = c("r", "json", "raw")).

Thanks a lot!
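A minimal sketch of the three-way `output = c("r", "json", "raw")` switch agreed on here. It is not necessarily how #86 implemented it: `dispatch_output()` is a hypothetical helper, `parsed` stands in for `httr::content(res)`, and the `parse_r` / `to_json` arguments are placeholders for neo4r's internal `parse_api_results()` and the `toJSON()` call, passed in so the sketch runs without a Neo4j server.

```r
# Hypothetical sketch: choose how to return a Neo4j response.
#   "raw"  -> the parsed body untouched (the new option from this issue)
#   "json" -> the statements' data re-serialised via to_json()
#   "r"    -> the full R conversion via parse_r() (the existing default)
dispatch_output <- function(parsed, output = c("r", "json", "raw"),
                            parse_r, to_json) {
  output <- match.arg(output)
  switch(output,
    raw  = parsed,
    json = to_json(lapply(parsed$results, function(x) x$data)),
    r    = parse_r(parsed)
  )
}
```

Keeping `"r"` first in the choices means `match.arg()` still picks it when `output` is omitted, so existing callers are unaffected.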

ColinFay commented 2 years ago

closed via #86