neo4j-rstats / neo4r

A Modern and Flexible Neo4J Driver
https://neo4j-rstats.github.io/user-guide/

Error: Column `V4` can't be converted from numeric to character #68

Open paul-shannon opened 4 years ago

paul-shannon commented 4 years ago

Many thanks for this package - it is very useful. Here I report an apparent bug which I hope you can help with.

I am working my way through the 2019 edition of "Graph Algorithms: Practical Examples in Apache Spark and Neo4j" by Needham and Hodler from O'Reilly. The example on page 59 is of Yen's shortest path algorithm, which I call like this:

s <- paste("MATCH (start:Place {id:'Gouda'}), (end:Place {id:'Felixstowe'})",
           "CALL algo.kShortestPaths.stream(start, end, 5, 'distance')",
           "YIELD index, nodeIds, path, costs",
           "RETURN index,",
           "[node in algo.getNodesById(nodeIds[1..-1]) | node.id] AS via,",
           "reduce(acc=0.0, cost in costs | acc + cost) AS totalCost")
call_neo4j(s, con)

with this result:

Error: Column `V4` can't be converted from numeric to character

this traceback:

> traceback()
14: stop(list(message = "Column `V4` can't be converted from numeric to character", 
        call = NULL, cppstack = NULL))
13: bind_rows_(x, .id)
12: dplyr::bind_rows(res, .id = .id)
11: purrr::map_dfr(., purrr::flatten_dfc)
10: function_list[[i]](value)
9: freduce(value, `_function_list`)
8: `_fseq`(`_lhs`)
7: eval(quote(`_fseq`(`_lhs`)), env, env)
6: eval(quote(`_fseq`(`_lhs`)), env, env)
5: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
4: flatten(res_data) %>% purrr::map_dfr(purrr::flatten_dfc) %>% 
       list()
3: parse_row(res_data, res_names, include_stats, stats, meta, res_meta)
2: parse_api_results(res = res, type = type, format = format, include_stats = include_stats, 
       meta = include_meta)
1: call_neo4j(s, con)

and this sessionInfo()

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] RCyjs_2.9.1         graph_1.64.0        BiocGenerics_0.32.0
[4] BrowserViz_2.9.1    httpuv_1.5.2        jsonlite_1.6.1     
[7] RUnit_0.4.32        neo4r_0.1.3        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3        pillar_1.4.3      compiler_3.6.2    later_1.0.0      
 [5] tools_3.6.2       base64enc_0.1-3   digest_0.6.25     tibble_2.1.3     
 [9] lifecycle_0.2.0   pkgconfig_2.0.3   rlang_0.4.5       shiny_1.4.0      
[13] rstudioapi_0.11   curl_4.3          fastmap_1.0.1     httr_1.4.1       
[17] dplyr_0.8.5       vctrs_0.2.3       stats4_3.6.2      attempt_0.3.0    
[21] tidyselect_1.0.0  glue_1.3.1        data.table_1.12.8 R6_2.4.1         
[25] purrr_0.3.3       tidyr_1.0.2       magrittr_1.5      promises_1.1.0   
[29] htmltools_0.4.0   assertthat_0.2.1  mime_0.9          xtable_1.8-4     
[33] crayon_1.3.4

paul-shannon commented 4 years ago

@ColinFay sorry to pester but may I inquire - without being an ungrateful wretch :} - does anyone have the time to fix bugs and otherwise work on your good package?

fBedecarrats commented 4 years ago

[Hold on, I just found the data. I'll get back to you in a moment.]

Hi Paul, I'm neither the developer nor the maintainer of this package, but I tried to take a look at your issue. That is difficult because what you provide does not qualify as a reproducible example: it does not include the Neo4J data you run your query against.

I think the bug comes from the function parse_row( ), which call_neo4j( ) uses to parse the results Neo4J sends back when those are arrays. Specifically, it seems that the first parsed row(s?) contain numeric values in column V4 and a subsequent row has a character value in column V4, a type that is incompatible with the previous values.

I'm afraid the only way to solve your issue is to understand what actually comes out of Neo4J. Could you please run the Cypher query directly in a Neo4J client and copy the output here? Otherwise, I would need the data you run the query against to reproduce the output.
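To illustrate the suspected mechanism, here is a minimal sketch that does not use neo4r at all. The two rows are hypothetical (the values "a", "b", "c" and the numbers are made up); they only mimic what happens when each result row is flattened into positional columns V1, V2, ... and the rows are then bound together. The exact wording of the error depends on the installed dplyr version.

```r
library(dplyr)
library(tibble)

# Two hypothetical result rows, each flattened so that every value gets its
# own positional column. The nested list has two elements in the first row
# and three in the second, so V4 holds a number in one row and a string in
# the other.
row1 <- tibble(V1 = 0, V2 = "a", V3 = "b", V4 = 265)
row2 <- tibble(V1 = 1, V2 = "a", V3 = "b", V4 = "c", V5 = 285)

# bind_rows() refuses to combine a numeric V4 with a character V4;
# capture the error message instead of stopping the script.
outcome <- tryCatch(
  bind_rows(row1, row2),
  error = function(e) conditionMessage(e)
)
print(outcome)
```

If this is indeed what parse_row( ) runs into, any query whose rows flatten to different widths would be expected to fail in the same way.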

fBedecarrats commented 4 years ago

Hi again. Here is a reproducible example:

# Install dev version from GitHub
install.packages("remotes")
remotes::install_github("neo4j-rstats/neo4r")
library(neo4r)
library(tibble)

# Connect to the Neo4J DB running on Neo4J desktop
con <- neo4j_api$new(
  url = "http://localhost:7474",
  user = "neo4j", 
  password = "password"
) 

load_csv(url = "https://raw.githubusercontent.com/neo4j-graph-analytics/book/master/data/transport-nodes.csv",
         con = con, header = TRUE, periodic_commit = 50, 
         as = "row", on_load = 'MERGE (place:Place {id:row.id})
         SET place.latitude = toFloat(row.latitude),
         place.longitude = toFloat(row.longitude),
         place.population = toInteger(row.population)')

load_csv(url = "https://raw.githubusercontent.com/neo4j-graph-analytics/book/master/data/transport-relationships.csv",
         con = con, header = TRUE, periodic_commit = 50, 
         as = "row", on_load = 'MATCH (origin:Place {id: row.src})
         MATCH (destination:Place {id: row.dst})
         MERGE (origin)-[:EROAD {distance: toInteger(row.cost)}]->(destination)')

query <- 'MATCH (start:Place {id:"Gouda"}),
                (end:Place {id:"Felixstowe"})
          CALL algo.kShortestPaths.stream(start, end, 5, "distance")
          YIELD index, nodeIds, path, costs
          RETURN index,
                 [node in algo.getNodesById(nodeIds[1..-1]) | node.id] AS via,
                 reduce(acc=0.0, cost in costs | acc + cost) AS totalCost'

res <- call_neo4j(query, con = con, type = "row")

This produces the error reported by @paul-shannon:
Error: Column 'V4' can't be converted from numeric to character

However, the following returns a correct JSON:

res <- call_neo4j(query, con = con, output = "json")

In Neo4J, the results look like:

index  via                                                       totalCost
0      ["Rotterdam", "Hoek van Holland"]                         265.0
1      ["Den Haag", "Hoek van Holland"]                          266.0
2      ["Rotterdam", "Den Haag", "Hoek van Holland"]             285.0
3      ["Den Haag", "Rotterdam", "Hoek van Holland"]             298.0
4      ["Utrecht", "Amsterdam", "Den Haag", "Hoek van Holland"]  374.0

The problem stems from the transformations at the end of the Cypher query: because of them, the second column (named "via") contains nested values, and the number of nested values differs from row to row. This case is not handled by the parsing API written by @ColinFay, so the result cannot automatically be processed as a tibble. I really don't know whether it would even be possible to rewrite the parse_row( ) function to handle such cases, as the Neo4J output corresponds in essence to a list object in R.
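As a sketch of what a tibble-friendly shape could look like, the five rows shown above can be held in a tibble if "via" is kept as a list-column instead of being flattened. This is only an illustration built by hand from the printed output; it is not how neo4r parses results, and the names rows and paths are made up for the example.

```r
library(tibble)

# The five result rows from the Neo4J output, typed in by hand as nested lists
rows <- list(
  list(index = 0, via = c("Rotterdam", "Hoek van Holland"), totalCost = 265),
  list(index = 1, via = c("Den Haag", "Hoek van Holland"), totalCost = 266),
  list(index = 2, via = c("Rotterdam", "Den Haag", "Hoek van Holland"), totalCost = 285),
  list(index = 3, via = c("Den Haag", "Rotterdam", "Hoek van Holland"), totalCost = 298),
  list(index = 4, via = c("Utrecht", "Amsterdam", "Den Haag", "Hoek van Holland"), totalCost = 374)
)

# Scalars become ordinary columns; the variable-length "via" vectors stay
# together in a single list-column, so every row has the same width.
paths <- tibble(
  index     = vapply(rows, function(r) r$index, numeric(1)),
  via       = lapply(rows, function(r) r$via),
  totalCost = vapply(rows, function(r) r$totalCost, numeric(1))
)

print(paths)
lengths(paths$via)  # number of intermediate stops per path
```

Whether a general parser could decide on its own which columns should become list-columns is a separate question; the sketch only shows that the data itself fits in a tibble.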

paul-shannon commented 4 years ago

@fBedecarrats @ColinFay Many thanks, Florent, for creating the reproducible example. I agree: a perfect transformation from all Neo4j results into R tibbles might not be attainable.

In issue 66, Colin mentions adding a method to the neo4r class that skips that step and simply returns a list. Maybe that would help in cases such as this one.
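Until something like that exists, a rough sketch of the "just return a list" idea is to take the JSON and parse it with jsonlite, without simplification. The JSON string below is an assumption: it is hand-written in the shape of a Neo4j HTTP transactional response and trimmed to two rows, and I have not checked that call_neo4j(..., output = "json") returns exactly this structure.

```r
library(jsonlite)

# Hypothetical response body (assumed shape, two rows only)
json <- '{"results":[{"columns":["index","via","totalCost"],"data":[
  {"row":[0,["Rotterdam","Hoek van Holland"],265.0]},
  {"row":[2,["Rotterdam","Den Haag","Hoek van Holland"],285.0]}
]}],"errors":[]}'

# simplifyVector = FALSE keeps the nested structure as plain R lists
parsed <- fromJSON(json, simplifyVector = FALSE)
result <- parsed$results[[1]]

# One named list per result row, with "via" left as a nested list
records <- lapply(result$data, function(d) {
  setNames(d$row, unlist(result$columns))
})

str(records[[2]])
```

Each element of records can then be inspected or reshaped by hand, at the cost of losing the convenience of a ready-made tibble.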