Closed kbvernon closed 3 weeks ago
Hi,
No, I have not considered using Arrow. To be honest, I don't know exactly what it is but I will look at it. Internally, vertex IDs are integers from 0 to nbnode-1. Character type is more flexible for the user because anything can be converted into character.
Hi, thanks!
I'm not sure I fully understand the Arrow system either, but playing around with it, I can say it is VERY fast, like data.table fast, and it makes a table VERY small in memory, which is really useful when you're working with a graph that has N > 1 M vertices and 8 * N edges.
library(arrow)
library(cppRouting)
library(lobstr)
roads_dataframe <- read.csv(
"https://raw.githubusercontent.com/vlarmet/cppRouting/master/data_readme/roads.csv",
colClasses = c("character", "character", "numeric")
)
roads_arrow <- arrow::arrow_table(roads_dataframe)
lobstr::obj_size(roads_dataframe)
# 29.59 MB
lobstr::obj_size(roads_arrow)
# 284.64 kB
Hi, didn't want to barge in on #12 , so thought I would raise this as a separate issue. Have you considered using an Arrow table to hold your graph object rather than a base R data.frame? In addition to the better memory management, there are other advantages to going the Arrow route, like storing metadata. If this is something you think is worthwhile, I would be happy to help implement at least the R side of it - afraid the C++ side of things is a little bit beyond me at the moment.
I'm coming at this from ecology, where the conversion from a grid to a graph can be prohibitively costly in terms of memory.
Also, a somewhat related question: why did you choose to represent the numeric vertex IDs as character strings? Is there some advantage to this in your C++ code?
This is a great package, btw. Thanks!