Closed j450h1 closed 4 years ago
I've found the block of code to change the data types it is in the node_measures
function.
if (!m %in% names(self$nodes)) {
log_info(sprintf("Calculating %s...", m))
result <- private$node_measure_functions[[m]](self)
#result <- as.character(result) #didn't work for some reason
resultDT <- data.table::data.table(
node_name = names(result)
, result = as.character(result) #change here
)
setkeyv(resultDT, 'node_name')
self$nodes[, eval(m) := resultDT[node, result]]
}
A new parameter will need to be added to the node_measures
feature to specify the type since I couldn't figure out a way to adjust the type outside of this function. I see two different approaches here, the new parameter could have a default type of character and then only the double columns (betweenness and pageRank) would not have this character conversion.
The 2nd approach is to have no default type and it would be a required argument to specify the type. If the type was integer, then a conversion would be done otherwise everything stays the same.
Let me know what approach is preferred and if you have any questions before I proceed.
Here is an example of where the node_measures
function is called:
graphOutDegree = function(self){
measure <- 'outDegree'
igraph::centralize(
scores = self$node_measures(measure)[, get(measure)]
, theoretical.max = igraph::centr_degree_tmax(
graph = self$igraph
, mode = "out"
, loops = TRUE
)
, normalized = TRUE
)
}
With the updated function I think this line would look something like:
scores = self$node_measures(measure, "integer")[, get(measure)]
Another approach for the new argument is to focus on integers only and have a boolean which defaults to FALSE and would need to adjusted to TRUE for the 4 integer columns.
Merging #249 into master will increase coverage by
0.01%
. The diff coverage is100%
.
@@ Coverage Diff @@
## master #249 +/- ##
==========================================
+ Coverage 92.44% 92.46% +0.01%
==========================================
Files 12 12
Lines 927 929 +2
==========================================
+ Hits 857 859 +2
Misses 70 70
Impacted Files | Coverage Δ | |
---|---|---|
R/GraphClasses.R | 93.15% <ø> (ø) |
:arrow_up: |
R/AbstractGraphReporter.R | 90% <100%> (ø) |
:arrow_up: |
R/FunctionReporter.R | 93.25% <0%> (+0.02%) |
:arrow_up: |
R/testing_utils.R | 82.85% <0%> (+0.5%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update f6de57b...d74dcb8. Read the comment docs.
@jameslamb Let me know if you have any preference before I proceed with the updated way to change the node_measures
function
@j450h1 , James will be AFK for a while, so I apologize for some feedback that may be redundant (there's a lot you've discussed here).
The issue is with how the DT in the report is handling the rounding. At first thought, I would not want to affect the datatypes in the data that is available after a CreatePackageReport()
run. I think we were on the same page with that since I see the first attempts were to modify the code HERE, between the node
data.table creation and the report formatting.
HOWEVER, as you point out @j450h1, the root issue is that the output of outDegree
,inDegree
,numRecursiveDeps
,numRecursiveRevDeps
are the wrong data type. They should be integers but are double. I think this is masked a little b/c of how R console handles this.
Example:
library(pkgnet)
t <- CreatePackageReport('ggplot2')
str(t$DependencyReporter$nodes)
Which looks fine in the terminal:
But DT
appropriately rounds these doubles (b/c they are doubles):
SUGGESTION Just cast the function output to integer with the function.
Example (not tested and tabs a wreak):
outDegree = function(self){
as.integer(
igraph::degree(
graph = self$igraph
, mode = "out"
, loops = TRUE
)
)
}
Theoretical Output with one column type changed:
@jameslamb I believe I saw your concern above about changing functionality, but I think that risk is minimal. Worth seeing what unit tests come back with.
So @j450h1, it's not as elaborate of a suggestion, but what do you think about casting the output of those functions to integer?
I agree with @bburns632 suggestion of casting to int within the functions where those specific four measures are calculated.
Okay great. I will work on update later today.
Adjusting as such:
outDegree = function(self){
result <- igraph::degree(
graph = self$igraph
, mode = "out"
, loops = TRUE
)
as.integer(result)
}
leads to this error:
## Error in setkeyv(resultDT, "node_name"): some columns are not in the data.table: node_name
Looks like this is related to the fact that names(result)
is now NULL and is assigned to node_name as per below:
# If not already calculated it, calculate and add to node DT
if (!m %in% names(self$nodes)) {
log_info(sprintf("Calculating %s...", m))
result <- private$node_measure_functions[[m]](self)
resultDT <- data.table::data.table(
node_name = names(result)
, result = result
)
setkeyv(resultDT, 'node_name')
self$nodes[, eval(m) := resultDT[node, result]]
}
I'm not very familiar with the Datatable package, so would appreciate any thoughts on why just changing the type to integer would affect the names?
@j450h1 very sorry for the long delay in helping you figure this out. I think we're close, so I hope you still have time to take a look at this.
The igraph::degree
function returns named vectors. Later, where we put it into a data.table
, it grabs the names from the vector to assign to the node name column.
Apparently, as we are both learning, the as.integer
function strips names from vectors. Quick example:
myvec <- c(foo=1.0, bar=2.0)
myvec
#> foo bar
#> 1 2
names(myvec)
#> [1] "foo" "bar"
is.integer(myvec)
#> [1] FALSE
myvec <- as.integer(myvec)
names(myvec)
#> NULL
is.integer(myvec)
#> [1] TRUE
Based on a quick look at this StackOverflow question, I think the best option is to instead use mode
to make those vectors into integers.
myvec <- c(foo=1.0, bar=2.0)
myvec
#> foo bar
#> 1 2
mode(myvec) <- "integer"
myvec
#> foo bar
#> 1 2
names(myvec)
#> [1] "foo" "bar"
is.integer(myvec)
#> [1] TRUE
Created on 2020-01-12 by the reprex package (v0.3.0)
Give that a try, and see if it works. Thanks so much for your patience and for taking a shot at this issue!
Sure @jayqi I will give it a crack, thanks for pointing me in the right direction!
Looks like that did the trick! See latest commit:
With the assumption that these two which I don't see in the output (Out-Closeness and In-Closeness) are not integers and should not be converted like the 4 as shown in above screenshot:
# Out-Closeness
# Closeness doesn't really work for directed graphs that are not
# strongly connected.
# igraph calculates a thing anyways and gives a warning
# Typically given as normalized values
, outCloseness = function(self){
suppressWarnings(igraph::closeness(
graph = self$igraph
, mode = "out"
, normalized = TRUE
))
}
# In-Closeness
# Closeness doesn't really work for directed graphs that are not
# strongly connected.
# igraph calculates a thing anyways and gives a warning
# Typically given as normalized values
, inCloseness = function(self){
suppressWarnings(igraph::closeness(
graph = self$igraph
, mode = "out"
, normalized = TRUE
))
}
Likewise I appreciate your help and patience as I worked through these final updates.
Sent from my iPhone
On Jan 16, 2020, at 6:33 PM, Jay Qi notifications@github.com wrote:
Merged #249 into master.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.
… one non-Integer values. Right now those columns are betweenness and pageRank. Resolves issue #228