get_unique_name() in node_class.R might not be unique #366

Open njtierney opened 3 years ago

njtierney commented 3 years ago

Perhaps this is unlikely to happen, or to be an issue, but it seems that the rhex() function as defined isn't guaranteed to create a unique name when there are many nodes (e.g. 1 million).

This is used in node_class.R.

See example below.

n_rhex <- 1e6

# generate a random 8-digit hexadecimal string (4 random bytes)
rhex <- function() paste(as.raw(sample.int(256L, 4, TRUE) - 1L), collapse = "")

many_rhex <- replicate(n = n_rhex, expr = rhex(), simplify = "vector")

dplyr::n_distinct(many_rhex)
#> [1] 999874

dplyr::n_distinct(many_rhex) == n_rhex
#> [1] FALSE

Created on 2021-04-08 by the reprex package (v2.0.0)
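These duplicates are what the birthday problem predicts. rhex() draws 4 random bytes, so there are only N = 2^32 possible names. A back-of-the-envelope sketch, assuming each name is drawn uniformly and independently, for the expected number of colliding pairs among n = 10^6 names:

```latex
\mathbb{E}[\text{colliding pairs}] = \binom{n}{2}\frac{1}{N}
  = \frac{n(n-1)}{2 \cdot 2^{32}}
  \approx \frac{10^{12}}{8.59 \times 10^{9}} \approx 116,
\qquad
P(\text{at least one collision}) \approx 1 - e^{-116} \approx 1
```

When triple collisions are negligible, each colliding pair removes one distinct value, so roughly 116 missing names is in line with the 126 (10^6 - 999,874) seen in the reprex.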

Perhaps {digest} or something like it could be used to give nodes unique IDs.

njtierney commented 3 years ago

Or do something like what reprex did, but this might not be unique enough.

njtierney commented 3 years ago

This is currently being worked on here.

njtierney commented 3 years ago
njtierney commented 2 years ago

There is an issue where an error appears:

Error in distrib_constructor(tf_parameter_list, dag = self) : could not find function "distrib_constructor"

This means R cannot find the function distrib_constructor.

It makes me wonder whether this is related to this issue. We have not been able to reliably produce a small reprex for the error, so it might not be related.

njtierney commented 4 months ago

A note on using a hashing package like secretbase, which is what targets uses internally: as long as the nodes aren't identical, this will work, but if two nodes/R6 objects are identical, their hashes will also be identical. So the idea is that as long as the inputs aren't identical, it should be OK.

n_rhex <- 1e6

# generate a random 8-digit hexadecimal string (4 random bytes)
rhex <- function() paste(as.raw(sample.int(256L, 4, TRUE) - 1L), collapse = "")

many_rhex <- function(x) replicate(n = x, expr = rhex(), simplify = "vector")

rhexes <- many_rhex(n_rhex)

dplyr::n_distinct(rhexes)
#> [1] 999883

dplyr::n_distinct(rhexes) == n_rhex
#> [1] FALSE

# hash each integer 1..n with SipHash-1-3, returning one string per input
many_siphash <- function(n) {
  vapply(
    X = seq_len(n),
    FUN = secretbase::siphash13,
    FUN.VALUE = ""
  )
}

many_siphashes <- many_siphash(n_rhex)

dplyr::n_distinct(many_siphashes)
#> [1] 1000000

dplyr::n_distinct(many_siphashes) == n_rhex
#> [1] TRUE

Created on 2024-05-28 with reprex v2.1.0
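The gap between the two results comes down to the size of the ID space. A rough sketch, using a hypothetical helper expected_collisions() (not part of greta) that applies the birthday approximation n(n - 1) / (2 * 2^bits): rhex() yields 32 bits, while SipHash produces a 64-bit hash. Because the siphash inputs above are the distinct integers 1..n, any duplicate there would have to be a genuine hash collision.

```r
# Hypothetical helper: expected number of colliding pairs when n IDs are
# drawn uniformly at random from 2^bits possible values (birthday bound).
expected_collisions <- function(n, bits) {
  n * (n - 1) / 2 / 2^bits
}

# rhex(): 4 random bytes = 32 bits
expected_collisions(1e6, 32)  # about 116.4

# a 64-bit hash such as SipHash
expected_collisions(1e6, 64)  # far below 1
```

At 64 bits, a single collision among a million distinct inputs is vanishingly unlikely, which fits the TRUE above; identical inputs, however, still hash identically.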

njtierney commented 4 months ago

Other alternatives: {digest}?

njtierney commented 2 months ago

Some ideas on debugging this.

greta_stash$object_counter <- 0L

# generate a unique name for each node
rhex <- function() {
  count <- greta_stash$object_counter + 1L
  greta_stash$object_counter <- count
  # paste(as.raw(sample.int(256L, 4, TRUE) - 1L), collapse = "")
}

That would give us a sense of how many objects are created.
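One way the counter idea could be carried further, sketched under the assumption that names only need to be unique within a single R session: format the counter itself as the name instead of drawing random bytes. id_stash and next_node_name() are hypothetical stand-ins, not greta's actual stash or API.

```r
# Hypothetical stand-in for greta's internal stash environment
id_stash <- new.env()
id_stash$object_counter <- 0L

# Hypothetical counter-based replacement for rhex()
next_node_name <- function(prefix = "node_") {
  count <- id_stash$object_counter + 1L
  id_stash$object_counter <- count
  # zero-padded hex keeps names the same width as the old 8-digit rhex()
  paste0(prefix, sprintf("%08x", count))
}

node_names <- replicate(5, next_node_name())
node_names
```

Unlike random draws, this cannot repeat within a session. It would overflow to NA (with a warning) after .Machine$integer.max names, and it gives no guarantee across sessions, e.g. for nodes restored from a saved workspace.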