natverse / neuprintr

R client utilities for interacting with the neuPrint connectome analysis service
http://natverse.org/neuprintr

return bodyids as characters, not factors? #27

Closed romainFr closed 4 years ago

romainFr commented 4 years ago

Not sure when that changed, but I find that having the bodyids as factors (for example in the results of neuprint_find_neurons) is generally a bad idea, as they cannot be passed to other functions directly: as.numeric will transform them to the factor index, not the actual numeric bodyid. The only required change would be using stringsAsFactors = FALSE when creating the data.frames.
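A minimal illustration of the pitfall, using made-up body ids rather than real neuprint_find_neurons output:

```r
# Hypothetical body ids stored as a factor, as data.frame() would produce
# when strings are converted to factors
bodyids <- factor(c("5813024015", "1001453586", "912147912"))

# as.numeric() returns the internal level codes, not the ids themselves
codes <- as.numeric(bodyids)

# Going through character first recovers the ids as text
ids_chr <- as.character(bodyids)

print(codes)    # small integers: positions in the sorted level set
print(ids_chr)  # the original ids
```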

jefferis commented 4 years ago

Where is this happening?

romainFr commented 4 years ago

As far as I could tell only in neuprint_find_neurons, so I just pushed b765c96426cc294f7fe9a58db9ad418a40155426 to fix it.

jefferis commented 4 years ago

The only really safe base data type for a body id is actually a character vector. Numeric vectors always have the potential to lose precision during conversion to character (which happens when you make the json representation). In particular they have ~53 bits of precision compared with the full 64 bit range of an integer id. I've been kind of hoping to avoid running into this as it will be a pain to deal with.
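A short sketch of the 53 bit limit in base R. The id used here is invented for illustration and simply sits one above 2^53:

```r
# Doubles carry a 53 bit significand, so above 2^53 neighbouring
# integers can no longer all be told apart.
limit <- 2^53
same_as_next <- (limit + 1) == limit  # 2^53 + 1 is not representable

# A hypothetical id just above the limit does not survive a round trip
# through numeric and back to text
id_chr <- "9007199254740993"  # 2^53 + 1 written out as text
round_trip <- sprintf("%.0f", as.numeric(id_chr))

print(same_as_next)
print(round_trip == id_chr)
```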

romainFr commented 4 years ago

Agreed. There might be functions returning numeric vectors, so we should take a look at it.

I think we do need to convert them to numeric when passing them to the Cypher queries through jsonlite::toJSON(as.numeric(unlist(bodyids))) though -- the alternative would be to replace "toJSON" with a custom-built string
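For reference, the custom-built string alternative could look roughly like the sketch below. The helper name ids_to_json_array is hypothetical and not part of neuprintr; it assumes ids arrive as text (or as factors) and never touches numeric.

```r
# Hypothetical helper: build a JSON array of integers from character ids
# without converting them to numeric at any point.
ids_to_json_array <- function(bodyids) {
  ids <- as.character(unlist(bodyids))
  # Accept only plain runs of digits, so nothing but integers can end up
  # in the resulting string.
  if (anyNA(ids) || !all(grepl("^[0-9]+$", ids))) {
    stop("body ids must be non-negative integers given as text")
  }
  paste0("[", paste(ids, collapse = ","), "]")
}

json_ids <- ids_to_json_array(c("9223372036854775806", "1001453586"))
print(json_ids)
```

This keeps every digit, but it means hand-rolling part of the JSON encoding.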

jefferis commented 4 years ago

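The biggest integer that we need to worry about is 2^63-1 = 9223372036854775807 (bit64 integers are signed). Values of this size, such as the 9223372036854775806 used in the examples here, cannot be represented exactly as a numeric.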

# no good because we need a json integer
> jsonlite::toJSON("9223372036854775806")
["9223372036854775806"] 
# no good because loses precision (even if you persuade it not to print in scientific form)
> jsonlite::toJSON(9223372036854775806)
[9.22337203685478e+18] 

The best way is to use 64 bit integers via the bit64 package. You basically need to do this:

# character input
> jsonlite::toJSON(bit64::as.integer64("9223372036854775806"))
[9223372036854775806] 
# actual bit64 input (2^63 - 2, computed within the signed 64 bit range)
> jsonlite::toJSON(bit64::as.integer64("9223372036854775807") - 1L)
[9223372036854775806]

So we could have an internal function that does the following

  1. converts factors to character
  2. converts character vectors to bit64
  3. passes on bit64 objects
  4. converts numeric objects to bit64 after checking for loss of precision (53 bits)
  5. converts R integer objects to bit64 after checking for loss of precision (32 bits)

Then takes the result and jsonifies it.
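A rough sketch of such a helper, assuming the bit64 and jsonlite packages are installed. The name ids_as_json and the exact checks are illustrative only, not what was eventually merged:

```r
# Hypothetical internal helper following the five steps listed above
ids_as_json <- function(x) {
  if (is.factor(x)) x <- as.character(x)          # 1. factor -> character
  if (is.character(x)) {
    out <- bit64::as.integer64(x)                 # 2. character -> integer64
    # Round-trip check: rejects non-numeric text and out-of-range values
    if (any(is.na(out)) || any(as.character(out) != x)) {
      stop("character ids must be plain integers within the 64 bit range")
    }
  } else if (bit64::is.integer64(x)) {
    out <- x                                      # 3. already integer64
  } else if (is.integer(x)) {
    out <- bit64::as.integer64(x)                 # 5. 32 bit ints always fit
  } else if (is.numeric(x)) {
    # 4. doubles are only trustworthy as whole numbers below 2^53
    if (anyNA(x) || any(x != trunc(x)) || any(abs(x) >= 2^53)) {
      stop("numeric ids may have lost precision; pass them as character")
    }
    out <- bit64::as.integer64(x)
  } else {
    stop("unsupported id type: ", class(x)[1])
  }
  jsonlite::toJSON(out)
}

print(ids_as_json("9223372036854775806"))
print(ids_as_json(factor(c("5813024015", "1001453586"))))
```

The integer64 test comes before the numeric one because integer64 vectors are stored as doubles under the hood.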

jefferis commented 4 years ago

@romainFr Just a note that I'll take this.

jefferis commented 4 years ago

This discussion is now closed by #30.