natverse / neuprintr

R client utilities for interacting with the neuPrint connectome analysis service
http://natverse.org/neuprintr

Add neuron name/type to neuprint_connection_table #132

Closed jefferis closed 3 years ago

jefferis commented 3 years ago

I guess the alternative is to merge in all metadata. I thought this was a good compromise for many purposes. I will likely add another function that adds or updates metadata for an existing data.frame. Something like this:

neuprint_add_meta <- function(x, idname='bodyid', ignore.case = TRUE, ...) {
  if(!is.data.frame(x)) stop("I expect a data frame")
  cx=colnames(x)
  if(isTRUE(ignore.case)) {
    cx=tolower(cx)
    idname=tolower(idname)
  }
  matchcol=stats::na.omit(colnames(x)[match(idname, cx)])
  if(length(matchcol)!=1)
    stop("id column:", idname, " not present exactly once in input data frame!")

  # nb only check unique ids
  meta=neuprint_get_meta(unique(x[[matchcol]]), ...)
  # just merge the body id column
  merged=merge(x[matchcol], meta, by.x=matchcol, by.y='bodyid', all.x = TRUE, sort=FALSE)
  # make sure we have same number of rows in both tables
  stopifnot(isTRUE(all.equal(nrow(x),nrow(merged))))
  # make sure that the id orders match exactly
  merged=merged[match(x[[matchcol]], merged[[1]]), ]
  # and then check that ids are identical
  stopifnot(isTRUE(all.equal(x[[matchcol]], merged[[1]])))
  # now set columns that are present in meta (overwriting dups)
  x[colnames(merged)]=merged
  x
}

You would then use it like this:

mbon01ds=neuprint_connection_table("MBON01", threshold=5)
mbon01ds=neuprint_add_meta(mbon01ds, idname="partner")
# do your analysis
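The merge-then-realign step in the function above can be checked offline with toy data. This is only a sketch: the `meta` data frame stands in for whatever `neuprint_get_meta()` would return, and its `name` column and all values are made up for illustration.

```r
# Toy stand-in for a connection table: ids repeat, are unsorted,
# and one id (40) has no metadata at all.
x <- data.frame(partner = c(30, 10, 40, 10), weight = c(5, 7, 9, 6))

# Toy stand-in for the metadata table (assumed columns, not real neuPrint output).
meta <- data.frame(bodyid = c(10, 20, 30),
                   name = c("A", "B", "C"),
                   stringsAsFactors = FALSE)

# Merge only the id column with the metadata; all.x keeps ids lacking metadata.
merged <- merge(x["partner"], meta, by.x = "partner", by.y = "bodyid",
                all.x = TRUE, sort = FALSE)

# merge() does not promise to preserve the input row order, so realign explicitly.
merged <- merged[match(x$partner, merged$partner), ]
stopifnot(identical(x$partner, merged$partner))

# Copy the metadata columns across, overwriting any that already exist.
x[colnames(merged)] <- merged
x
```

The explicit `match()` is what makes the final column assignment safe: rows are paired by id rather than by whatever order `merge()` happens to return. The id without metadata ends up with `NA` in `name`.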

romainFr commented 3 years ago

Yes, that's basically what our workflow looks like right now. Pulling the metadata at the same time as the connections would save the overhead of looking the neurons up in the database twice. But I suppose it is a matter of what the most common workflows are?

On a related topic, we usually reformat our connection tables into a to/from format (name.from/name.to, type.from/type.to, ...) so that we do not depend on the "prepost" column. Would such a reformatting function be of interest for neuprintr?

jefferis commented 3 years ago

Do you want to sketch out your format?

romainFr commented 3 years ago

Yes, starting from a connection table with added metadata for both the partners and the "source" neurons, I do something like:

connectionTable <- connectionTable %>%
  mutate(from = ifelse(prepost==1, bodyid, partner),
         to = ifelse(prepost==1, partner, bodyid),
         name.from = as.character(ifelse(prepost==1, name, partnerName)),
         name.to = as.character(ifelse(prepost==1, partnerName, name)),
         type.from = as.character(ifelse(prepost==1, type, partnerType)),
         type.to = as.character(ifelse(prepost==1, partnerType, type))
  ) %>%
  select(-bodyid, -partner, -name, -partnerName, -partnerType, -type, -prepost)
return(connectionTable)
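For illustration, the same recasting can be wrapped in a function and run on a toy table. This is a sketch, not part of neuprintr: the helper name `to_from_table` is hypothetical, the toy values are invented, and it assumes dplyr is installed and that the input already carries the `name`, `type`, `partnerName` and `partnerType` columns used in the snippet above.

```r
library(dplyr)

# Hypothetical helper; the body follows the snippet above.
to_from_table <- function(connectionTable) {
  connectionTable %>%
    mutate(from = ifelse(prepost == 1, bodyid, partner),
           to = ifelse(prepost == 1, partner, bodyid),
           name.from = as.character(ifelse(prepost == 1, name, partnerName)),
           name.to = as.character(ifelse(prepost == 1, partnerName, name)),
           type.from = as.character(ifelse(prepost == 1, type, partnerType)),
           type.to = as.character(ifelse(prepost == 1, partnerType, type))) %>%
    # Drop the query-centric columns now that direction is explicit.
    select(-bodyid, -partner, -name, -partnerName, -partnerType, -type, -prepost)
}

# Toy input: neuron 1 with one prepost = 1 row and one prepost = 0 row.
toy <- data.frame(bodyid = c(1, 1), partner = c(2, 3), prepost = c(1, 0),
                  weight = c(12, 8),
                  name = c("n1", "n1"), type = c("T1", "T1"),
                  partnerName = c("n2", "n3"), partnerType = c("T2", "T3"),
                  stringsAsFactors = FALSE)

res <- to_from_table(toy)
res
```

In the result, the `prepost == 1` row has the query neuron in `from`, and the `prepost == 0` row has it in `to`, so downstream code no longer needs to consult `prepost`.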

To put the connections in context, I'm thinking it would then make sense to also add downstream.from (or post.from) and upstream.to (or pre.to), plus their ROI-specific equivalents if the request is ROI-specific.

The other potential fields (status.from and status.to, notes.from and notes.to) may also come in handy for some analyses or brain regions.

I'd be happy to make a PR for that if that's useful.

jefferis commented 3 years ago

@romainFr I'm merging this, but I'd be very happy to see a PR along the lines that you suggest so long as it stays as lean as possible.