`summary.nblastfafb` subsets results before ordering

schlegelp commented 7 years ago

summary.nblastfafb subsets to the top N hits and only then calculates mu_score and reorders by it. This leads to two potential problems: (a) NBLAST hits with a high mu_score that are not among the top N forward scores will not show up in the summary. (b) Probably less of a problem: if the user changes the original order (which is based on forward score), the summary will be different. This could of course also be intended.
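Problem (a) can be reproduced on toy data. The sketch below is not the elmr code; the score vectors and hit names are made up purely for illustration. It compares "take the top n by forward score, then order by mean score" with "order by mean score, then take the top n".

```r
# Hypothetical forward and reverse scores for six candidate hits (made-up numbers)
forward <- c(a = 0.60, b = 0.58, c = 0.55, d = 0.53, e = 0.52, f = 0.50)
reverse <- c(a = 0.40, b = 0.45, c = 0.42, d = 0.41, e = 0.62, f = 0.30)

# Mean score: (forward + reverse) / 2
mu <- (forward + reverse) / 2
n <- 3

# Subset by forward score first, then order the survivors by mean score
top_fwd <- names(sort(forward, decreasing = TRUE))[seq_len(n)]
subset_then_sort <- sort(mu[top_fwd], decreasing = TRUE)

# Order by mean score first, then subset
sort_then_subset <- sort(mu, decreasing = TRUE)[seq_len(n)]

print(subset_then_sort)
print(sort_then_subset)
```

In this toy case hit "e" has the best mean score but only the fifth-best forward score, so it is missing from the first result and heads the second.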

mmc46 commented 7 years ago

This is affecting the FAFB vs FC uPNs mean score calculation. As mentioned in the previous comment, some neurons with a high mean score (but not a high forward score) are being lost from the summary.

library(elmr)
library(flycircuit)
#Collect FAFB dps
fafbdps=fetchdps_fafb('annotation:WTPN2017_olfactory_uPN_right')
#Collect uPNs from FlyCircuit
upns=fc_gene_name(subset(annotation,annotation_class=='NeuronSubType' & grepl('uPN',text))$neuron_idid)
#Keep only the ones in good_images(good registration)
good_images=scan(fc_download_data("http://jefferislab.org/si/nblast/flycircuit/good_images.txt"),what='', quiet = TRUE)
upns=intersect(upns, good_images)
upns=setdiff(upns, c("DvGlutMARCM-F1364_seg1"))
devtools::source_gist("bbaf5d53353b3944c090", filename = "FlyCircuitStartupNat.R")
fcupns=dps[upns]
#calculate forward score
fafbsc=nblast(query = fafbdps, target = fcupns, normalised = TRUE)
#calculate reverse score
fafbscr=nblast(query = fcupns, target = fafbdps, normalised = TRUE)
#calculate mean score: (forward + reverse)/2
fafbscmu=(fafbsc + t(fafbscr))/2
fafbscmu_sort=sapply(names(fafbdps), function(x) sort(fafbscmu[,x], decreasing=TRUE))

#For skid 16, comparing
head(fafbscmu_sort)[,1]

#to
sc16=nblast_fafb(16)
summary(sc16)

The top mean score hit in the summary is DvGlutMARCM-F004348_seg001 (0.54579523), but the real top hit, from fafbscmu_sort, is FruMARCM-F000734_seg001 (0.5605835).

jefferis commented 7 years ago

Are they there if you increase n>10?


On 1 Dec 2017, at 16:12, Marta Costa notifications@github.com wrote:

This is affecting FAFB vs FC uPNs mean score calculation - as mentioned in previous, some neurons with high mean score (but not forward) are being lost from the summary


mmc46 commented 7 years ago

Yes, at n=17

The forward score for FruMARCM-F000734_seg001 is 0.5272133.
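Assuming the summary takes its top n by forward score, as described in the issue, the smallest n at which a given hit can appear is that hit's rank by forward score. A minimal sketch on made-up scores (the vectors and hit names are hypothetical, not the real FAFB results):

```r
# Hypothetical forward and mean scores for six hits (made-up numbers)
forward <- c(a = 0.60, b = 0.58, c = 0.55, d = 0.53, e = 0.52, f = 0.50)
mu      <- c(a = 0.50, b = 0.515, c = 0.485, d = 0.47, e = 0.57, f = 0.40)

# Name of the hit with the highest mean score
best <- names(which.max(mu))

# Its position when hits are ordered by forward score;
# a top-n subset taken by forward score contains it only when n >= this rank
min_n <- match(best, names(sort(forward, decreasing = TRUE)))

print(best)
print(min_n)
```

Applied to the real score vectors for skid 16, the same calculation would show how far down the forward ranking the best mean-score hit sits.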

natverse / elmr

`summary.nblastfafb` subsets results before ordering #30
