schlegelp opened this issue 7 years ago (status: Open)
This is affecting the mean-score calculation for FAFB vs FlyCircuit (FC) uPNs. As mentioned in the previous comment, some neurons with a high mean score (but not a high forward score) are being lost from the summary.
library(elmr)
library(flycircuit)
#Collect FAFB dps
fafbdps=fetchdps_fafb('annotation:WTPN2017_olfactory_uPN_right')
#Collect uPNs from FlyCircuit
upns=fc_gene_name(subset(annotation,annotation_class=='NeuronSubType' & grepl('uPN',text))$neuron_idid)
# Keep only neurons listed in good_images (i.e. with good registration)
good_images=scan(fc_download_data("http://jefferislab.org/si/nblast/flycircuit/good_images.txt"),what='', quiet = TRUE)
upns=intersect(upns, good_images)
upns=setdiff(upns, c("DvGlutMARCM-F1364_seg1"))
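# Load the FlyCircuit startup script; it is expected to define the dotprops object `dps` used below
devtools::source_gist("bbaf5d53353b3944c090", filename = "FlyCircuitStartupNat.R")
fcupns=dps[upns]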
#calculate forward score
fafbsc=nblast(query = fafbdps, target = fcupns, normalised = TRUE)
#calculate reverse score
fafbscr=nblast(query = fcupns, target = fafbdps, normalised = TRUE)
#calculate mean score: (forward + reverse)/2
fafbscmu=(fafbsc + t(fafbscr))/2
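# Sort each FAFB neuron's mean scores. simplify=FALSE returns a named list, so each
# sorted vector keeps its own FlyCircuit names (a simplified matrix would label
# every column with the row names of the first column only)
fafbscmu_sort=sapply(names(fafbdps), function(x) sort(fafbscmu[,x], decreasing=TRUE), simplify=FALSE)
# For skid 16, compare
head(fafbscmu_sort[["16"]])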
#to
sc16=nblast_fafb(16)
summary(sc16)
The top mean-score hit in the summary is DvGlutMARCM-F004348_seg001 (0.54579523), but the actual top hit in fafbscmu_sort is FruMARCM-F000734_seg001 (0.5605835).
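The mean-score step in the script (`(fafbsc + t(fafbscr))/2`) adds matrices element by element. That works as long as both nblast calls keep the input order of `fafbdps` and `fcupns`. A minimal sketch of a name-aligned variant is shown below, using made-up toy scores. The helper `mean_nblast_score` is hypothetical, not an elmr or nat.nblast function. It assumes the orientation the script relies on: targets in rows and queries in columns.

```r
# Hypothetical helper (not part of elmr or nat.nblast): mean NBLAST score
# from a forward and a reverse score matrix. Assumes fwd has targets
# (FlyCircuit) in rows and queries (FAFB) in columns; rev_sc is the other
# way round.
mean_nblast_score <- function(fwd, rev_sc) {
  # transpose the reverse scores, then reorder rows/columns by name, because
  # `+` on two matrices pairs elements by position and ignores dimnames
  rev_t <- t(rev_sc)[rownames(fwd), colnames(fwd), drop = FALSE]
  (fwd + rev_t) / 2
}

# made-up toy scores; the reverse matrix deliberately lists names in another order
fwd <- matrix(c(0.5, 0.2, 0.1, 0.4), nrow = 2,
              dimnames = list(c("fcA", "fcB"), c("16", "27")))
rev_sc <- matrix(c(0.4, 0.6, 0.3, 0.2), nrow = 2,
                 dimnames = list(c("27", "16"), c("fcB", "fcA")))
mu <- mean_nblast_score(fwd, rev_sc)

# rank FlyCircuit hits for FAFB query "16" by mean score
print(sort(mu[, "16"], decreasing = TRUE))
```

In this toy case fcA has the higher forward score for query 16 (0.5 vs 0.2), but fcB has the higher mean score (0.40 vs 0.35). This is the same kind of reordering reported above.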
Are they there if you increase n above 10?
(In reply to Marta Costa, 1 Dec 2017, 16:12.)
Yes, it appears at n=17.
The forward score for FruMARCM-F000734_seg001 is 0.5272133.
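This sketch assumes the mean score is simply (forward + reverse)/2, as in the script above. It also assumes this forward score and the mean score of 0.5605835 come from the same score matrices. Under those assumptions, the numbers imply a reverse score of about 0.594 for this hit. A reverse score clearly higher than the forward score is what lets a hit outside the top 10 by forward score come first by mean score:

$$
\mu = \frac{s_\mathrm{fwd} + s_\mathrm{rev}}{2}
\quad\Rightarrow\quad
s_\mathrm{rev} = 2\mu - s_\mathrm{fwd} = 2 \times 0.5605835 - 0.5272133 = 0.5939537
$$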
summary.nblastfafb first subsets to the top N hits (ranked by forward score) and only then calculates mu_score and reorders by it. This leads to two potential problems:
(a) Hits with a high mu_score that are not among the top N forward scores will not show up in the summary.
(b) Probably less of a problem: if the user changes the original order (which is based on forward score), the summary will be different. This could, of course, be intended.
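Problem (a) can be illustrated with a minimal sketch on made-up scores. It compares the ordering described above (top N by forward score, then rank by mu) with computing mu for every hit before taking the top N. Both function names are hypothetical and this is not elmr's code. As in the script above, mu is assumed to be (forward + reverse)/2.

```r
# Hypothetical sketch, NOT the elmr implementation of summary.nblastfafb.
# Assumes mu = (forward + reverse) / 2.
# fwd_scores, rev_scores: named numeric vectors of forward and reverse
# scores for a single FAFB query, named by FlyCircuit neuron.

# Ordering as described in the issue: keep the top n by forward score,
# then compute mu and reorder by it
top_hits_forward_first <- function(fwd_scores, rev_scores, n) {
  keep <- head(names(sort(fwd_scores, decreasing = TRUE)), n)
  mu <- (fwd_scores[keep] + rev_scores[keep]) / 2
  sort(mu, decreasing = TRUE)
}

# Alternative: compute mu for every hit first, then keep the top n
top_hits_mu_first <- function(fwd_scores, rev_scores, n) {
  mu <- (fwd_scores + rev_scores[names(fwd_scores)]) / 2
  head(sort(mu, decreasing = TRUE), n)
}

# made-up scores: hit "c" has the lowest forward score but the highest mu
fwd_scores <- c(a = 0.60, b = 0.55, c = 0.50)
rev_scores <- c(a = 0.40, b = 0.35, c = 0.70)
print(top_hits_forward_first(fwd_scores, rev_scores, n = 2))  # a, b: "c" is dropped
print(top_hits_mu_first(fwd_scores, rev_scores, n = 2))       # c, a
```

With the mu-first ordering, a hit like FruMARCM-F000734_seg001 would be kept even though it falls outside the top 10 forward hits. With the current ordering, raising n in summary() (to 17 for skid 16) is what recovers it.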