natverse / nat.nblast

R package implementing the NBLAST neuron search algorithm, as an add-on for the NeuroAnatomy Toolbox (nat) R package.
http://natverse.org/nat.nblast/

nblast for swc files #39

Closed albertplus007 closed 4 years ago

albertplus007 commented 4 years ago

Hi, nice library

Recently I used NBLAST to search for similar neurons that I produced myself. The neurons are saved as .swc files, and I read them in with the following code:

newskel1 = read.neuron.swc("/usr/skel_1.swc")
newskel2 = read.neuron.swc("/usr/skel_2.swc")

and then combined the neurons into a neuronlist

newskel = neuronlist(newskel1, newskel2)

and then ran NBLAST

scores = nblast(newskel1, newskel)

but it gives me the following:

Warning messages:
1: In .CleanupParentArray(d1[, "Parent"]) :
  more than 1 point in .CleanupParentArray, choosing first from: 2 11
2: In .CleanupParentArray(d1[, "Parent"]) :
  more than 1 point in .CleanupParentArray, choosing first from: 13 15
3: In .CleanupParentArray(d1[, "Parent"]) :
  more than 1 point in .CleanupParentArray, choosing first from: 191 196
4: In .CleanupParentArray(d1[, "Parent"]) :
  more than 1 point in .CleanupParentArray, choosing first from: 202 208

So I am wondering: is the neuronlist I created not correct? What should I do?

I then converted each neuron to a neuronlist

newskel1 = as.neuronlist(newskel1)
newskel2 = as.neuronlist(newskel2)
newskel = neuronlist(newskel1, newskel2)
scores = nblast(newskel1, newskel)

but it gives me the following:

Error in `[.data.frame`(df, i, j) : undefined columns selected
jefferis commented 4 years ago

Dear Albert, thanks for your interest. Please use read.neurons to read all of your neurons in one go into a neuronlist object. Then use dotprops to convert into the preferred format for NBLAST. See e.g. https://groups.google.com/forum/#!topic/nat-user/_4d1XCtbM90 for an example. Best, Greg.
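For reference, a minimal sketch of that workflow (the directory path, file pattern, and dotprops parameters here are illustrative placeholders, not part of the original answer):

library(nat)
library(nat.nblast)

# read all SWC files in one go into a single neuronlist
skels = read.neurons("/usr/skels", pattern = "\\.swc$")

# convert to the dotprops format preferred by NBLAST
skels.dps = dotprops(skels, resample = 1, k = 5)

# query one neuron against the whole set
scores = nblast(skels.dps[[1]], skels.dps)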

albertplus007 commented 4 years ago

Thanks. I have another question: the function we use is dotprops(skel, resample = 1, k = 5), i.e. resample = 1, but the resolution of my data is 4x4x40 nm. Should I still use resample = 1?

I then used resample = 100 to create the preferred format, because resample = 1 uses a lot of memory.

Question 2: I randomly selected a neuron from my own data and ran NBLAST against all of them: scores = nblast(skel_dps, skel_all_dps)

But the scores are very weird: except for the query neuron itself (the first row), all the other scores are negative. What happened here? I then changed to resample = 10 and the same thing happened again: apart from the query neuron, all the scores are negative.

jefferis commented 4 years ago

The solution to both of these issues is that you need to convert your neurons from units of nm to microns before calculating the dotprops objects. You can just divide the neuronlist generated by read.neurons by 1000.
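A minimal sketch of that conversion, reusing the illustrative variable names from the earlier example:

# scale every neuron's coordinates from nanometres to microns
skels.um = skels / 1000

# recompute the dotprops in micron units
skels.dps = dotprops(skels.um, resample = 1, k = 5)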

albertplus007 commented 4 years ago

Thanks a lot. I have another question: all the FlyCircuit neuron data are in dps, which is a large neuronlistfh. I converted the neuronlistfh to a neuronlist named dps_neuronlist, but when I run the following code to cluster all the neurons it takes a very long time and has not finished yet: scores = nblast_allbyall(dps_neuronlist). The neuronlistfh was downloaded using the NBLAST paper's code.

I don't know why it takes such a long time. Is it not suitable for large data? (There are 16129 neurons in the neuronlist.)

jefferis commented 4 years ago

See the bottom of this page (linked from the NBLAST website):

https://gist.github.com/jefferis/bbaf5d53353b3944c090

It is recommended to use remotesync to download all the flycircuit neurons.
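A minimal sketch of that step, assuming dps is the FlyCircuit neuronlistfh loaded as in the gist:

library(nat)
# download any neurons not yet cached locally so later access is fast
remotesync(dps, download.missing = TRUE)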

albertplus007 commented 4 years ago

Thanks a lot. I used it, and all the FlyCircuit neurons are now local. I converted the neuronlistfh to a neuronlist, but when I run scores = nblast_allbyall(dps_neuronlist) it still takes a very long time and has not finished yet. I don't know what is causing this.

I just want to cluster all the FlyCircuit neurons; the dataset size seems to have a big impact.

jefferis commented 4 years ago

If you want to do this:

scores = nblast_allbyall(dps_neuronlist)

Although NBLAST is fast by the standards of neuron comparison tools, typically ~2ms per comparison for the flycircuit dataset, this will indeed take a lot of time. There are 16129 neurons in dps_neuronlist. That means that there are 16129^2=260,144,641 comparisons to make. At 2 ms per comparison that would be 16129^2*2/1000/3600/24 = 6 days.
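The same back-of-envelope estimate in R, using the figures above:

n = 16129
n^2                       # 260144641 pairwise comparisons
n^2 * 0.002 / 3600 / 24   # about 6 days at 2 ms per comparison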

In fact you can make an estimate on your own machine like this by choosing a sample of 1% of the neurons:

set.seed(42)
# take a ~1% random sample of the neurons and time an all-by-all NBLAST on it
dpsample = sample(dps, round(length(dps)/100))
system.time(nblast_allbyall(dpsample, .progress = 'text'))

For me, on my laptop, this took ~70 s. The whole dataset will take 100^2 times longer, i.e. 69.952*100^2/3600/24 = 8.09 days.

It is possible to speed this up by running comparisons in parallel. You can do this as follows:

# 4 parallel jobs
doMC::registerDoMC(cores = 4)
system.time(nblast_allbyall(dpsample, .parallel=T))

Note that it is worth testing different numbers of cores, as the speedup will not be linear across the full range of cores. It will also depend on how long the whole job takes, since there is a fixed cost associated with starting each parallel job. Be careful also: using more cores will use more memory, and may eventually exhaust the memory on your machine and make everything much slower. For me, 4 cores did give a 4x speedup, so about 2 days for the whole dataset. Note that it was a bit quicker in the command-line version of R than inside RStudio.
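A hedged sketch of such a test on the 1% sample (the core counts are illustrative; pick values that suit your machine):

# try a few core counts on the sample before committing to the full run
for (ncores in c(2, 4, 8)) {
  doMC::registerDoMC(cores = ncores)
  t = system.time(nblast_allbyall(dpsample, .parallel = TRUE))["elapsed"]
  message(ncores, " cores: ", round(t, 1), " s")
}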

albertplus007 commented 4 years ago

Thanks a lot, I got it.

albertplus007 commented 4 years ago

I still have a question. Once we have the scores for the FlyCircuit dataset and want to cluster them, the function nhclust can only process 4000 neurons. So should I just select 4000 neurons for each cluster, e.g. hcc = nhclust(files[1:4000], scoremat = scores)? Is that the same as clustering all of them, e.g. hcc = nhclust(scoremat = scores) (if there were no maxneurons limit)?

jefferis commented 4 years ago

You can increase the maximum number (it was left as a sort of safety threshold). See the maxneurons argument of http://natverse.org/nat.nblast/reference/nhclust.html.
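A minimal sketch, assuming scores is the all-by-all score matrix computed above (20000 is just an illustrative value above the 16129 neurons in the dataset):

# raise the safety threshold so all neurons can be clustered in one pass
hcc = nhclust(rownames(scores), scoremat = scores, maxneurons = 20000)
plot(hcc)  # nhclust returns a standard hclust object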

albertplus007 commented 4 years ago

Thanks a lot. Sorry to disturb you; I asked a silly question and solved it myself after I submitted the issue.