Closed Losses closed 4 years ago
I appreciate your kind effort and insights. I am actually currently working on the matrix implementation of the toolbox. I would love to see your implementation with cached result. Please send me the PR :)
Here are the finally optimization result:
> system.time(mapply(mni_to_region_index, -10:10, -10:10, -10:10, template = 'aal'))
user system elapsed
0.97 0.37 1.35
> system.time(mapply(mni_to_region_index, -10:10, -10:10, -10:10, template = 'aal'))
user system elapsed
0.98 0.32 1.31
> system.time(mapply(mni_to_region_index, -10:10, -10:10, -10:10, template = 'aal'))
user system elapsed
1.01 0.39 1.41
> system.time(mapply(mni_to_region_index, -10:10, -10:10, -10:10, template = 'aal'))
user system elapsed
1.08 0.29 1.38
> system.time(mapply(mni_to_region_index, -10:10, -10:10, -10:10, template = 'aal'))
user system elapsed
1.09 0.22 1.32
But some breaking change happened, and maybe we need some discussion about these changes:
The code was reformated with tidyR
, and adjusted to Google's coding style to make sure it could be accepted by CRAN, sending a PR with such a modification seems rude, but for the readability of the code, these change is strongly suggested.
Useless files like .Rproj.user
, .Rhistory
, .RData
, .Ruserdata
are not necessary and may have the risk of exposing developer privacy information. It's strongly recommended to remove these file, not only for the concern of disk space.
The dataset was moved to a single list called label4mri_metadata
, organized by atlas, for the ease of data query.
region_index_to_name
region_index_to_name
was removed to reduce the complexity of the code under the new architecture.
mni_to_region_name
The API of mni_to_region_name
was changed, added a parameter template
for users to choose they want to query region from which template (AAL, BA provided now, marsAtlas and MNI template under consideration on my own implementation).
The format of the returned result was also changed, we may need to discuss the format of it seriously, it's unavoidable. current returned result format is:
> mni_to_region_name(0, 0, 0)
$`aal.index`
[1] 77
$aal.distance
[1] 7.81025
$aal.label
[1] "Thalamus_L"
$ba.index
[1] 40
$ba.distance
[1] 4.242641
$ba.label
[1] "Left-Thalamus (50)"
let me know if we need to adjust the format.
mni_to_region_index
This function was refactored for the robustness concern:
If nothing queried out with distance = F
, the function will return $distance = NULL
but not a string with a warning message. The warning message is not necessary if the developers defined the behavior of the function clearly. If anyone thinks a warning is necessary, we could provide it with the warning method, but in most use cases, it's annoying.
The returned result is provided in a consistent format. Returning results in different formats in different situations can cause a lot of trouble for the user's debugging process, and returning consistent results can help improve the robustness of the library, we can have a discussion about this.
Rewrite the install
section with a copy-and-paste-friendly format may make the install process not so painful :P
You can find my code in https://github.com/Losses/label4MRI/tree/master/label4MRI . It these changes are acceptable, I'll send a PR, if anything need to be discussed, plz reply :)
I earnestly appreciate your advise and effort. In general, I am with your suggestions. I am going to address the issues you raised and would like to know your opinions.
If "tidyR" refers to tidyverse packages (e.g. dplyr, tidyr), I am not a big fan of using tidyR in package development. Although I rely heavily on them for interaction coding, the fact that they are subjected to package updates make it less stable, compared to using the functions embedded in base R.
However, if you are referring to the tidyverse style guide (https://style.tidyverse.org/), I agree with you. Let's adjust to code format to align with it.
Thanks for reminding me of this ;)
Agreed. This makes the data set tidier and allow new template to be added easily.
Encapsulating the projection from index to name by a function makes the code more readable. But I am okay to remove it. It does not make a big difference though.
There is a bug in your version when there is a exact match but with distance = F
.
For example, mni_to_region_name(x = 10,y = 10,z = 10,distance = F,template = "aal")
throws an error:Error in ans[!test & ok] <- rep(no, length.out = length(ans))[!test & : replacement has length zero.
I figured out that the error is resulted from the line 8 of mni_toregion_index():
r_distance <- ifelse(distance, 0, NULL)
. The function ifelse() somehow does not allow NULL to be the returned value. My workaround is to have it return the character "NULL". So the codes become r_distance <- ifelse(distance, 0, "NULL")
, as well as for the line 16 r_distance <- "NULL"
. My preferred codes are shown as below.
if (length(matrix_index) != 0) {
r_index <- label4mri_metadata[[template]]$coordinate_label[matrix_index]
r_distance <- ifelse(distance, 0, "NULL")
} else {
if (distance) {
d2 <- colSums((label4mri_metadata[[template]]$coordinate_list - c(x, y, z))^2)
r_index <- label4mri_metadata[[template]]$coordinate_label[which.min(d2)]
r_distance <- sqrt(min(d2))
} else {
r_index <- "NULL"
r_distance <- "NULL"
}
}
Following the format above, I would also like to change the returned value of result$label from "character(0)" to "NULL" when there is no exact match with distance = F
.
E.g., mni_to_region_name(x = 0,y = 0,z = 0,distance = F)
returned "character(0)" in your version. Therefore, I preferred to convert it to NULL by inserting the ifelse() line.
result$label <- as.character(
label4mri_metadata[[.template]]$label[label4mri_metadata[[.template]]$label$Region_index == result$index, "Region_name"]
)
result$label <- ifelse(length(result$label)==0,"NULL",result$label)
result
To summarize the formatting rules:
# (1) When distance = T and with exact match:
> mni_to_region_name(10, 10, 10)
$aal.distance
[1] 0
$aal.label [1] "Caudate_R"
$ba.distance [1] 0
$ba.label [1] "Right-Caudate (48)"
mni_to_region_name(0, 0, 0) $aal.distance [1] 7.81025
$aal.label [1] "Thalamus_L"
$ba.distance [1] 4.242641
$ba.label [1] "Left-Thalamus (50)"
mni_to_region_name(10, 10, 10, distance = F) $aal.distance [1] "NULL"
$aal.label [1] "Caudate_R"
$ba.distance [1] "NULL"
$ba.label [1] "Right-Caudate (48)"
mni_to_region_name(0, 0, 0, distance = F) $aal.distance [1] "NULL"
$aal.label [1] "NULL"
$ba.distance [1] "NULL"
$ba.label [1] "NULL"
I believe this makes the format of returned values more consistent and readable. Please let me know if you would prefer the other ways.
### 5. Other refinements that I have been working on
(This part is out of the scope of this issue and is only for your interest)
#### 5-1. Improve efficiency by changing the data structure.
The current implementation of both of our versions stores the mapping data (i.e., brain coordinates - brain label ) in a list. This requires the program to search through the whole list just to find a coordinate of interest. Storing the mapping data in a 3d matrix would speed up the searching process by utilizing. matrix indexing. This is what I have been working on.
#### 5-2. Improve efficiency by using a better searching algorithm.
The other step that takes a huge amount of time is finding the nearest brain region when there is no exact match, e.g., mni_to_region_name(0, 0, 0). The current implementation computes the distances of EVERY voxel to the interested coordinate and then output the voxel with minimum distance. This could be speeded up if we choose to use a "spiral-out" approach, i.e., incrementally expanding the sphere of the given coordinate until the sphere reaches the first closest brain region.
My R version for testing the code is 3.4.2
Thank you once again for your endeavor. Please let me know if my replies make sense. I am happy to have your PR once we are on the same page.
Sorry, really busy recent days, until next weekend,
some idea about 5-2, we can get max distance value from brainSpy
's distance db, we only need to search within a box with maxmium distance result as border width, it will be much faster.
It would be great if you could send the PR soon. I'd like to make my progress based on your revision.
:joy: not in the town recently, and no computer in my hand
I have pulled your fork and merge it into mine and make progress on it. I will keep this issue open to allow and encourage further discussion.
For a large number of queries, the speed of
label4MRI
seems to be a little bit slow, I'm very curious about how to optimize this lib for a better speed and try two ways, to use matrix calculation and cache the result with hdfs5. Just got some very interesting result.This is the calculating speed with original
mni_to_region_index
:Optimize with matrix
Here's the code:
The query speed has improved a little:
Optimize with cached result
I tried batch query all the result of x ∈ [-100, 100], y ∈ [-150, 150], z ∈ [-100, 100] in a Python implementation, and cache all result to a hdfs5 database.
The DB size is not very big (231 MB), and memory usage is quite small (1248 bytes)
here're the code and result:
Code:
The third solution is quite useful for a massive query. maybe we could have a discussion about this.
Py implication: https://github.com/ezPsycho/brainSpy.py
Database: https://github.com/ezPsycho/brainSpy-data (MNI2BA included)
CLI tool: https://github.com/ezPsycho/brainSpy-cli
Finally, we generated a MNI2BA list which is similar to
mni2aal
, but to apply it to this project, some architure adjustment may need to be applied, if you are interested in the list, I can send a PR to the repo.