Yes, you can use it to evaluate the performance of a gene panel. There are multiple ways to do it. The simplest is to use the `fitness` function, which is the function we use to evaluate the performance of a given gene panel. Another way is to run gene panel selection by following the tutorial, but instead of randomly initializing the population, you initialize it so that every gene panel equals the panel you want to evaluate (e.g., a population with two identical gene panels). Then you run `gpsFISH_optimize` for one iteration. One minus the output fitness value is the accuracy of your gene panel. Hope this helps, and let me know if you have further questions.
Perfect, looking forward to trying it out
Thank you for all of your help @YidaZhang0628
Looking at `fitness`, it is not entirely clear to me what "string: A numeric vector containing the gene panel." means or how I can get it from my list of genes. Is this just an index of my genes?
Also, do you find 5 a good starting point for cross validation, like in your publication?
Sorry for not making it clear. You are right, `string` is just the index of your genes. Specifically, it is the position of each of your genes in `rownames(full_count_table)`. I will update the documentation in the next round of updates. Thank you for pointing this out.
5 should be a good starting point. If you are just evaluating one given panel, you can use more cross validations because you are not doing multiple rounds of optimization.
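For anyone landing here with a list of gene names, the mapping described above can be sketched in base R with `match()`. This is only an illustration: the toy `full_count_table`, the gene names, and `my_panel` are made-up placeholders, and `index_list` is simply the vector that would be passed as `string`.

```r
# Toy stand-in for the full count table (genes x cells); values are placeholders.
full_count_table <- matrix(
  0L, nrow = 5, ncol = 3,
  dimnames = list(c("Gad1", "Slc17a7", "Pvalb", "Sst", "Vip"),
                  c("cell1", "cell2", "cell3"))
)

# Hypothetical panel to evaluate, given as gene names.
my_panel <- c("Sst", "Gad1", "Vip")

# Position of each panel gene among the row names of the count table.
index_list <- match(my_panel, rownames(full_count_table))

# match() returns NA for genes that are absent, so fail early if any are missing.
missing_genes <- my_panel[is.na(index_list)]
if (length(missing_genes) > 0) {
  stop("Genes not found in count table: ", paste(missing_genes, collapse = ", "))
}

print(index_list)  # 4 1 5
```

Checking for `NA` before calling `fitness` avoids passing an invalid index when a gene symbol in the panel is spelled differently from the count table.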
@YidaZhang0628 Perfect, thanks a lot.
Hi @YidaZhang0628,
sorry to bother you again. This time my question is about the `relative_prop` parameter.
Am I right in assuming that Seurat's `AverageExpression(dataset, group.by="cell_type")` and `AverageExpression(dataset)` on a normalized and scaled dataset would return the right values? I am unfortunately quite inexperienced with the R single-cell workflow.
I am not familiar with the functions you mentioned, but if they are based on normalized and scaled data, the result is probably different from what `gpsFISH` needs. The gene panel selection tutorial has a section on how to calculate `relative_prop` from `sc_count`. You can follow that to calculate `relative_prop`.
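To illustrate why scaled values are a poor fit, here is a rough base R sketch of proportions computed directly from raw counts. It assumes that "relative proportion" means each gene's share of a cell's total raw count, averaged per cell type; that reading, the toy matrix, and the names `sc_count_toy`, `cell_type`, `cell_prop`, and `cluster_prop` are all assumptions for illustration. The tutorial section remains the authoritative recipe for the exact structure `relative_prop` must have.

```r
# Toy raw count matrix (genes x cells); values are made up.
sc_count_toy <- matrix(
  c(2, 0, 6,
    1, 3, 0,
    1, 1, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3"))
)

# Hypothetical cell type label for each cell.
cell_type <- c(cell1 = "typeX", cell2 = "typeX", cell3 = "typeY")

# Per-cell proportion: each raw count divided by that cell's total count.
cell_prop <- sweep(sc_count_toy, 2, colSums(sc_count_toy), "/")

# Per-cell-type average of the per-cell proportions (assumed definition).
cells_by_type <- split(colnames(cell_prop), cell_type[colnames(cell_prop)])
cluster_prop <- sapply(
  cells_by_type,
  function(cells) rowMeans(cell_prop[, cells, drop = FALSE])
)

print(cluster_prop)
```

Values computed this way are non-negative and sum to one within each cell, which is not true of scaled expression values that are centered around zero.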
Sorry @YidaZhang0628 , but I once more need your help 😅
I am now trying out `fitness` on the tutorial dataset. When I run it in "Simulation" mode everything works perfectly, but "No_Simulation" causes an error:

Code:

    fitness(
      string = index_list,
      gene_list = marker_panel,
      cell_list = cell_list,
      cell_cluster_conversion = sc_cluster,
      nCV = 5,
      relative_prop = relative_prop,
      two_step_sampling_type = c("Subsampling_by_cluster", "No_simulation"),
      cluster_size_min = 20,
      # simulation_parameter = simulation_params,
      # sample_new_levels = "old_levels",
    )

Error:

    Error in base::colSums(spatial_sc_count): 'x' must be an array of at least two dimensions
    Traceback:
    1. fitness(string = index_list, gene_list = marker_panel, cell_list = cell_list,
     .     cell_cluster_conversion = sc_cluster, nCV = 5, relative_prop = relative_prop,
     .     two_step_sampling_type = c("Subsampling_by_cluster", "No_simulation"),
     .     cluster_size_min = 20, )
    2. lapply(cvround, classifier_per_cv, cvlabel = cvlabel, gene_list = candidate_gene_panel_loc,
     .     cell_list = subsample_cell_loc, class_label_per_cell = class_label_per_cell,
     .     metric = metric, method = method, RF_num_threads = RF_num_threads,
     .     relative_prop = relative_prop, sample_new_levels = sample_new_levels,
     .     use_average_cluster_profiles = use_average_cluster_profiles,
     .     simulation_type = two_step_sampling_type[2], simulation_parameter = simulation_parameter,
     .     simulation_model = simulation_model, cell_cluster_conversion = cell_cluster_conversion,
     .     weight_penalty = weight_penalty)
    3. FUN(X[[i]], ...)
    4. base::colSums(spatial_sc_count)
    5. stop("'x' must be an array of at least two dimensions")

Can you send me the data that I can use to reproduce this error?
@YidaZhang0628 I get this error with `data(sc_count)` as well as with my own data.
You can find the code I ran here (gpsFISH from GitHub, dev version):
https://github.com/pakiessling/misc/blob/main/gpsfish_tutorial.ipynb
From the code, it seems that you are using the development version. Unfortunately, to make the code more efficient, the development version does not have a "no simulation" option for `fitness`. If you want to evaluate fitness without simulation, you can try the main version.
@YidaZhang0628 Ok, good to know.
Does that mean gpsFISH will not support panel selection without simulation in the future? E.g. I can't use it if I don't already have a spatial reference for the simulation?
We will implement a no-simulation option for gpsFISH in the future.
Hi, thanks for the tool - it looks really nice.
Can I use gpsFISH to evaluate a panel of genes that was not selected by it?