Closed pakiessling closed 1 year ago
It is good to know that you can run gpsFISH on your dataset. For your question, what is the command you used and the size (number of cells) of your smallest cell type after subsampling?
This is the command:
result <- fitness(string = index_list,
                  full_count_table = as.data.frame(t(sc_count)),
                  cell_cluster_conversion = sc_cluster,
                  nCV = 10,
                  rate = 0.15,
                  cluster_size_min = 50,
                  relative_prop = relative_prop,
                  two_step_sampling_type = c('Subsampling_by_cluster', 'No_simulation'))
I am unsure how to retrieve the number of cells after fitness() subsamples.
The smallest number before subsampling is 718.
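For anyone wanting to check the cell type sizes themselves, here is a minimal sketch in base R. It assumes the cluster table follows the documented cell_cluster_conversion layout (first column cell name, second column cell type); `sc_cluster_toy` is a made-up stand-in, not real data, and this only counts cells before any subsampling.

```r
# Toy stand-in for sc_cluster (hypothetical data):
# column 1 = cell name, column 2 = cell type, row names = cell names.
sc_cluster_toy <- data.frame(
  cell_name = paste0("cell", 1:10),
  cell_type = c(rep("A", 6), rep("B", 4)),
  stringsAsFactors = FALSE
)
rownames(sc_cluster_toy) <- sc_cluster_toy$cell_name

# Number of cells per cell type, taken from the second column.
cells_per_type <- table(sc_cluster_toy[[2]])

# Name and size of the smallest cell type.
smallest_type <- names(cells_per_type)[which.min(cells_per_type)]
smallest_size <- min(cells_per_type)

print(cells_per_type)
```

Replacing `sc_cluster_toy` with the real `sc_cluster` should give the per-type counts that the subsampling starts from.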
I just noticed that fitness() wants genes as columns, exactly the other way around from gpsFISH_optimize(). My mistake.
Edit:
This results in
Error in fitness(string = index_list, full_count_table = as.data.frame(sc_count), :
'full_count_table' should have the same row name with 'cell_cluster_conversion'
I guess the documentation on full_count_table must be wrong:
full_count_table | A data frame containing the expression level of each gene in each cell with gene name as row name and cell name as column name.
-- | --
cell_cluster_conversion | A data frame with each row representing information of one cell. First column contains the cell name. Second column contains the corresponding cell type name. Row name of the data frame should be the cell name.

Thank you for pointing this out. You are right that the documentation on full_count_table is wrong. You should have cells as rows and genes as columns. I have updated it. Sorry for the confusion.
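A quick way to check the orientation before calling fitness() is sketched below. The toy objects `gene_by_cell` and `cluster_toy` are invented for illustration, and the example assumes the count matrix starts out with genes as rows; if yours already has cells as rows, the transpose is not needed.

```r
# Toy count matrix in genes-by-cells layout (hypothetical data).
gene_by_cell <- matrix(1:6, nrow = 2,
                       dimnames = list(c("geneA", "geneB"),
                                       c("cell1", "cell2", "cell3")))

# Toy cluster table: row names are the cell names.
cluster_toy <- data.frame(cell_name = c("cell1", "cell2", "cell3"),
                          cell_type = c("T", "B", "T"),
                          row.names = c("cell1", "cell2", "cell3"))

# Transpose so that cells are rows and genes are columns.
cell_by_gene <- as.data.frame(t(gene_by_cell))

# Both inputs should now carry the same cell names as row names.
orientation_ok <- identical(rownames(cell_by_gene), rownames(cluster_toy))
stopifnot(orientation_ok)
```

If the `stopifnot()` line fails on real data, the count table is probably still in the other orientation, or the cells are in a different order.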
@YidaZhang0628 No problem. Any idea about the y variable error? Can I make gpsFISH print what it is doing during the subsampling somehow?
If you increase rate and cluster_size_min, is the error still there? Regarding the subsampling part, it is simply the original cell type size times the rate and adjusted by the lower and upper bound. In your case, the lower bound is 718*0.15 which is about 108 cells. This should be enough for 10 cross-validations. If increasing rate and cluster_size_min doesn't solve the problem, can you share the file to reproduce this error? I can take a look.
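The rule described above (size times rate, then adjusted by the bounds) can be sketched roughly as follows. `expected_subsample_size` is a hypothetical helper, not a gpsFISH function, and both the rounding and the exact way the bounds are applied are assumptions about what the package does internally.

```r
# Hypothetical helper, not part of gpsFISH.
# Assumption: the subsampled size is n_cells * rate, rounded to the nearest
# integer, then clamped to the interval [lower, upper].
expected_subsample_size <- function(n_cells, rate, lower, upper = Inf) {
  size <- round(n_cells * rate)
  max(lower, min(upper, size))
}

# Smallest cell type from this thread: 718 cells, rate = 0.15,
# cluster_size_min = 50, no upper bound applied.
smallest_after <- expected_subsample_size(718, rate = 0.15, lower = 50)
print(smallest_after)
```

Under these assumptions the smallest cell type keeps about 108 cells, which matches the estimate given above.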
@YidaZhang0628 thank you so much, I will try increasing the parameters first
@YidaZhang0628 even after doubling the subsample - same error :(
Here is the code I'm running: https://github.com/pakiessling/misc/blob/main/gpsfish.R
Here is the dataset (600 MB): https://rwth-aachen.sciebo.de/s/wpNMOYlxOqXiKtH
I took a look at the code and found that this is caused by a mismatch between cross-validation names when there are 10 or more cross-validations. I have fixed this issue. If you re-install the main version, you will be able to run your code without a problem.
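For readers wondering how a name mismatch can appear only at 10 or more folds, here is one generic way it can happen in R. This is an illustration of a common pitfall, not the actual gpsFISH code or fix, and the fold names `CV1` to `CV10` are invented for the example.

```r
# Generic illustration, not gpsFISH code: hypothetical fold names
# for 10 cross-validations.
fold_names <- paste0("CV", 1:10)

# Pattern matching on "CV1" also matches "CV10", so two folds come back.
partial_hits <- grep("CV1", fold_names, value = TRUE)

# Exact comparison returns only the intended fold.
exact_hit <- fold_names[fold_names == "CV1"]

print(partial_hits)
print(exact_hit)
```

With 9 or fewer folds both lookups agree, which would explain why a problem of this kind stays hidden at nCV = 5.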
Perfect. Thank you!
Hi @YidaZhang0628 ,
thanks to all of your help I now have gpsFISH running quite nicely.
For the evaluation of a panel I tried to increase the number of cross-validations from 5 to 10. I then received the error:
y variable has to contain at least one observation per class for estimation process
Thanks!