huidongchen / simba

SIMBA: SIngle-cell eMBedding Along with features
https://simba-bio.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Apply SIMBA for small dataset #13

Closed by hikahika12 1 year ago

hikahika12 commented 1 year ago

Hello!

I'm interested in applying SIMBA to a small dataset. I'm trying to analyze scRNA-seq data with 15 cell nodes and 5000 gene nodes ("n_edges": 76947), but training doesn't seem to progress (see the attached plot) and the resulting embeddings are poor. Could this be due to the small data size?

Is there a limit to the data size to which SIMBA can be applied?

Thank you so much!

Attachment: pbg_metrics.pdf

huidongchen commented 1 year ago

This is clearly a case of overfitting. In a well-performing model, both the training loss and the validation loss should decrease, and the MRR (mean reciprocal rank) should increase, until they all stabilize.
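As a rough illustration of that pattern, the sketch below flags a run whose training loss is still falling while its validation loss rises over the last few epochs. The function name `looks_overfit`, the `window` parameter, and the loss values are all hypothetical; this is a simple heuristic, not something SIMBA provides or the values from the attached plot.

```python
def looks_overfit(train_loss, valid_loss, window=3):
    """Heuristic check: over the last `window` epochs, did training loss
    fall while validation loss rose? (Hypothetical helper, not part of SIMBA.)"""
    if len(train_loss) < window + 1 or len(valid_loss) < window + 1:
        return False  # not enough epochs to judge
    train_delta = train_loss[-1] - train_loss[-1 - window]
    valid_delta = valid_loss[-1] - valid_loss[-1 - window]
    return train_delta < 0 and valid_delta > 0


# Made-up per-epoch losses, for illustration only.
overfit_train = [1.0, 0.8, 0.6, 0.4, 0.3, 0.2]
overfit_valid = [1.0, 0.9, 0.85, 0.9, 1.0, 1.1]
healthy_train = [1.0, 0.8, 0.7, 0.65, 0.62, 0.61]
healthy_valid = [1.0, 0.85, 0.75, 0.7, 0.68, 0.67]

print(looks_overfit(overfit_train, overfit_valid))
print(looks_overfit(healthy_train, healthy_valid))
```

A check like this only compares two points per curve, so it is sensitive to noise; looking at the full curves in the metrics plot remains the more reliable guide.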

To fix this, further increase the weight decay parameter, wd, until the model no longer overfits.

e.g.,

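import simba as si
# Start from a copy of the current PBG parameters
dict_config = si.settings.pbg_params.copy()
dict_config['wd'] = new_larger_value  # placeholder: use a weight decay larger than in the previous run
# Disable automatic weight decay so the value set above is used
si.tl.pbg_train(pbg_params=dict_config, auto_wd=False, save_wd=True, output='model2')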
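Since the right value of wd is found by trial, one way to organize the search is to prepare a series of configs with geometrically increasing weight decay. The sketch below is a minimal illustration: the helpers `wd_candidates` and `configs_for_sweep`, the starting value, the factor, and the stand-in `base_params` dict are all assumptions, not part of SIMBA.

```python
def wd_candidates(start, factor=10.0, n=4):
    """Return n weight decay values, each `factor` times the previous one."""
    return [start * factor ** i for i in range(n)]


def configs_for_sweep(base_params, wds):
    """Build one config per weight decay value without modifying base_params."""
    configs = []
    for wd in wds:
        cfg = dict(base_params)  # shallow copy; enough to override the top-level 'wd' key
        cfg['wd'] = wd
        configs.append(cfg)
    return configs


# Stand-in for si.settings.pbg_params; the keys and values here are illustrative only.
base_params = {'wd': 0.01, 'num_epochs': 10}
configs = configs_for_sweep(base_params, wd_candidates(0.01, factor=10.0, n=3))
for cfg in configs:
    print(cfg['wd'])

# With SIMBA installed, each config could then be trained into its own output
# directory, mirroring the call above (the output names are hypothetical):
# for i, cfg in enumerate(configs):
#     si.tl.pbg_train(pbg_params=cfg, auto_wd=False, save_wd=True, output=f'model_wd_{i}')
```

After each run, compare the training and validation curves and stop increasing wd once the validation loss no longer diverges from the training loss.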

hikahika12 commented 1 year ago

I appreciate your quick response! I will try it.