single-cell-genetics / XClone

Detection of allele-specific subclonal copy number alterations from single-cell transcriptomic data.
https://xclone-cnv.readthedocs.io/en/latest/
Apache License 2.0

RDR: libratio fitting bias #2

Closed: Rongtingting closed this issue 1 month ago.

Rongtingting commented 1 year ago

Bad library ratio fitting produces NaN values in emm_prob_log.

Dataset: GX109_5400
import xclone

# dataset_name, out_dir and RDR_adata are assumed to be defined earlier in the session
xconfig = xclone.XCloneConfig(dataset_name=dataset_name, module="RDR")
xconfig.set_figure_params(xclone=True, fontsize=18)
xconfig.outdir = out_dir
xconfig.cell_anno_key = "cell_type"
xconfig.ref_celltype = "unclassified"
xconfig.smart_transform = False
xconfig.top_n_marker = 15
xconfig.marker_group_anno_key = "cell_type"
xconfig.xclone_plot = True
xconfig.plot_cell_anno_key = "cell_type"

xconfig.exclude_XY = True
# xconfig.remove_guide_XY = True
# xconfig.guide_qt_lst = [1e-04, 0.96, 0.9999]

xconfig.display()

RDR_Xdata = xclone.model.run_RDR(RDR_adata, config_file=xconfig)

This raises an error in the CNV_optimazation step because emm_prob_log contains np.nan values, which trace back to the parameters used in the negative binomial (NB) probability calculation. In cell CTACGTCTCGGAAACG-1, library_ratio is extremely low (library_alpha is inf), which produces a malformed expected layer in extra_preprocess.
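To see how a degenerate library ratio can surface as NaN rather than as a clean error, the sketch below evaluates a textbook NB log-pmf written without guards. It is not XClone's code: it assumes, hypothetically, that an expected-layer entry is a gene's reference mean times the cell's library ratio, and `naive_nb_logpmf`, `ref_mean` and `r` are illustrative names. A ratio of 0 leads to `0 * log(0)`, and an infinite ratio leads to `inf / inf`; both evaluate to NaN in floating point.

```python
import numpy as np
from scipy.special import gammaln


def naive_nb_logpmf(x, mu, r):
    """NB log-pmf with mean mu and size r, in direct form with no guards.

    Hypothetical helper for illustration only; mu == 0 or mu == inf
    is not special-cased.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Silence the divide/invalid warnings so the NaN appears quietly,
    # as it would deep inside a larger pipeline.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (
            gammaln(x + r) - gammaln(r) - gammaln(x + 1)
            + r * np.log(r / (r + mu))
            + x * np.log(mu / (r + mu))
        )


ref_mean = 2.0  # hypothetical reference mean for one gene
r = 10.0        # hypothetical NB size (inverse dispersion) parameter
x = 0           # observed count in the cell

for lib_ratio in [1.0, 0.0, np.inf]:
    mu = ref_mean * lib_ratio  # hypothetical expected-layer entry
    print(lib_ratio, naive_nb_logpmf(x, mu, r))
```

A ratio of 1.0 gives a finite log-probability, while 0.0 and inf both give NaN, which then propagates into any sum over genes or cells.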

Rongtingting commented 1 year ago

Solved the issue by checking the depth_key in use (library_ratio_capped by default) before generating the expected layer.
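A minimal sketch of such a guard, assuming the depth values are a plain per-cell array and the expected layer is their outer product with per-gene reference means. `build_expected_layer` is a hypothetical helper, not part of the XClone API; only the default key name `library_ratio_capped` comes from this thread.

```python
import numpy as np


def build_expected_layer(depth, ref_mean, depth_key="library_ratio_capped"):
    """Return a cells x genes expected-count matrix after validating depth.

    Hypothetical helper: `depth` holds one library ratio per cell (the values
    stored under `depth_key`); `ref_mean` holds one reference mean per gene.
    """
    depth = np.asarray(depth, dtype=float)
    ref_mean = np.asarray(ref_mean, dtype=float)
    # Reject inf, NaN and non-positive ratios before they reach the NB model.
    bad = ~np.isfinite(depth) | (depth <= 0)
    if bad.any():
        idx = np.flatnonzero(bad)
        raise ValueError(
            f"'{depth_key}' has {idx.size} invalid value(s) "
            f"(non-finite or <= 0), first at cell index {idx[0]}"
        )
    # Expected counts: each cell's library ratio times each gene's reference mean.
    return np.outer(depth, ref_mean)
```

Failing early with the offending cell index makes a case like CTACGTCTCGGAAACG-1 easy to locate, instead of discovering the problem later as NaN in emm_prob_log.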


Rongtingting commented 1 year ago

TODO:

  1. Use a stricter library ratio capping strategy; this needs to be tested on all datasets.
  2. Improve the library ratio GLM fitting by adding initial parameters, e.g., the counts ratio and a specific fixed dispersion: add start params to fit_lib_ratio in _RDR_libratio.py.
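For the second item, one way to read "adding start params" is sketched below: treat each cell's library ratio as the intercept of an intercept-only NB GLM with log link, offset log(reference mean) and a fixed size (inverse dispersion) parameter, and start Newton's method from the log of the total counts ratio. This model, the name `fit_lib_ratio_nb` and the default `size=10.0` are assumptions for illustration; the actual `fit_lib_ratio` in `_RDR_libratio.py` may be set up differently.

```python
import numpy as np


def fit_lib_ratio_nb(x, ref_mean, size=10.0, max_iter=50, tol=1e-8):
    """Fit one cell's library ratio s in x_g ~ NB(mean = s * ref_mean_g, size).

    Hypothetical sketch: the intercept b = log(s) is fitted by Newton's method
    with the size parameter held fixed. The start value is the total counts
    ratio, which is also the exact Poisson MLE for this model.
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(ref_mean, dtype=float)
    r = float(size)
    start = x.sum() / m.sum()  # total counts ratio used as the start param
    if start <= 0:
        # log(0) = -inf would poison the fit, as in this issue.
        raise ValueError("cell has zero total counts; library ratio undefined")
    b = np.log(start)
    for _ in range(max_iter):
        mu = m * np.exp(b)
        # Score and Hessian of the NB log-likelihood with respect to b.
        score = np.sum(r * (x - mu) / (r + mu))
        hess = -np.sum(r * mu * (r + x) / (r + mu) ** 2)
        step = score / hess
        b -= step
        if abs(step) < tol:
            break
    return float(np.exp(b))
```

Because the Hessian is always negative, the log-likelihood is concave in b, so a start at the counts ratio is usually already close to the optimum and only a few Newton steps are needed.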
Rongtingting commented 1 year ago

Setting the total counts ratio as the default library size ratio is recommended.
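A sketch of computing a total counts ratio, combined with the quantile capping suggested in the first TODO item. The normalization (dividing by the mean total counts of reference cells), the quantile cut-offs and the `min_ratio` floor are all assumptions chosen for illustration, not XClone defaults. The floor matters for the failure in this issue: quantile capping alone leaves a zero-count cell at 0 whenever the lower quantile is itself 0.

```python
import numpy as np


def total_counts_ratio(counts, is_ref, lower_q=0.01, upper_q=0.99, min_ratio=1e-3):
    """Per-cell library size ratio from total counts, capped at quantiles.

    Hypothetical sketch: `counts` is a cells x genes matrix and `is_ref`
    a boolean mask of reference cells.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)
    # Ratio of each cell's total counts to the mean reference total.
    ratio = totals / totals[np.asarray(is_ref, dtype=bool)].mean()
    lo, hi = np.quantile(ratio, [lower_q, upper_q])
    # Hypothetical floor so no cell ends up with a zero ratio.
    lo = max(lo, min_ratio)
    return np.clip(ratio, lo, hi)
```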