morris-lab / CellOracle

This is the alpha version of the CellOracle package

Question about edge filtering strategy of CellOracle #176

Open R-Krait opened 10 months ago

R-Krait commented 10 months ago

Hello. Thank you for developing a great tool!

As I understand it, CellOracle builds the gene regulatory network (GRN) from highly variable genes (HVGs) and then filters the inferred edges with a p-value cut-off and a coef_abs cut-off. In the tutorial, 3000 HVGs are used, edges are filtered at a p-value threshold of 0.001, and the top 2000 edges ranked by coef_abs are kept for the analysis.

However, I ran into cases where my genes of interest rank only within the top 5000 to 7000 HVGs, or where, even with 3000 HVGs, the top 2000 edges do not include them. In such cases I found it necessary to keep more than 2000 edges while leaving the p-value threshold as in the tutorial. To do this while minimizing false positives, I devised the following approach, illustrated with some figures. (Assumption: after p-value thresholding, the true edges account for most of the total edge weight in the network.)

  1. Construct the GRN, then filter it with a p-value threshold of 0.001.
  2. Sort the coef_abs values of the filtered network in decreasing order. [image]
  3. For each n in [1, len(coef_abs)], calculate sum(coef_abs of the top n edges) / sum(coef_abs of all edges). [image]
  4. Find the elbow point of this curve using the kneed library. [image]
  5. Finally, use this elbow point, instead of 2000, as the cut-off for the additional filtering.
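
The five steps above can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not CellOracle or kneed code: the edge table is assumed to be a list of dicts with `source`, `target`, `coef_abs` and `p` keys, the function names are hypothetical, the p-value filter is assumed to be inclusive (`p <= threshold`), and the elbow is located with a simplified "farthest above the chord" heuristic that stands in for kneed.

```python
# Sketch only: a dependency-free stand-in for steps 1-5.
# All function names and the edge-table layout are hypothetical.

def cumulative_weight_fraction(coef_abs):
    """Steps 2-3: sort weights in decreasing order, return the running share of the total."""
    ordered = sorted(coef_abs, reverse=True)
    total = sum(ordered)
    if total == 0:
        return []  # no weight to distribute, so no curve to analyse
    fractions = []
    running = 0.0
    for weight in ordered:
        running += weight
        fractions.append(running / total)
    return fractions


def elbow_index(fractions):
    """Step 4: return the edge count n (1-based) where the curve lies farthest above
    the straight line joining its first and last points.

    Simplified knee heuristic; it is NOT the kneed implementation.
    """
    m = len(fractions)
    if m < 3:
        return m
    y_first, y_last = fractions[0], fractions[-1]
    best_n, best_gap = 1, 0.0
    for i, y in enumerate(fractions):
        n = i + 1
        chord = y_first + (y_last - y_first) * (n - 1) / (m - 1)
        gap = y - chord
        if gap > best_gap:
            best_n, best_gap = n, gap
    return best_n


def select_edges(edges, p_threshold=0.001):
    """Steps 1 and 5: filter by p-value, then keep the top-n edges up to the elbow."""
    kept = [e for e in edges if e["p"] <= p_threshold]  # assumed inclusive cut-off
    kept.sort(key=lambda e: e["coef_abs"], reverse=True)
    fractions = cumulative_weight_fraction([e["coef_abs"] for e in kept])
    n = elbow_index(fractions)
    return kept[:n], n
```

With kneed installed, the elbow in step 4 could instead be taken from `KneeLocator(x, y, curve="concave", direction="increasing").knee`, where `x` is `range(1, len(fractions) + 1)` and `y` is `fractions`; the two methods will not necessarily return the same n. The resulting n would then presumably replace 2000 as the edge-count cut-off in the tutorial's filtering step.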

However, since this approach has not been validated, I would like to ask whether there are any problems with using it.

Thank you!

Respectfully.