The lower bound is obviously very loose. It is below 0.5 for homogeneous networks ($\alpha > 3$), which should not happen. So it would be nice if we could tighten the bound. It would also be interesting to compare against the numerical results by @rachithaiyappa for confirmation.
Cool! Just a small note for correctness (it should not change anything in your derivation, since you've assumed $k_{\text{min}}=1$): $p(k) = \frac{\alpha-1}{k_{\text{min}}}\left(\frac{k}{k_{\text{min}}}\right)^{-\alpha}$
@skojaku This is what I have so far. The orange curve is what we discussed in the morning (sampling $s^- \sim (p(k))^2$ and $s^+ \sim (kp(k))^2$)
The blue curve is by keeping $k_i$ and $k_j$ separate, i.e., $s_{ij}^- \sim p(k_i)p(k_j)$ and $s_{ij}^+ \sim k_i p(k_i) \cdot k_j p(k_j)$.
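A minimal sketch of how such a comparison could be set up (my reconstruction, not the original script; `alpha` and the sample size are placeholders), using inverse-transform sampling for the continuous power law with $k_{\text{min}}=1$:

```python
# Sketch: estimate AUC-ROC for the score s_ij = k_i * k_j, with negative
# endpoints drawn from p(k) and positive endpoints from p_1(k) ~ k p(k).
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
alpha, n = 2.5, 100_000

def sample_pk(size):
    # inverse transform: p(k) = (alpha - 1) k^(-alpha), k >= 1
    return rng.random(size) ** (-1 / (alpha - 1))

def sample_p1k(size):
    # inverse transform: p_1(k) = (alpha - 2) k^(-alpha + 1), k >= 1
    return rng.random(size) ** (-1 / (alpha - 2))

s_neg = sample_pk(n) * sample_pk(n)    # s_ij^- ~ p(k_i) p(k_j)
s_pos = sample_p1k(n) * sample_p1k(n)  # s_ij^+ ~ k_i p(k_i) * k_j p(k_j)

# Mann-Whitney estimate of AUC = P(s_pos > s_neg)
ranks = rankdata(np.concatenate([s_pos, s_neg]))
auc = (ranks[:n].sum() - n * (n + 1) / 2) / (n * n)
print(f"alpha = {alpha}: AUC ~ {auc:.3f}")
```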
Just performed one run of the Metropolis-Hastings algorithm. I was unable to get it to produce meaningful samples for larger values of $\alpha > 2.5$. I think it's probably because my initial conditions aren't ideal. I need to explore a little more on that front, but this looks promising?
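For reference, a generic random-walk Metropolis-Hastings sketch for sampling from $p_1(k)$ in log-space (my guess at the setup; the actual script isn't shown here):

```python
# Sketch: random-walk Metropolis-Hastings for p_1(k) = (alpha - 2) k^(-alpha + 1),
# k >= 1, proposing in x = log k; the target density of x is proportional to e^((2 - alpha) x).
import numpy as np

rng = np.random.default_rng(0)
alpha, n_steps = 2.5, 50_000

x = 1.0  # initial condition in log-degree space (a poor start can stall the chain)
samples = []
for _ in range(n_steps):
    x_new = x + rng.normal(0, 1.0)
    # proposals outside the support (k < 1, i.e., x < 0) are always rejected
    if x_new >= 0 and np.log(rng.random()) < (2 - alpha) * (x_new - x):
        x = x_new
    samples.append(np.exp(x))
```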
Wow. It's very different. Seems that it aligns with my results. Let's see it with more samples.
Wow, this is so awesome, Sada!! I will also explore ways to tighten the bound :)
Yeah, I think the independent condition that both degrees are larger is way tighter than the product condition. I'm wondering whether we can do a change of variables $k \rightarrow \log k$ to turn the product into a sum and then use the generating function to obtain the solution. I may try...
Does this line of thought make sense?
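In symbols, my reading of the idea: with the change of variables, the score becomes a sum,

$$ \log s_{ij} = \log k_i + \log k_j, $$

so if $k_i$ and $k_j$ are independent, the generating function of $\log s_{ij}$ factorizes:

$$ \mathbb{E}\left[e^{t \log s_{ij}}\right] = \mathbb{E}\left[e^{t \log k_i}\right] \mathbb{E}\left[e^{t \log k_j}\right]. $$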
Sounds promising. I could follow up to step 5, but how did you get the sum in step 6? I think it's derived from the probability generating function of $s_{ij} - s_{xy}$, but I'm wondering how you represent the inequality $$s_{ij} - s_{xy} \geq 0.$$
Oh, it's just that when we have $s_{ij} = 1$ ($a_0$), the only possibility is $s_{xy} = 1$ ($b_0$), and the probability is $a_0 b_0$. Then for the next term of $s_{ij}$ ($a_1$), we can only have $b_0$ and $b_1$, and so on. So $S$ simply counts every possible term. I don't know how to calculate it with the generating functions.
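If I'm reading this right, in symbols: with $a_i = P(s_{ij} = v_i)$ and $b_j = P(s_{xy} = v_j)$ for the ordered score values $v_0 < v_1 < \cdots$, the count is

$$ S = P(s_{xy} \leq s_{ij}) = \sum_{i \geq 0} a_i \sum_{j=0}^{i} b_j. $$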
I see. I think it's possible if the generating functions $G^2_1$ and $G^2_0$ have a simple analytical form. We cannot assume a power law, as its generating function does not exist. But I think the log-normal is promising?
Oh, the log-normal distribution is continuous, so it may not be applicable... But assuming that the degree distribution is log-normal, $\log k_i$ is now Gaussian. We can probably compute the probability generating function for $S$.
I derived an exact, closed-form expression of the AUC-ROC for the log-normal degree distribution:
$$ \begin{align} 1 - \int_{-\infty}^{\infty} \text{Norm}(z \vert 0, 1)\, \Phi\left( z -\sqrt{2}\sigma\right) {\rm d}z \end{align} $$
where $\text{Norm}$ is the probability density function of the standard normal distribution, and $\Phi$ is its cumulative distribution function.
It is an increasing function of the variance parameter $\sigma^2$ of the log-normal distribution, meaning the AUC-ROC increases as the node degrees are more heterogeneously distributed.
(see my note. It is also in the main text as an appendix) auc-roc-log-normal.pdf
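A quick numerical evaluation of this expression (a sketch; by the standard Gaussian identity $\int \phi(z)\Phi(z + a)\,{\rm d}z = \Phi(a/\sqrt{2})$, the integral should simplify to $\Phi(-\sigma)$, i.e., AUC $= \Phi(\sigma)$, which the code checks):

```python
# Sketch: evaluate the closed-form AUC-ROC above and compare with Phi(sigma),
# the simplification suggested by the Gaussian identity in the lead-in.
import numpy as np
from scipy import integrate
from scipy.stats import norm

def auc_lognormal(sigma):
    integrand = lambda z: norm.pdf(z) * norm.cdf(z - np.sqrt(2) * sigma)
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return 1 - val

for sigma in [0.5, 1.0, 2.0, 3.0]:
    print(f"sigma = {sigma}: AUC = {auc_lognormal(sigma):.4f}, Phi(sigma) = {norm.cdf(sigma):.4f}")
```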
I have some to-do tasks to complete the analysis. Would anyone like to take over?
If you have time to check my derivation, it would be greatly appreciated!
I assumed that the degree distribution (a discrete distribution) follows the log-normal (a continuous distribution). The assumption is conventional, though I'm not sure how good the approximation is.
It would be good to show that this analytical result is valid using simulations, where the AUC-ROC is computed numerically from log-normally distributed random variables.
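A minimal sketch of such a check, assuming (as in the power-law setting elsewhere in this thread) that positive endpoints are size-biased, $k\,p(k)$, which for a log-normal shifts the log-degree mean by $+\sigma^2$; `mu`, `sigma`, and the sample size are placeholders:

```python
# Sketch: simulate AUC-ROC with log-normal degrees and compare with the
# analytical value (Phi(sigma), per the closed-form expression above).
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)
mu, sigma, n = 0.0, 1.5, 200_000

# log-scores: log(k_i k_j) = log k_i + log k_j
log_s_neg = rng.normal(mu, sigma, (n, 2)).sum(axis=1)             # endpoints ~ p(k)
log_s_pos = rng.normal(mu + sigma**2, sigma, (n, 2)).sum(axis=1)  # endpoints ~ k p(k)

ranks = rankdata(np.concatenate([log_s_pos, log_s_neg]))
auc_sim = (ranks[:n].sum() - n * (n + 1) / 2) / (n * n)
print(f"simulated AUC = {auc_sim:.4f}, analytical Phi(sigma) = {norm.cdf(sigma):.4f}")
```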
My results show that the AUC-ROC approaches 1 as $\sigma$ increases. But it would be good to check this with the empirical $\sigma$ values from the data we have. I have code to fit the log-normal distribution:
```python
from scipy import stats
# lognorm.fit returns (shape, loc, scale), where shape = sigma and scale = exp(mu)
sigma, loc, scale = stats.lognorm.fit(deg, floc=0)
```
- Expected: based on the analytical results.
- Observed: $\sigma$ values estimated by the moment method.
Awesome!
Inspired by the derivation by @MunjungKim, I did my own exploration. Any holes?
Setting:
Let us denote by $(i,j)$ the positive edge, drawn from the excess degree distribution $p_1(k)$, and by $(x,y)$ the negative edge, drawn from the degree distribution $p(k)$.
Question:
Does $P(s_{ij} \geq s_{xy})\rightarrow 1$ as $\alpha\rightarrow 2$ or $1$?
Some potentially useful properties
Power-law degree distribution
$$p(k) = \frac{\alpha-1}{k_{\text{min}}} \left(\frac{k}{k_{\text{min}}}\right)^{-\alpha}$$ Let's assume that $k_{\text{min}}=1$ to simplify, so that $p(k) = (\alpha-1)k^{-\alpha}$.
Cumulative distribution
$$\Phi(k) = 1-\int_k ^\infty p(k')\;{\rm d}k' = 1-(\alpha-1)\left[\frac{k'^{-\alpha+1}}{-\alpha+1}\right]_k^\infty = 1-k^{-\alpha +1} $$
Average degree
$$\langle k \rangle = \int k p(k)\;{\rm d}k = (\alpha-1)\int k^{-\alpha+1}\;{\rm d}k = \frac{\alpha-1}{\alpha-2}$$
Excess degree distribution
$$ p_1(k) = \dfrac{1}{\int k'p(k')\;{\rm d}k'}kp(k) = \frac{\alpha-2}{\alpha-1} kp(k) = (\alpha-2)k^{-\alpha+1} $$
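A quick numerical sanity check of these properties (a sketch; `alpha` is a placeholder):

```python
# Sketch: verify normalization, the CDF, the mean, and the excess degree
# distribution numerically for one value of alpha (with k_min = 1).
import numpy as np
from scipy import integrate

alpha = 2.5
p  = lambda k: (alpha - 1) * k ** (-alpha)       # degree distribution
p1 = lambda k: (alpha - 2) * k ** (-alpha + 1)   # excess degree distribution

print(integrate.quad(p, 1, np.inf)[0])                             # ~ 1 (normalization)
print(1 - integrate.quad(p, 2, np.inf)[0], 1 - 2 ** (-alpha + 1))  # Phi(2) two ways
print(integrate.quad(lambda k: k * p(k), 1, np.inf)[0],
      (alpha - 1) / (alpha - 2))                                   # <k> two ways
print(integrate.quad(p1, 1, np.inf)[0])                            # ~ 1 (normalization)
```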
Derivation
Let us consider a subset of $(i,j)$ and $(x,y)$ that yield $s_{ij} \geq s_{xy}$. We focus on the case where $k_i \geq k_x$ AND $k_j \geq k_y$, which gives the lower bound, i.e.,
$$ P(k_x \leq k_i) \cdot P(k_y \leq k_j) \leq P(s_{xy} \leq s_{ij}) $$
Let us focus on $P(k_x \leq k_i)$, i.e.,
$$
\begin{align}
P(k_x\leq k_i) &= \int p_1(k_i) \left(\int_0 ^{k_i} p(k_x)\,{\rm d}k_x\right){\rm d}k_i \\
&= \int p_1(k_i)\, \Phi(k_i)\,{\rm d}k_i \\
&= (\alpha-2)\int k_i^{-\alpha + 1} \left(1-k_i^{-\alpha+1}\right){\rm d}k_i \\
&= (\alpha-2) \left[ \int k_i^{-\alpha + 1}\,{\rm d}k_i -\int k_i^{-\alpha + 1}k_i^{-\alpha+1}\,{\rm d}k_i \right] \\
&= (\alpha-2) \left[ \int k_i^{-\alpha + 1}\,{\rm d}k_i -\int k_i^{-2\alpha + 2}\,{\rm d}k_i \right] \\
&= (\alpha-2) \left[ \frac{1}{\alpha-2} -\frac{1}{2\alpha-3} \right] \\
&= 1 -\frac{\alpha-2}{2\alpha-3}
\end{align}
$$
Notice that $P(k_x\leq k_i) = 1$ when $\alpha = 2$. What about $P(k_y \leq k_j)$? Because the pair $(k_y, k_j)$ is distributed identically to (and independently of) $(k_x, k_i)$, we have
$$ P(k_x \leq k_i) = P(k_y \leq k_j). $$
Thus, we obtain:
$$ \left( 1 -\frac{\alpha-2}{2\alpha-3} \right)^2 \leq P(s_{xy} \leq s_{ij}). $$
Therefore, $P(s_{xy} \leq s_{ij})\rightarrow 1$ when $\alpha \rightarrow 2$.
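A Monte Carlo check of $P(k_x \leq k_i)$ and the resulting lower bound (a sketch using inverse-transform sampling; sample sizes are placeholders):

```python
# Sketch: compare the Monte Carlo estimate of P(k_x <= k_i) with the
# analytical value 1 - (alpha - 2)/(2 alpha - 3), and print the squared bound.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

for alpha in [2.1, 2.5, 3.0]:
    k_i = rng.random(n) ** (-1 / (alpha - 2))  # k_i ~ p_1(k)
    k_x = rng.random(n) ** (-1 / (alpha - 1))  # k_x ~ p(k)
    p_mc = np.mean(k_x <= k_i)
    p_an = 1 - (alpha - 2) / (2 * alpha - 3)
    print(f"alpha = {alpha}: MC = {p_mc:.4f}, analytic = {p_an:.4f}, bound = {p_an**2:.4f}")
```

Note that at $\alpha = 3$ the squared bound is $(2/3)^2 \approx 0.44 < 0.5$, which is the looseness flagged at the top of this thread.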