Open micholeodon opened 3 years ago
the use of the computed diffusion times was mostly for evaluating if there was any consistency as the eigenindex was increased, but also to see if there were indicators of a good dimensional cutoff based on the times. both of these were exploratory more than theoretical. this was really trying to get at the question whether the regularization had any intrinsic effect on stability and computation of diffusion time.
Thank you very much for your answer.
But at the same time, this feels somehow contrary to what you mentioned in 2. "while in a diffusion map embedding the same diffusion time is applied to every eigenvalue".
I am a little confused and would need more help and literature to grasp this. Do any recommended references come to mind? And what about the apparent contradiction I mentioned above?
@micholeodon - unfortunately there is no relevant literature that i know of (and definitely we have not pursued any theoretical backing to this). highly empirical as i noted. the closest link is to matrix inverse in image restoration settings.
think of it this way. in a regular diffusion map generation, a diffusion time is an input parameter. and it's a single parameter that scales all the eigenvalues.
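A minimal sketch of that single-parameter scaling, assuming made-up eigenvalues and a made-up `diffusion_time` (neither is taken from the mapalign code). Because every eigenvalue is raised to the same power, the exponent recovered from each scaled eigenvalue is the same number:

```python
import numpy as np

# Illustrative eigenvalues only (assumed, not real data); each lies in (0, 1).
lambdas = np.array([0.9, 0.6, 0.3])

# In the regular setting the diffusion time is one user-supplied number.
diffusion_time = 2.0
scaled = lambdas ** diffusion_time

# Recover the exponent applied to each eigenvalue: log(scaled) / log(lambda).
implied_t = np.log(scaled) / np.log(lambdas)
print(implied_t)
```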
in this exploratory setting (diffusion time == 0), we used damped eigenvalues. this comes up in matrix inversion problems. this changes each lambda as a function of itself, which would mean that the effective scaling "diffusion time" for each eigenvalue is different. i.e. we are solving for (t) such that lambda^t === lambda / (1 - lambda)
lambda^t === lambda / (1 - lambda)
t log(lambda) === log (lambda) - log(1 - lambda)
t === 1 - log(1 - lambda) / log (lambda)
on looking at (https://github.com/satra/mapalign/blob/master/mapalign/embed.py#L144), it seems to generate an exponential of this function (1 - log(1 - lambda) / log(lambda)), which seems wrong to me. it should just be returning the values without the exponential. i'll submit a change for that. can't remember where that exponential came from. probably just an error.
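The derivation above can be checked numerically. The following is only a sketch under stated assumptions: the eigenvalues are made up, and `effective_diffusion_times` is a hypothetical helper, not a function from mapalign. It shows that raising each eigenvalue to its own t reproduces the damped value lambda / (1 - lambda), while reusing any single t from that vector for all eigenvalues matches the damped value only for the eigenvalue it was derived from.

```python
import numpy as np

def effective_diffusion_times(lambdas):
    """Hypothetical helper: per-eigenvalue t solving lambda**t == lambda / (1 - lambda)."""
    lambdas = np.asarray(lambdas, dtype=float)
    return 1.0 - np.log(1.0 - lambdas) / np.log(lambdas)

# Illustrative eigenvalues only (assumed, not real data); each lies in (0, 1).
lambdas = np.array([0.9, 0.6, 0.3])

# Damped eigenvalues used when no diffusion time is given.
damped = lambdas / (1.0 - lambdas)

# One effective diffusion time per eigenvalue, without any exponential.
t = effective_diffusion_times(lambdas)

# Each eigenvalue raised to its own t gives back its damped value.
recovered = lambdas ** t

# Row i applies the single value t[i] to every eigenvalue and records
# which entries agree with the damped eigenvalues.
match = np.array([np.isclose(lambdas ** t_i, damped) for t_i in t])

print(t)
print(match)
```

Under these assumptions `match` is true only on the diagonal, which is one way to see why a re-run with any individual value from the returned vector would not be expected to reproduce the run with diffusion time 0.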
Thank you. That clarifies a lot. I understand the doubts about line 144 you mentioned. Fortunately, changing this line will not change the diffusion maps, only the output diffusion times. Please submit a change, and I think I will close this issue then. Thank you very much.
Embeddings.zip
I would like to ask two mutually related questions about the code:
1. Where can I find the theory and rationale behind the automated computation of diffusion times (line 144)? I cannot figure out the relation between the computed output (a single embedding) and the vector of diffusion times (more than one value).
2. Suppose I have run diffusion embedding without a predefined diffusion time. In the output, I get the automatically computed diffusion times. Now I take each individual diffusion time and run the algorithm again, this time specifying the diffusion time explicitly. I wonder why the results differ between the initial run with automated diffusion time estimation and the subsequent runs with this parameter specified. In the attachment, I provide the data and code to reproduce the results from question 2.
Thank you very much in advance if you could help me.