Open azhx opened 7 months ago
Hi,
You can define $\alpha$-sigmoid$(x)$ (with output $p_x$) in terms of $\alpha$-entmax$(\mathbf{y})$ (with output $\mathbf{p}_y$) by setting $\mathbf{y} = [x, 0]$ and $\mathbf{p}_y = [p_x, 1 - p_x]$. Let me know if that does not answer your question!
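Here is a minimal pure-Python sketch of that construction for a single scalar $x$. It assumes the usual $\alpha$-entmax thresholding form $p_j = [(\alpha-1)y_j - \tau]_+^{1/(\alpha-1)}$ from Peters et al. and finds $\tau$ by bisection. `entmax_bisect_1d` and `alpha_sigmoid` are hypothetical names for illustration, not the `entmax` package API.

```python
def entmax_bisect_1d(z, alpha=1.5, n_iter=60):
    """Hypothetical helper: alpha-entmax of a list of floats via bisection on tau (alpha > 1)."""
    if alpha <= 1.0:
        raise ValueError("this sketch assumes alpha > 1")
    d = len(z)
    zs = [(alpha - 1.0) * zj for zj in z]  # scale scores by (alpha - 1)
    expo = 1.0 / (alpha - 1.0)

    def probs(tau):
        # thresholded form p_j = [(alpha - 1) z_j - tau]_+ ** (1 / (alpha - 1))
        return [max(zj - tau, 0.0) ** expo for zj in zs]

    # At tau_lo the largest entry alone equals 1, so the total mass is >= 1;
    # at tau_hi every entry is <= 1/d, so the total mass is <= 1.
    tau_lo = max(zs) - 1.0
    tau_hi = max(zs) - (1.0 / d) ** (alpha - 1.0)
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        if sum(probs(tau)) >= 1.0:
            tau_lo = tau  # mass too large (or exact): the root is at or above tau
        else:
            tau_hi = tau  # mass too small: the root is below tau
    p = probs(tau_lo)
    s = sum(p)
    return [pj / s for pj in p]  # renormalize away the remaining bisection error


def alpha_sigmoid(x, alpha=1.5):
    # alpha-sigmoid(x) = first entry of alpha-entmax([x, 0]), i.e. p_y = [p_x, 1 - p_x]
    return entmax_bisect_1d([x, 0.0], alpha)[0]


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, round(alpha_sigmoid(x, alpha=1.5), 4))
```

As a sanity check, for $\alpha = 2$ this reduces to sparsemax on $[x, 0]$, i.e. $\mathrm{clip}((x+1)/2, 0, 1)$, so `alpha_sigmoid(0.5, 2.0)` gives 0.75.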
I understand that this is how you're defining $\alpha$-sigmoid; it's just that the docstring for the `entmax_bisect` function says the optimization problem being solved is $\max_p \langle x, p \rangle - H_\alpha(p)$. In the original paper by Peters et al., they also seem to say that the bisection algorithm solves the maximization problem with the entropy term added rather than subtracted. Maybe the docstring has a typo? Or have I missed something mathematically? I haven't looked closely at the bisection algorithm itself.
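For reference, here is a sketch of why the thresholded form that a bisection search works with comes from the "+" version. It assumes $H_\alpha$ is the Tsallis $\alpha$-entropy as defined by Peters et al., with $\alpha > 1$; $\lambda$ is the multiplier for $\sum_j p_j = 1$, and entries with $(\alpha-1) z_j \le \tau$ are set to zero by the $p_j \ge 0$ constraints.

$$
\begin{aligned}
H_\alpha(\mathbf{p}) &= \frac{1}{\alpha(\alpha-1)} \sum_j \left(p_j - p_j^{\alpha}\right), \qquad \alpha > 1 \\
\alpha\text{-entmax}(\mathbf{z}) &= \arg\max_{\mathbf{p} \in \Delta} \; \langle \mathbf{z}, \mathbf{p} \rangle + H_\alpha(\mathbf{p}) \\
z_j + \frac{1}{\alpha(\alpha-1)} - \frac{p_j^{\alpha-1}}{\alpha-1} &= \lambda \quad \text{for every } p_j > 0 \\
\Longrightarrow \quad p_j &= \left[(\alpha-1) z_j - \tau\right]_+^{1/(\alpha-1)}, \qquad \tau = (\alpha-1)\lambda - \frac{1}{\alpha}
\end{aligned}
$$

The bisection then only has to find the single scalar $\tau$ that makes these entries sum to 1. Since $H_\alpha$ is concave, the $+H_\alpha$ objective is concave and this stationary point is its maximizer; with $-H_\alpha$ the objective would be convex, and its maximum over the simplex would be attained at a vertex (a one-hot $\mathbf{p}$) instead.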
Anyway, I understand how your method works, so all good. Thanks!
Hi, in your paper you state that $\alpha$-sigmoid is defined as
However, the `entmax_bisect` function you use solves the optimization problem
$$\max_p \; \langle x, p \rangle - H_\alpha(p)$$
Can you clarify this discrepancy?
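A quick numerical check can make the question concrete. The sketch below is a minimal pure-Python illustration, not the `entmax` library's code: `entmax_bisect_1d`, `tsallis_entropy`, and `objective` are hypothetical helpers, and it assumes $H_\alpha$ is the Tsallis $\alpha$-entropy $\frac{1}{\alpha(\alpha-1)}\sum_j (p_j - p_j^\alpha)$. It compares the bisection output for $\mathbf{y} = [0.5, 0]$, $\alpha = 1.5$ with a brute-force grid search over $[p, 1-p]$ under both signs of the entropy term.

```python
def tsallis_entropy(p, alpha):
    # Assumed definition: H_alpha(p) = sum_j (p_j - p_j**alpha) / (alpha * (alpha - 1)), alpha != 1
    return sum(pj - pj ** alpha for pj in p) / (alpha * (alpha - 1.0))


def entmax_bisect_1d(z, alpha=1.5, n_iter=60):
    # Hypothetical helper: bisection on tau for p_j = [(alpha - 1) z_j - tau]_+ ** (1 / (alpha - 1))
    zs = [(alpha - 1.0) * zj for zj in z]
    expo = 1.0 / (alpha - 1.0)
    lo, hi = max(zs) - 1.0, max(zs) - (1.0 / len(z)) ** (alpha - 1.0)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum(max(zj - mid, 0.0) ** expo for zj in zs) >= 1.0:
            lo = mid  # total mass >= 1: the root is at or above mid
        else:
            hi = mid  # total mass < 1: the root is below mid
    p = [max(zj - lo, 0.0) ** expo for zj in zs]
    s = sum(p)
    return [pj / s for pj in p]


def objective(z, p, alpha, sign):
    # <z, p> + sign * H_alpha(p), with sign = +1.0 or -1.0
    return sum(zj * pj for zj, pj in zip(z, p)) + sign * tsallis_entropy(p, alpha)


y, alpha = [0.5, 0.0], 1.5
p_bisect = entmax_bisect_1d(y, alpha)

# Brute-force search over two-point distributions [p, 1 - p] on a grid of step 0.001
grid = [[k / 1000, 1.0 - k / 1000] for k in range(1001)]
best_plus = max(grid, key=lambda p: objective(y, p, alpha, +1.0))
best_minus = max(grid, key=lambda p: objective(y, p, alpha, -1.0))

print("bisection:", round(p_bisect[0], 4))
print("argmax <y,p> + H:", best_plus[0])
print("argmax <y,p> - H:", best_minus[0])
```

Under these assumptions, the bisection result agrees with the grid maximizer of the $+H_\alpha$ objective (about 0.674), while the $-H_\alpha$ objective is maximized at the vertex $p = 1$.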