The paper and the code both have `-N * log1p_exp(log_sum_exp(y)) + sum(y)`, but there's a `0.5 * log(N)` term missing somewhere. It's a constant, so it doesn't affect sampling, but we should have it in. When I AD through it, the result comes out with that constant.
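To make the "constant doesn't affect sampling" point concrete, here is a rough sketch in plain Python of the expression above. The helper names mirror the ones in the expression but are reimplemented here for illustration. The function name `log_term` and the `include_constant` flag are hypothetical, and adding `0.5 * log(N)` with a positive sign is an assumption, since I haven't pinned down where exactly it belongs.

```python
import math

def log_sum_exp(y):
    # Stable log(sum(exp(y))): shift by the max before exponentiating.
    m = max(y)
    return m + math.log(sum(math.exp(v - m) for v in y))

def log1p_exp(x):
    # Stable log(1 + exp(x)) for both large positive and negative x.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def log_term(y, N, include_constant=False):
    # The expression from the paper and code.
    value = -N * log1p_exp(log_sum_exp(y)) + sum(y)
    if include_constant:
        # Assumption: the missing term is additive with a positive sign.
        value += 0.5 * math.log(N)
    return value

# The constant cancels in any difference of log densities,
# which is all a sampler's accept/reject step depends on.
a, b = [0.1, -0.3], [1.2, 0.4]
diff_without = log_term(a, 3) - log_term(b, 3)
diff_with = log_term(a, 3, True) - log_term(b, 3, True)
```

The two differences agree up to floating-point error, so the constant only matters when the normalized value itself is needed.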
PR #46 has a version that I think is more efficient.