lukesonnet / KRLS

R package for machine learning technique to fit flexible, interpretable functional forms for continuous and binary outcomes.
Other
21 stars 12 forks source link

Eigvalue of 0 in some instances #17

Closed lukesonnet closed 7 years ago

lukesonnet commented 7 years ago

There are some cases where R is returning an eigenvalue of 0 in the generateK() function. This causes all sorts of problems downstream. This seems to only happen without truncation, and is thus caused by R's default eigen() function. Should I prevent this by adding the minimum double R can handle to 0 eigenvalues? I very rarely run into it and it's only when lambda is very large and there is no truncation.

chadhazlett commented 7 years ago

option 1: change downstream things so that this isn't a problem

option 2: if you get an eig of 0, go to a truncated K instead cutting off the V's that have eig=0. throw a warning.

option 3: default to truncated with epsilon equal to machine precision.

option 4: add back machine zero to the 0 eigenvalues.

My inclinations would be towards 1 or 2. what do you think?

On Mon, Oct 30, 2017 at 12:50 PM, Luke Sonnet notifications@github.com wrote:

There are some cases where R is returning an eigenvalue of 0 in the generateK() function. This causes all sorts of problems downstream. This seems to only happen without truncation, and is thus caused by R's default eigen() function. Should I prevent this by adding the minimum double R can handle to 0 eigenvalues? I very rarely run into it and it's only when lambda is very large and there is no truncation.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyAGjyDxitOJ__sLavRWJjjOM6Zpmks5sxih6gaJpZM4QLwVF .

lukesonnet commented 7 years ago

D^-1 is used in the solution so I don't think option 1 is viable.

Option 2: There are some eigvals that have nonzero (but 1e-19 or so) values after the one that is an actual 0, so that doesn't seem to work.

chadhazlett commented 7 years ago

Hmm. Still not wild about adding something. Perhaps we should have essentially a tolerance of 1e-16 or so and always truncate at that?

On Mon, Oct 30, 2017 at 2:05 PM, Luke Sonnet notifications@github.com wrote:

D^-1 is used in the solution so I don't think option 1 is viable.

Option 2: There are some eigvals that have nonzero (but 1e-19 or so) values after the one that is an actual 0, so that doesn't seem to work.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340584338, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyAPmxTnC7QiEl_gmjlOp-EZNb-t0ks5sxjo2gaJpZM4QLwVF .

lukesonnet commented 7 years ago

That's a more sensible solution, I think. Let me ask a guy working on DeclareDesign who might have an idea.

On Mon, Oct 30, 2017 at 2:09 PM Chad Hazlett notifications@github.com wrote:

Hmm. Still not wild about adding something. Perhaps we should have essentially a tolerance of 1e-16 or so and always truncate at that?

On Mon, Oct 30, 2017 at 2:05 PM, Luke Sonnet notifications@github.com wrote:

D^-1 is used in the solution so I don't think option 1 is viable.

Option 2: There are some eigvals that have nonzero (but 1e-19 or so) values after the one that is an actual 0, so that doesn't seem to work.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340584338, or mute the thread < https://github.com/notifications/unsubscribe-auth/AFVCyAPmxTnC7QiEl_gmjlOp-EZNb-t0ks5sxjo2gaJpZM4QLwVF

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340585329, or mute the thread https://github.com/notifications/unsubscribe-auth/AFDZp42xsZq3kSswzZubD8h681z_gzShks5sxjsfgaJpZM4QLwVF .

--

Luke Sonnet PhD Student, UCLA Political Science

chadhazlett commented 7 years ago

Yeah, and that matrix will be better behaved as well.

On Mon, Oct 30, 2017 at 2:13 PM, Luke Sonnet notifications@github.com wrote:

That's a more sensible solution, I think. Let me ask a guy working on DeclareDesign who might have an idea.

On Mon, Oct 30, 2017 at 2:09 PM Chad Hazlett notifications@github.com wrote:

Hmm. Still not wild about adding something. Perhaps we should have essentially a tolerance of 1e-16 or so and always truncate at that?

On Mon, Oct 30, 2017 at 2:05 PM, Luke Sonnet notifications@github.com wrote:

D^-1 is used in the solution so I don't think option 1 is viable.

Option 2: There are some eigvals that have nonzero (but 1e-19 or so) values after the one that is an actual 0, so that doesn't seem to work.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340584338, or mute the thread < https://github.com/notifications/unsubscribe- auth/AFVCyAPmxTnC7QiEl_gmjlOp-EZNb-t0ks5sxjo2gaJpZM4QLwVF

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340585329, or mute the thread https://github.com/notifications/unsubscribe-auth/ AFDZp42xsZq3kSswzZubD8h681z_gzShks5sxjsfgaJpZM4QLwVF .

--

Luke Sonnet PhD Student, UCLA Political Science

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340586244, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyNJ6assL7zxJioWw60wN9IrSCuCVks5sxjvygaJpZM4QLwVF .

lukesonnet commented 7 years ago

Okay, so I currently resolve this problem with:

D <- ifelse(eigobj$values < .Machine$double.eps, .Machine$double.eps, eigobj$values)

But this only runs if truncate = F. Note, this is going to take some of those values that are < 2e-16 and increase them to that amount. I suppose that's not really a problem, and this seems to be a pretty rare case.

lukesonnet commented 7 years ago

The problem with using this rule is that we are changing estimates. There are many cases where we get eigvals < 2e-16 that aren't actually 0 that work. So we would have our first departure from past KRLS results. I don't think this is a good solution.

chadhazlett commented 7 years ago

yeah I don't like changing the eigenvalues. let's check if there are zeros and if so, truncate to min(which(eigs==0))-1 dimensions.

if there are really non-zero eigs after the zero eigs, that is kind of a bug since eigenvalues should be strictly non-increasing. But if that is the case, we could also resort them or, equivalently just do V[,eigs!==0] etc.

On Tue, Oct 31, 2017 at 12:49 PM, Luke Sonnet notifications@github.com wrote:

The problem with using this rule is that we are changing estimates. There are many cases where we get eigvals < 2e-16 that aren't actually 0 that work. So we would have our first departure from past KRLS results. I don't think this is a good solution.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340886219, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyB9odPmQFRbD0C7Gk_pCBYmOlCGLks5sx3m-gaJpZM4QLwVF .

lukesonnet commented 7 years ago

I think we should just go ahead with the truncation. I think this would have failed with the initial package, unfortunately I don't have a reproducible example. I'll see if I can recreate this bug.

chadhazlett commented 7 years ago

I'm good with that. Should be more reliable anyhow.

On Tue, Oct 31, 2017 at 1:56 PM, Luke Sonnet notifications@github.com wrote:

I think we should just go ahead with the truncation. I think this would have failed with the initial package, unfortunately I don't have a reproducible example. I'll see if I can recreate this bug.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340904279, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyMazfrrm1LU32kdSkhhFLMfSfuMrks5sx4mVgaJpZM4QLwVF .

lukesonnet commented 7 years ago

My only concern is that this is a rare case and we are adding an equality check in the generateK function for it. Anyways, shouldn't be too expensive and is now implemented.

chadhazlett commented 7 years ago

thanks. even with 1e6 numbers, checking for zeros takes no time at all, so I think we're okay.

On Tue, Oct 31, 2017 at 2:09 PM, Luke Sonnet notifications@github.com wrote:

My only concern is that this is a rare case and we are adding an equality check in the generateK function for it. Anyways, shouldn't be too expensive and is now implemented.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340907398, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyD8PNHzgiKA8P32gY_e6ez3zxRKcks5sx4x0gaJpZM4QLwVF .

chadhazlett commented 7 years ago

or wait, are you doing the truncation to remove the zeros, or a default truncation to anything below a threshold?

On Tue, Oct 31, 2017 at 2:18 PM, Chad Hazlett chadhazlett@gmail.com wrote:

thanks. even with 1e6 numbers, checking for zeros takes no time at all, so I think we're okay.

On Tue, Oct 31, 2017 at 2:09 PM, Luke Sonnet notifications@github.com wrote:

My only concern is that this is a rare case and we are adding an equality check in the generateK function for it. Anyways, shouldn't be too expensive and is now implemented.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340907398, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyD8PNHzgiKA8P32gY_e6ez3zxRKcks5sx4x0gaJpZM4QLwVF .

lukesonnet commented 7 years ago

Currently I see where the first 0 is, and then truncate at that position - 1.

On Tue, Oct 31, 2017 at 2:19 PM Chad Hazlett notifications@github.com wrote:

or wait, are you doing the truncation to remove the zeros, or a default truncation to anything below a threshold?

On Tue, Oct 31, 2017 at 2:18 PM, Chad Hazlett chadhazlett@gmail.com wrote:

thanks. even with 1e6 numbers, checking for zeros takes no time at all, so I think we're okay.

On Tue, Oct 31, 2017 at 2:09 PM, Luke Sonnet notifications@github.com wrote:

My only concern is that this is a rare case and we are adding an equality check in the generateK function for it. Anyways, shouldn't be too expensive and is now implemented.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340907398, or mute the thread < https://github.com/notifications/unsubscribe-auth/AFVCyD8PNHzgiKA8P32gY_e6ez3zxRKcks5sx4x0gaJpZM4QLwVF

.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340910186, or mute the thread https://github.com/notifications/unsubscribe-auth/AFDZpwxsWqf4xcSJ_zBDYqCc2vga3XJsks5sx477gaJpZM4QLwVF .

--

Luke Sonnet PhD Student, UCLA Political Science

chadhazlett commented 7 years ago

ok great

On Tue, Oct 31, 2017 at 2:20 PM, Luke Sonnet notifications@github.com wrote:

Currently I see where the first 0 is, and then truncate at that position - 1.

On Tue, Oct 31, 2017 at 2:19 PM Chad Hazlett notifications@github.com wrote:

or wait, are you doing the truncation to remove the zeros, or a default truncation to anything below a threshold?

On Tue, Oct 31, 2017 at 2:18 PM, Chad Hazlett chadhazlett@gmail.com wrote:

thanks. even with 1e6 numbers, checking for zeros takes no time at all, so I think we're okay.

On Tue, Oct 31, 2017 at 2:09 PM, Luke Sonnet <notifications@github.com

wrote:

My only concern is that this is a rare case and we are adding an equality check in the generateK function for it. Anyways, shouldn't be too expensive and is now implemented.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340907398 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AFVCyD8PNHzgiKA8P32gY_ e6ez3zxRKcks5sx4x0gaJpZM4QLwVF

.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340910186, or mute the thread https://github.com/notifications/unsubscribe-auth/AFDZpwxsWqf4xcSJ_ zBDYqCc2vga3XJsks5sx477gaJpZM4QLwVF .

--

Luke Sonnet PhD Student, UCLA Political Science

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340910406, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyIZ5jsxIXMuWM03w_1sKuoeBxTtcks5sx48sgaJpZM4QLwVF .

lukesonnet commented 7 years ago

Ok, in our KRLS what is happening is that sometimes one of the eigenvalues between the positive and the negative eigenvalues slips to 0. In the same data, original KRLS somehow gets away with an eigenvalue of -3.081488e-33.

lukesonnet commented 7 years ago

The difference lies in the fact that we use kern_gauss (a cpp function) to build K while the original KRLS uses gausskernel to build K. I believe there is less precision in the cpp version and that is carrying through to the eigen decomposition.

chadhazlett commented 7 years ago

oh -- we shouldn't have negative eigenvalues either. K is psd.

On Tue, Oct 31, 2017 at 3:40 PM, Luke Sonnet notifications@github.com wrote:

Ok, in our KRLS what is happening is that sometimes one of the eigenvalues between the positive and the negative eigenvalues slips to 0. In the same data, original KRLS somehow gets away with an eigenvalue of -3.081488e-33.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340928618, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyM0X6EyALFcIifKNH5TbyAc3e_xtks5sx6HRgaJpZM4QLwVF .

chadhazlett commented 7 years ago

well between this and getting negative values, I still think we should truncate out everything 0 or below.

On Tue, Oct 31, 2017 at 3:42 PM, Luke Sonnet notifications@github.com wrote:

The difference lies in the fact that we use kern_gauss (a cpp function) to build K while the original KRLS uses gausskernel to build K. I believe there is less precision in the cpp version and that is carrying through to the eigen decomposition.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340928945, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyGYU4YJr5UmDoL2TW_UXjLEaaYsVks5sx6I9gaJpZM4QLwVF .

lukesonnet commented 7 years ago

Right, I was wondering why that existed at all.

lukesonnet commented 7 years ago

Presumably numerical error on the part of eigen or in the kernel matrix itself.

lukesonnet commented 7 years ago

I have a feeling that if we truncate at 0 or below, we are not going to be backwards compatible in many cases (i.e. many use cases actually will produce neg eigenvalues)

chadhazlett commented 7 years ago

I'm not super worried about preserving differences that are due to numerical error of this kind. And it's weird though to use eigenvectors that are only possible as a result of numerical error. So I think our main approach should be to truncate at 0 by default. But then can we preserve backward compatibility through an option? e.g. the user can choose not to truncate=FALSE and in that case we use the full K rather than the SVD, the way the old KRLS did? Is that a huge pain to change? Could work on this after the paper is together.

On Tue, Oct 31, 2017 at 3:47 PM, Luke Sonnet notifications@github.com wrote:

I have a feeling that if we truncate at 0 or below, we are not going to be backwards compatible in many cases.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340929844, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyCOOP6Gz-Rp9DvukJWlZ0LOPoxrrks5sx6N0gaJpZM4QLwVF .

chadhazlett commented 7 years ago

sorry lemme try with fewer typos:

I'm not super worried about preserving differences that are due to numerical error of this kind. And it's weird to use eigenvectors that are only possible as a result of numerical error. So I think our main approach should be to truncate at 0 by default. But then, can we preserve backward compatibility through an option? e.g. the user can choose truncate=FALSE, and in that case we use the full K rather than the SVD, the way the old KRLS did? Is that a huge pain to change? Could work on this after the paper is together.

On Tue, Oct 31, 2017 at 4:14 PM, Chad Hazlett chadhazlett@gmail.com wrote:

I'm not super worried about preserving differences that are due to numerical error of this kind. And it's weird though to use eigenvectors that are only possible as a result of numerical error. So I think our main approach should be to truncate at 0 by default. But then can we preserve backward compatibility through an option? e.g. the user can choose not to truncate=FALSE and in that case we use the full K rather than the SVD, the way the old KRLS did? Is that a huge pain to change? Could work on this after the paper is together.

On Tue, Oct 31, 2017 at 3:47 PM, Luke Sonnet notifications@github.com wrote:

I have a feeling that if we truncate at 0 or below, we are not going to be backwards compatible in many cases.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340929844, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyCOOP6Gz-Rp9DvukJWlZ0LOPoxrrks5sx6N0gaJpZM4QLwVF .

lukesonnet commented 7 years ago

I'll add that issue but I'll work on it later.

chadhazlett commented 7 years ago

sounds good.

On Tue, Oct 31, 2017 at 5:23 PM, Luke Sonnet notifications@github.com wrote:

I'll add that issue but I'll work on it later.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lukesonnet/KRLS/issues/17#issuecomment-340944611, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyIzGZpNSQ_S45NsVjZEL5psWOxoUks5sx7n8gaJpZM4QLwVF .