sbailey / empca

Principal Component Analysis (PCA) for Missing and/or Noisy Data

RunTimeWarning: line 129 #1

Open petitvic opened 10 years ago

petitvic commented 10 years ago

Hi,

I just started using empca and it works great. However, in a few instances I get the following warning, which seems to make the PCA stop:

empca.py:129: RuntimeWarning: invalid value encountered in double_scalars
  self.eigvec[k, j] = x.dot(cw) / c.dot(cw)

Any advice on how to get rid of this message?

Thanks in advance.

sbailey commented 10 years ago

Hi.

At first I thought this would be due to NaN or Inf values in one of the vectors (x, cw, or c), but I haven't been able to reproduce that in separate tests. Can you check if the vectors have any funny values? e.g.

if N.any(x != x): print("x has NaN values")  # NaN is the only value that compares unequal to itself

Alternately, is it possible that some of your variables have weight=0 for every observation?

Stephen
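
A minimal sketch of how the warning can arise, assuming the quoted line computes a weighted ratio in which `cw` is the product of `c` and a per-variable weight vector `w` (the name `w` and the helper `weighted_ratio` are illustrative and not taken from empca). When every weight is zero, both dot products are zero and the division is 0/0, which NumPy reports as a RuntimeWarning and returns as NaN. The exact warning text may differ between NumPy versions.

```python
import warnings
import numpy as np


def weighted_ratio(x, c, w):
    """Same form as the expression in the warning: x.dot(cw) / c.dot(cw)."""
    cw = c * w  # assumption: cw is c scaled by the weights
    return x.dot(cw) / c.dot(cw)


rng = np.random.default_rng(0)
x = rng.normal(size=5)
c = rng.normal(size=5)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = weighted_ratio(x, c, np.ones(5))    # non-zero weights: finite result
    bad = weighted_ratio(x, c, np.zeros(5))  # all-zero weights: 0/0

print("finite with unit weights:", bool(np.isfinite(ok)))
print("NaN with zero weights:", bool(np.isnan(bad)))
print("warnings:", [str(w.message) for w in caught])
```

Once a NaN lands in one element it tends to propagate through later iterations, which would be consistent with the fit appearing to stop.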

petitvic commented 10 years ago

Hi Stephen,

Thanks for your quick answer. The problem was indeed that some of my observations had all weights set to 0. It works like a charm now!

Victor

sbailey commented 10 years ago

Glad it worked out. You are the second user who has been confused by the cryptic effect of inputs with all weights=0. I should probably add a check at the beginning for that case and refuse to start the fitting until the input is cleaned up.

Out of curiosity, what are you using empca for?

Stephen
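
The kind of up-front check described above could look roughly like the sketch below. This is not code from empca: `check_weights` is a hypothetical helper, and it assumes the weights array has shape (n_observations, n_variables). It refuses inputs in which any observation (row) or any variable (column) has only zero weights.

```python
import numpy as np


def check_weights(weights):
    """Raise ValueError if a row or column of `weights` is entirely zero.

    Hypothetical helper, not part of empca.
    Assumes shape (n_observations, n_variables).
    """
    weights = np.asarray(weights, dtype=float)
    if weights.ndim != 2:
        raise ValueError("weights must be a 2-D array")
    if not np.all(np.isfinite(weights)):
        raise ValueError("weights contain NaN or Inf values")

    # Indices of variables / observations whose weights are all zero
    empty_vars = np.flatnonzero(np.all(weights == 0, axis=0))
    empty_obs = np.flatnonzero(np.all(weights == 0, axis=1))

    if empty_vars.size or empty_obs.size:
        raise ValueError(
            "all-zero weights for variables %s and observations %s; "
            "remove them before fitting"
            % (empty_vars.tolist(), empty_obs.tolist())
        )


good = np.ones((3, 4))
check_weights(good)  # passes silently

bad = good.copy()
bad[:, 2] = 0.0  # variable 2 has no information at all
try:
    check_weights(bad)
except ValueError as err:
    print("rejected:", err)
```

Raising an error instead of warning matches the stated intent to refuse to start the fit until the input is cleaned up.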

petitvic commented 10 years ago

The check would be good for new users, I guess. Regarding my use of empca: we have some NGS cancer data with several normal and tumor samples for each patient. I am visualizing these data per patient to see whether there are any mix-ups between the normal and tumor samples.

Victor

sbailey commented 10 years ago

Great. Glad to hear that it is getting use beyond astrophysics.

Stephen

londumas commented 6 years ago

The solution is something like this:

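import numpy as np

# True for variables (columns) that have at least one non-zero weight
mask_nonempty = np.sum(ivar, axis=0) > 0.
init_size = mask_nonempty.size
flux = flux[:, mask_nonempty]
ivar = ivar[:, mask_nonempty]

Run empca on the reduced arrays, then expand the eigenvectors back to the original size:

# Dropped variables stay at zero in the full-size eigenvectors
new_eigvec = np.zeros((nbCoeff, init_size))
for i in range(nbCoeff):
    new_eigvec[i, mask_nonempty] = eigvec[i]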
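
For completeness, here is a self-contained sketch of the same mask-and-restore idea, written as two helper functions. The function names, the `fill` parameter, and the stand-in array used in place of a real empca result are all assumptions made for illustration; the arrays are assumed to have shape (n_observations, n_variables), and the eigenvectors shape (n_vectors, n_variables).

```python
import numpy as np


def drop_empty_variables(flux, ivar):
    """Remove variables (columns) whose weights are zero for every observation."""
    keep = np.any(ivar > 0, axis=0)  # True where a column has some weight
    return flux[:, keep], ivar[:, keep], keep


def restore_variables(eigvec_reduced, keep, fill=0.0):
    """Expand eigenvectors back to the original number of variables.

    Positions of dropped variables are set to `fill`.
    """
    full = np.full((eigvec_reduced.shape[0], keep.size), fill, dtype=float)
    full[:, keep] = eigvec_reduced
    return full


rng = np.random.default_rng(0)
flux = rng.normal(size=(4, 6))
ivar = np.ones((4, 6))
ivar[:, [1, 4]] = 0.0  # two variables carry no information

flux_r, ivar_r, keep = drop_empty_variables(flux, ivar)

# Stand-in for the eigenvectors a fit on (flux_r, ivar_r) would return
eigvec_r = np.ones((2, flux_r.shape[1]))

eigvec_full = restore_variables(eigvec_r, keep)
print(flux_r.shape, eigvec_full.shape)
```

Filling the dropped positions with zero is one choice; passing `fill=np.nan` instead would make it explicit that those variables were never fitted.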