gregversteeg / bio_corex

A flexible version of CorEx developed for bio-data challenges that handles missing data, continuous/discrete variables, multi-CPU, overlapping structure, and includes visualizations
Apache License 2.0

bio_corex Exception on demo big5 project #1

Closed: MarcSaric closed this issue 7 years ago

MarcSaric commented 7 years ago

Running the Big 5 demo with the default arguments from README.md yields the following error for me:

Console output:

Data summary: X has 2000 rows and 50 columns
Variable names are: (0, 'blue_q0'),(1, 'red_q1'),(2, 'q2'),(3, 'q3'),(4, 'q4'),(5, 'blue_q5'),(6, 'red_q6'),(7, 'q7'),(8, 'q8'),(9, 'q9'),(10, 'blue_q10'),(11, 'red_q11'),(12, 'q12'),(13, 'q13'),(14, 'q14'),(15, 'blue_q15'),(16, 'red_q16'),(17, 'q17'),(18, 'q18'),(19, 'q19'),(20, 'blue_q20'),(21, 'red_q21'),(22, 'q22'),(23, 'q23'),(24, 'q24'),(25, 'blue_q25'),(26, 'red_q26'),(27, 'q27'),(28, 'q28'),(29, 'q29'),(30, 'blue_q30'),(31, 'red_q31'),(32, 'q32'),(33, 'q33'),(34, 'q34'),(35, 'blue_q35'),(36, 'red_q36'),(37, 'q37'),(38, 'q38'),(39, 'q39'),(40, 'blue_q40'),(41, 'red_q41'),(42, 'q42'),(43, 'q43'),(44, 'q44'),(45, 'blue_q45'),(46, 'red_q46'),(47, 'q47'),(48, 'q48'),(49, 'q49')
Getting CorEx results
Layer  0
corex, rep size: 2 2
Marginal description:  discrete
Warning: Data matrix values should be consecutive integers starting with 0,1,...
Traceback (most recent call last):
  File "/home/marc/git/bio_corex/vis_corex.py", line 630, in <module>
    n_cpu=options.cpu, ram=options.ram).fit(X)]
  File "/home/marc/git/bio_corex/corex.py", line 161, in fit
    self.fit_transform(X)
  File "/home/marc/git/bio_corex/corex.py", line 195, in fit_transform
    self.update_alpha(self.p_y_given_x, self.theta, Xm, self.tcs)
  File "/home/marc/git/bio_corex/corex.py", line 290, in update_alpha
    log_marg_x = self.calculate_marginals_on_samples(theta[i:i+batch_size], Xm[sample, i:i+batch_size])
  File "/home/marc/git/bio_corex/corex.py", line 376, in calculate_marginals_on_samples
    log_marg_x[:, :, i, :] = self.calculate_p_xi_given_y(Xm[:, i], theta[i])
  File "/home/marc/git/bio_corex/corex.py", line 359, in calculate_p_xi_given_y
    z[:, not_missing, :] = self.marginal_p(xi[not_missing], thetai)
  File "/home/marc/git/bio_corex/corex.py", line 492, in marginal_p
    return np.choose(xi.reshape((-1, 1, 1)), logp).transpose((1, 0, 2))
  File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 351, in choose
    return choose(choices, out=out, mode=mode)
ValueError: invalid entry in choice array
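The final frame is easy to reproduce in isolation. By default, np.choose uses mode='raise', so every entry of the index array must lie in 0..k-1. A value such as -1, which a missing-value code would produce, raises this exact ValueError. The sketch below uses a toy 2x2 log-probability table and a hypothetical helper named lookup. It only mimics the call in marginal_p and is not bio_corex's actual code.

```python
import numpy as np

# Toy stand-in for a marginal table: row k holds log p(x_i = k | y) for
# two hidden states (values are arbitrary, for illustration only).
logp = np.log(np.array([[0.7, 0.2],
                        [0.3, 0.8]]))

def lookup(xi):
    """Hypothetical helper: pick the row of logp selected by each value in xi."""
    # Shape (n, 1) broadcasts against each choice of shape (2,) -> (n, 2).
    return np.choose(xi.reshape((-1, 1)), logp)

print(lookup(np.array([0, 1, 1])))  # fine: every entry is a valid index

try:
    lookup(np.array([0, 1, -1]))  # -1 used as a missing-value code
except ValueError as err:
    print("ValueError:", err)
```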

This is on an x86-64 Ubuntu 16.04 machine with the Ubuntu-provided numpy package, running under Eclipse Neon with PyDev.

marc@xbox:~/git/bio_corex$ aptitude show python-numpy
Package: python-numpy                             
State: Installed
[...]
Version: 1:1.11.0-1ubuntu1
[...]

The basic demo does work.

gregversteeg commented 7 years ago

Thanks! I forgot that there are missing values in the Big 5 data. I changed the README to add the option --missing=-1, which resolves the problem.
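The traceback hints at why --missing=-1 fixes the problem. calculate_p_xi_given_y builds a not_missing mask and passes only xi[not_missing] to marginal_p, so the -1 entries never reach np.choose once they are declared missing. The following is a minimal sketch of that masking, assuming missing entries are coded as -1 and using a toy table. The helper name masked_log_marginals, and the choice to leave missing rows at a log-probability of 0, are illustrative assumptions rather than bio_corex's actual implementation.

```python
import numpy as np

MISSING = -1  # assumed missing-value code, matching --missing=-1

# Toy table: row k holds log p(x_i = k | y) for two hidden states.
logp = np.log(np.array([[0.7, 0.2],
                        [0.3, 0.8]]))

def masked_log_marginals(xi, logp, missing=MISSING):
    """Hypothetical helper: look up log p(x_i | y) only for observed samples."""
    not_missing = xi != missing
    # Missing samples keep a log-probability of 0, so they add nothing
    # when log-marginals are summed (an assumption for this sketch).
    z = np.zeros((len(xi), logp.shape[1]))
    z[not_missing, :] = np.choose(xi[not_missing].reshape((-1, 1)), logp)
    return z

xi = np.array([0, 1, MISSING, 1])
print(masked_log_marginals(xi, logp))
```

Without the mask, the same input would hit the ValueError from the traceback. With the mask, the missing sample simply gets a row of zeros.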