perslab / CELLEX

CELLEX (CELL-type EXpression-specificity)
GNU General Public License v3.0

[BUG]: Operands could not be broadcast together with shapes (15831,12) (15831,15830,13) #34

Closed: LIZW2019 closed this issue 1 year ago

LIZW2019 commented 1 year ago

Dear friend,

I ran into a ValueError when I used CELLEX on my data, as shown in this screenshot:

[image: screenshot of the error]

I worked around this bug by modifying line 44 of cellex/metrics/det.py, changing

    n_cells = np.array([n_cells.values] * mean.shape[0]) # faster than count

to

    n_cells = np.array(n_cells.values * mean.shape[0]) # faster than count

A member of my lab uses the same tool but has never come across this problem. I am wondering whether this bug is caused by a different version of numpy, because it seems to be an array calculation problem. My numpy version is 1.21.6, and my cellex version is 1.2.2.

I would be thankful if this problem could be fixed so that CELLEX works across a broader range of environments. I would also appreciate it if you could tell me the real reason why the calculation behaves differently on my computer.

Thanks a lot!

Amy

LIZW2019 commented 1 year ago

The bug seems to have disappeared after I recreated the environment...

tstannius commented 1 year ago

Dear Amy,

My apologies for the late reply - I unfortunately did not receive a notification - and thanks for submitting this issue and your debugging efforts!

This issue appears to be very similar to the one reported here. I would suggest you try the same steps that I suggested to Andrew here, i.e. 1) get a full stack trace, and 2) create and test a CELLEX environment in which the library versions exactly match those listed in requirements.txt.
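As a rough sketch of those two steps in Python: the helper names below (installed_versions, full_traceback) are made up for illustration and are not part of CELLEX, and the package list is an assumption, since the contents of requirements.txt are not shown in this thread.

```python
import traceback
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def full_traceback(func, *args, **kwargs):
    """Call func; return the complete traceback as text if it raises, else None."""
    try:
        func(*args, **kwargs)
    except Exception:
        return traceback.format_exc()
    return None


# Assumed package list; replace it with the names pinned in requirements.txt.
for name, version in installed_versions(["numpy", "pandas", "cellex"]).items():
    print(f"{name}: {version}")
```

The printed versions can then be compared by hand against requirements.txt, and the text returned by full_traceback can be pasted into the issue in place of a screenshot.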

Presumably the reason for the different behaviour you observe is that n_cells does not have the expected number of dimensions. This may be because numpy versions differ in how they interpret the line you have posted:

n_cells = np.array([n_cells.values] * mean.shape[0])

VS

n_cells = np.array(n_cells.values * mean.shape[0])

It's been some time since I wrote this - if I recall correctly, this line is supposed to result in a 2D array, but if n_cells.values is already 2D, it will become 3D, causing the error you observe further down.
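To illustrate the point about dimensions, here is a small sketch with toy arrays. It assumes n_cells.values is a plain numpy array; the names and shapes below are invented and are not the real CELLEX data.

```python
import numpy as np

n_genes = 5                                  # stands in for mean.shape[0]
mean = np.ones((n_genes, 3))                 # toy genes x cell-types matrix

counts_1d = np.array([10, 20, 30])           # one count per cell type, shape (3,)
counts_2d = np.tile(counts_1d, (4, 1))       # the counts as a 2D array, shape (4, 3)

# Original line: a Python list holding the array n_genes times, stacked by np.array.
repeated_1d = np.array([counts_1d] * n_genes)   # 1D input -> 2D result
repeated_2d = np.array([counts_2d] * n_genes)   # 2D input -> 3D result

# Modified line: array * int is elementwise multiplication, not repetition.
scaled = np.array(counts_1d * n_genes)          # shape is unchanged, values are scaled

print(repeated_1d.shape, repeated_2d.shape, scaled.shape)

ok = mean / repeated_1d                          # shapes agree, no error
try:
    mean / repeated_2d                           # (5, 3) against (5, 4, 3)
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```

Note that the modified line avoids the error because a 1D array broadcasts against mean, but it multiplies every count by mean.shape[0] instead of repeating the row, so any result computed from it may differ from what the original line was meant to produce.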

With respect to fixing the problem, we are looking into a new, faster version of CELLEX, but likely not updating the existing version.

LIZW2019 commented 1 year ago

Thanks for your reply!