Upgrading from scot v1 to scot v2 : Problem with integrating datasets with different number of samples

rsinghlab / SCOT

Gromov-Wasserstein based optimal transport for aligning single-cell multi-omics data

http://rsinghlab.github.io/SCOT

MIT License


Upgrading from scot v1 to scot v2 : Problem with integrating datasets with different number of samples #8

Closed. sheetalgiri closed this issue 1 year ago

sheetalgiri commented 2 years ago

While upgrading, I get the following error when integrating datasets with different numbers of samples (3293 and 3164):

Traceback (most recent call last):
  File "scot_v2_try.py", line 22, in <module>
    aligned_X, aligned_y= scot_aligner.align(k=k, e=e, normalize=normalize)
  File "SCOTv2/src/scot.py", line 173, in align
    X_aligned, y_aligned = self.barycentric_projection(XontoY=XontoY)
  File "SCOTv2/src/scot.py", line 143, in barycentric_projection
    self.X_aligned=np.matmul(self.coupling, self.y) / weights[:, None]
ValueError: operands could not be broadcast together with shapes (3293,1300) (3164,1)

Here's my code:

scot_aligner=SCOT(X, y)
aligned_X, aligned_y= scot_aligner.align(k=k, e=e, normalize=normalize)

Please let me know what I can change. All the examples I can find seem to use datasets with the same number of samples.

pinardemetci commented 2 years ago

Hi sheetalgiri,

We are actually in the process of registering versions of SCOT, and as a result, the repository is a work in progress. I apologize for the inconvenience this might have caused. Could you share the scotv2 code you are using here?

Thanks! -Pinar.

sheetalgiri commented 2 years ago

Hi Pinar, thanks for your reply! I used this version of the code: https://github.com/rsinghlab/SCOT/commit/7422a7c7c2e5fc3c4a29eb018f4010714296dd20 and loaded SCOT from src/scot.py.

MeyerBender commented 1 year ago

Hi,

I get the same error when trying to run SCOT.

The issue seems to arise in the line self.X_aligned=np.matmul(self.coupling, self.y) / weights[:, None], which is part of the barycentric_projection() function. The coupling has the shape (X_n, Y_n), and y has the shape (Y_n, Y_f). The result of the matrix multiplication is hence of shape (X_n, Y_f). The shape of the weights is (Y_n, 1), which is why the division fails.

My feeling is that the weights should be calculated on axis 1, not axis 0, so that their shape is (X_n, 1). Can you confirm this? Or is the issue somewhere else?
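
For anyone who wants to see the mismatch in isolation, here is a small NumPy sketch with made-up sizes (4 samples in X, 3 in y, 2 features). Only the expression from the traceback comes from scot.py; the array names, sizes, and random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_y, n_feat = 4, 3, 2           # hypothetical sizes with n_x != n_y
coupling = rng.random((n_x, n_y))    # stand-in for the coupling matrix
y = rng.random((n_y, n_feat))        # stand-in for the y dataset

projected = np.matmul(coupling, y)   # shape (n_x, n_feat)

weights_axis0 = coupling.sum(axis=0)  # column sums, shape (n_y,)
weights_axis1 = coupling.sum(axis=1)  # row sums, shape (n_x,)

try:
    # (n_x, n_feat) divided by (n_y, 1): the leading dimensions differ
    projected / weights_axis0[:, None]
    axis0_failed = False
except ValueError:
    axis0_failed = True

# (n_x, n_feat) divided by (n_x, 1): broadcasts row by row
X_aligned = projected / weights_axis1[:, None]
```

When n_x equals n_y the axis=0 division runs without raising, which may be why examples with equal sample counts never hit this error.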

Cbaker37 commented 1 year ago

Hi,

Your intuition is correct – we should be dividing, for each sample x in X, by the total mass x transports. Considering we are taking row-wise weighted averages (given the coupling matrix) of the samples in y in order to estimate where each x should lie, we should be dividing by how much mass x transports (which is the sum along axis=1 of the coupling matrix, not axis=0). Thank you for pointing this out – we have just resolved the issue.
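
A minimal sketch of the row-normalized projection described above, written as a standalone function. `project_x_onto_y` is a hypothetical name used for illustration, and the toy coupling and data are made up; this is not the code in the repository.

```python
import numpy as np

def project_x_onto_y(coupling, y):
    """Hypothetical helper: place each x at the coupling-weighted
    average of the samples in y."""
    coupling = np.asarray(coupling, dtype=float)
    y = np.asarray(y, dtype=float)
    # Total mass each x transports: sum over the y axis, shape (n_x,)
    mass_per_x = coupling.sum(axis=1)
    # Weighted sum of y rows per x, divided by that x's total mass
    return (coupling @ y) / mass_per_x[:, None]

# Toy example: 2 samples in X, 3 samples in y, 2 features
coupling = np.array([[0.2, 0.0, 0.0],
                     [0.0, 0.1, 0.1]])
y = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [4.0, 0.0]])

X_aligned = project_x_onto_y(coupling, y)
```

In this toy case the first x sends all of its mass to y[0] and so lands on it, while the second x splits its mass evenly between y[1] and y[2] and lands at their midpoint. The result has one row per sample in X regardless of how many samples y has.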