noamteyssier / adpbulk

pseudobulking on an AnnData object
MIT License

Problem with using raw data #1

Closed: alperoglu closed this issue 2 years ago

alperoglu commented 2 years ago

Hi Noam,

This is a great package, and thank you for making it available here. It has really helped a scanpy and AnnData novice like me.

I just wanted to point out that when I set the use_raw option to True, I got the following error:

ValueError: Shape of passed values is (325, 25809), indices imply (325, 1153)

I think this happens because, when the transform function builds the DataFrame for the pseudobulk matrix, it sets the columns to the variable genes of the processed X matrix of the AnnData object (only 1153 in my data), while the aggregated values come from the raw matrix with all 25809 genes. As a workaround, I came up with the following (somewhat messy) transform function, which takes the column names from the raw data when use_raw is True.

import numpy as np
import pandas as pd
from tqdm import tqdm


def customTransform(self) -> pd.DataFrame:
    """
    performs the aggregation based on the fit indices
    """
    if not self._isfit:
        raise AttributeError("Please fit the object first")

    matrix = []
    for pairs in tqdm(self.groupings, desc="Aggregating Samples"):
        # wrap single keys in a tuple so they match the keys of grouping_masks
        if not isinstance(pairs, tuple):
            pairs = tuple([pairs])
        if pairs in self.grouping_masks:
            matrix.append(self._get_agg(self.grouping_masks[pairs]))

    # stack all observations into single matrix
    matrix = np.vstack(matrix)

    # raw data can hold more genes than the processed X, so the column
    # labels must come from the same matrix that was aggregated
    if self.use_raw:
        columns = self.adat.raw.var.index.values
    else:
        columns = self.adat.var.index.values

    self.matrix = pd.DataFrame(
        matrix,
        index=self.meta.SampleName.values,
        columns=columns)

    self._istransform = True
    return self.matrix

It's a simple fix, but I thought it would be worth leaving here as a reference for future users.
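The mismatch is easy to reproduce outside AnnData. The sketch below uses made-up toy data (4 cells, 5 "raw" genes, 2 "processed" genes, samples A and B) in place of the 325 x 25809 raw matrix and the 1153 processed genes from the report, and sums counts per sample as a stand-in for the aggregation step; adpbulk's own `_get_agg` may aggregate differently. Because the summed values come from the raw matrix, they have one column per raw gene, so only the raw gene list is a valid set of column labels.

```python
import numpy as np
import pandas as pd

# hypothetical toy data: 4 cells x 5 raw genes; processing kept only 2 genes
raw_genes = [f"gene{i}" for i in range(5)]
processed_genes = raw_genes[:2]
raw_counts = np.arange(20).reshape(4, 5)
samples = np.array(["A", "A", "B", "B"])

# pseudobulk (assumed here to be a sum): add up the raw counts of each sample's cells
sample_names = np.unique(samples)
aggregated = np.vstack([raw_counts[samples == s].sum(axis=0) for s in sample_names])

# labelling raw-derived sums with the processed gene names reproduces the bug
try:
    pd.DataFrame(aggregated, index=sample_names, columns=processed_genes)
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

# taking the labels from the raw gene list, as in the workaround, succeeds
bulk = pd.DataFrame(aggregated, index=sample_names, columns=raw_genes)
print(mismatch_error)
print(bulk.shape)  # (2, 5)
```

The failing call raises the same kind of `ValueError` as in the report, just with the toy shapes.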

Best, Alper

noamteyssier commented 2 years ago

Thanks for pointing this out! I updated the code with a similar fix.
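The exact change is not shown in this thread. One way to express the same idea, sketched here as an assumption, is to pick the matrix and its gene names from the same object in a single step, so the aggregated values and the column labels cannot disagree. `select_layer` is a hypothetical helper that relies only on the standard AnnData attributes `X`, `var_names` and `raw` (which is `None` when no raw data is stored); the usage lines substitute `SimpleNamespace` objects for a real AnnData object so the example runs without anndata installed.

```python
from types import SimpleNamespace

import numpy as np


def select_layer(adata, use_raw=False):
    """Return (matrix, gene_names) taken from the same source (hypothetical helper).

    Assumes AnnData-style attributes: adata.X / adata.var_names and,
    when use_raw is True, adata.raw.X / adata.raw.var_names.
    """
    if use_raw:
        if adata.raw is None:
            raise ValueError("use_raw=True but adata.raw is not set")
        source = adata.raw
    else:
        source = adata
    # matrix and labels always come from the same object
    return source.X, list(source.var_names)


# stand-in for an AnnData object: 5 raw genes, 2 kept after processing
raw = SimpleNamespace(X=np.ones((4, 5)), var_names=[f"gene{i}" for i in range(5)])
adata = SimpleNamespace(X=np.ones((4, 2)), var_names=["gene0", "gene1"], raw=raw)

matrix, genes = select_layer(adata, use_raw=True)
print(matrix.shape, len(genes))  # (4, 5) 5
```

Returning both values together means the later `pd.DataFrame(..., columns=genes)` call cannot pick up labels from the wrong matrix.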