theislab / scgen

Single cell perturbation prediction
https://scgen.readthedocs.io
GNU General Public License v3.0

Metadata shuffled when applying batch correction #24

Closed fhausmann closed 4 years ago

fhausmann commented 4 years ago

Description: When running batch_removal, the metadata (anndata.obs) gets shuffled and no longer corresponds to the obs_names. This is probably related to, or a consequence of, https://github.com/theislab/scgen/issues/7

Version: scgen: '1.1.4'

Code to reproduce:

import scgen
import scanpy as sc
import pandas as pd
import anndata
import numpy as np  # needed for np.zeros below

class DummyNet: # To not require full training for reproducing the issue

    def to_latent(self, data, *unused_args, **unused_kwargs):
        return data

    def reconstruct(self, data, *unused_args, **unused_kwargs):
        return data

metadata = pd.DataFrame({
    'cell_type': [
        'celltyp1', 'celltyp1', 'celltyp1', 'celltyp2', "celltyp2", "celltyp2",
        "celltype3"
    ],
    'batch': [
        'batch1', 'batch1', 'batch2', 'batch1', 'batch2', 'batch2', 'batch2'
    ]
})

metadata.index = metadata.cell_type + "_" + metadata.batch + '_' + metadata.index.astype(str)

metadata = metadata.sample(frac=1)  # Shuffle the dataframe

test_data = anndata.AnnData(np.zeros((metadata.index.size, 100)), obs=metadata)

scgen_results = scgen.batch_removal(DummyNet(), test_data)

scgen_metadata = scgen_results.obs.copy()

comparison = pd.merge(metadata, scgen_metadata, left_index=True, right_index=True, suffixes=('_original', '_scgen'))

print(comparison)
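
Rather than reading the printed table by eye, the mismatch can be counted programmatically. The following is only a minimal sketch: the small `comparison` table is hand-written, hypothetical data that mimics the column suffixes produced by the merge, and `mismatched_rows` is a helper name introduced purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the merged table: one row per cell, holding the
# original labels and the labels returned after batch correction.
comparison = pd.DataFrame(
    {
        "cell_type_original": ["celltyp1", "celltyp2", "celltype3"],
        "cell_type_scgen": ["celltyp2", "celltyp1", "celltype3"],
        "batch_original": ["batch1", "batch2", "batch2"],
        "batch_scgen": ["batch2", "batch1", "batch2"],
    },
    index=["celltyp1_batch1_0", "celltyp2_batch2_4", "celltype3_batch2_6"],
)


def mismatched_rows(table, columns=("cell_type", "batch")):
    """Return rows where any '<col>_original' differs from '<col>_scgen'."""
    differs = pd.Series(False, index=table.index)
    for col in columns:
        differs |= table[f"{col}_original"] != table[f"{col}_scgen"]
    return table[differs]


bad = mismatched_rows(comparison)
print(f"{len(bad)} of {len(comparison)} cells have shuffled metadata")
print(bad)
```

An empty result would mean that every obs_name still carries its original metadata.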

Thanks in advance for your help.

fhausmann commented 4 years ago

I guess a possible fix would be to replace the lines at https://github.com/theislab/scgen/blob/0ef7030ef593515bfe75be14e07c9efd444ad857/scgen/models/util.py#L313-L316 with:

        corrected = all_corrected_data
        corrected.X = network.reconstruct(all_corrected_data.X, use_data=True)
        corrected.var_names = adata.var_names.tolist()
        corrected = corrected[adata.obs_names]

This should preserve the input obs and their ordering.
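
To illustrate why selecting by `obs_names` restores the ordering, here is a small pandas-only sketch. It is an assumed illustration of the general failure mode (rows regrouped per cell type, then paired with the original names by position), not a copy of scgen's actual code; all cell names are made up.

```python
import pandas as pd

# Toy obs table in a deliberately non-sorted order (names are illustrative).
obs = pd.DataFrame(
    {"cell_type": ["B", "A", "B", "A"]},
    index=["cell_b1", "cell_a1", "cell_b2", "cell_a2"],
)

# Processing one cell type at a time and concatenating the pieces regroups
# the rows, so the result is no longer in the input order.
processed = pd.concat([group for _, group in obs.groupby("cell_type")])

# Buggy pattern (assumption): attach the original names by position.
# The labels now belong to the wrong cells.
mislabeled = processed.copy()
mislabeled.index = obs.index

# Fix pattern: keep each row's own name and select by the original names,
# analogous to `corrected[adata.obs_names]` in the proposed change.
restored = processed.loc[obs.index]

print(mislabeled.equals(obs))
print(restored.equals(obs))
```

Selecting by name works only if the names are unique and every input name is still present after processing.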

M0hammadL commented 4 years ago

Thanks @fhausmann, this is now fixed in https://github.com/theislab/scgen/commit/7758656cebecd102ed96cb60a1f3e5c51c579d5e