theislab / scgen

Single cell perturbation prediction
https://scgen.readthedocs.io
GNU General Public License v3.0
260 stars 52 forks

Issue with scgen.batch_removal function #1

Closed nhuhoa closed 5 years ago

nhuhoa commented 5 years ago

Dear Naghipourfar and M0hammadL,

Thanks for the very nice work; it will be very useful for my own. I have tested scGen and found a small bug in the "batch_removal" function, so I want to report it here.

I tested scGen with 2 batches (batch 0 and batch 1) and only one cell type, with index 1. Training goes well, but the batch removal step fails. The bug is at line 291 of util.py: all_shared_ann = sc.AnnData.concatenate(*shared_ct, batch_key="concat_batch")
// after the concatenate call we expect a column all_shared_ann.obs["concat_batch"], but it is not created; only all_shared_ann.obs["batch"] and all_shared_ann.obs["cell_type"] are present.

So at the next line: del all_shared_ann.obs["concat_batch"]
// this column does not exist, so the program throws an error here.

The same problem occurs at line 301 of util.py: del all_corrected_data.obs["concat_batch"]

When I comment out the 2 lines above, the program works well and gives me a corrected matrix.
The package versions I use are: scgen: 1.0.0.dev25+347e176 anndata: 0.6.18 scanpy: 1.4 numpy: 1.14.5

Thanks, Hoa Tran

mumichae commented 5 years ago

I'm having the same issue while working with 2 batches (0 and 1). The problem probably comes from the fact that there is only 1 object in shared_ct, so the key concat_batch is not created during concatenation and therefore cannot be removed, which throws the error. I fixed this by adding a check for whether concat_batch is present as a column before deleting it.
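
A minimal sketch of such a guard, for illustration only; this is not the actual patch applied to scgen. The helper name drop_obs_key_if_present is hypothetical, and plain dicts stand in for the obs table so the snippet runs without anndata. The same `in` / `del` pattern should work on a pandas DataFrame such as all_shared_ann.obs, where `in` tests column labels.

```python
def drop_obs_key_if_present(obs, key="concat_batch"):
    """Delete `key` from `obs` only if it exists; return True if it was deleted.

    `obs` is assumed to support `key in obs` and `del obs[key]`
    (true for a dict and for a pandas DataFrame, where `in` checks columns).
    """
    if key in obs:
        del obs[key]
        return True
    return False


# Stand-in for the obs table when concatenation did not create the key.
obs_without_key = {"batch": ["0", "1"], "cell_type": ["1", "1"]}

# Stand-in for the obs table when concatenation did create the key.
obs_with_key = {
    "batch": ["0", "1"],
    "cell_type": ["1", "1"],
    "concat_batch": ["0", "1"],
}

# No KeyError in either case: the delete is skipped when the key is absent.
removed_missing = drop_obs_key_if_present(obs_without_key)
removed_present = drop_obs_key_if_present(obs_with_key)
```

In util.py, the two unconditional del statements at lines 291 and 301 would be wrapped in a check of this kind.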

M0hammadL commented 5 years ago

Hi, thanks for mentioning this. It should be fixed now.