theislab / scgen

Single cell perturbation prediction
https://scgen.readthedocs.io
GNU General Public License v3.0

`scgen.batch_removal` doesn't save `adata.raw` #16

Closed: cartal closed this issue 4 years ago

cartal commented 4 years ago

Hi @M0hammadL and @Naghipourfar,

I have been working with scGen for a while now and it's giving great results.

There is one issue, though, that is causing me trouble. Once you apply the `scgen.batch_removal` function, the newly reconstructed adata no longer has a `.raw` attribute. Inspecting the code, I realised that `.raw` is simply not carried over.

Do you think you could fix it?

Batch removal on the main HVGs gives amazing results, but sometimes you want to explore the expression of other genes that are not in the HVG set. And simply assigning the original object's `.raw` (adata1.raw) to the corrected object's `.raw` (adata2.raw) results in a jumbled expression matrix.

ktpolanski commented 4 years ago

As Carlos mentioned, he was trying to add a .raw attribute to the resulting object, and in the expression plots the genes did not show up where he expected them. Looking at batch_removal(), I understand why. You subset and concatenate the object, which reshuffles the order of the cells, and then at the end you paste the original .obs_names over the new ordering, essentially scrambling the index.
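To make the mechanism concrete, here is a minimal sketch (not scgen's actual code) that uses a pandas `DataFrame` as a stand-in for an AnnData object, with the row index playing the role of `.obs_names`. The toy cells, genes and batch labels are made up for illustration.

```python
import pandas as pd

# Toy expression matrix: 4 cells x 2 genes from two interleaved batches.
# A DataFrame stands in for AnnData; its index plays the role of .obs_names.
cells = ["c1", "c2", "c3", "c4"]
batch = pd.Series(["A", "B", "A", "B"], index=cells)
expr = pd.DataFrame(
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]],
    index=cells,
    columns=["g1", "g2"],
)

# Subsetting per batch and concatenating groups the cells by batch.
combined = pd.concat([expr[batch == b] for b in ["A", "B"]])
print(list(combined.index))  # ['c1', 'c3', 'c2', 'c4']

# Pasting the original names over the new order mislabels rows.
scrambled = combined.copy()
scrambled.index = expr.index
print(scrambled.loc["c2", "g1"])  # 3.0, i.e. c3's value under c2's name
```

Any `.raw` matrix attached to `scrambled` would keep the original order, so its rows would no longer describe the same cells as the corrected rows.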

This can be amended quite easily: just add index_unique=None to each of your .concat() calls. At the end, you can re-sort the final corrected object to match the original's cell ordering rather than overwrite the .obs_names, and once you do that you can simply copy the .raw over if you like.
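A matching sketch of the suggested fix, under the same stand-in assumptions (a pandas `DataFrame` in place of AnnData). The per-batch `correct()` helper is hypothetical and only centres each gene; it is not scgen's correction. `pandas.concat` leaves the cell names untouched, which is roughly what `index_unique=None` asks AnnData for; the result is then re-sorted to the original cell order, after which the full-gene matrix (standing in for `.raw`) lines up row for row.

```python
import numpy as np
import pandas as pd

cells = ["c1", "c2", "c3", "c4"]
batch = pd.Series(["A", "B", "A", "B"], index=cells)

# Full matrix over all genes, standing in for adata.raw.
raw = pd.DataFrame(
    np.arange(12, dtype=float).reshape(4, 3),
    index=cells,
    columns=["g1", "g2", "g3"],
)
# Only the HVGs (here g1 and g2) go into the correction.
hvg = raw[["g1", "g2"]]


def correct(block: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-batch correction: centre each gene within the batch."""
    return block - block.mean()


# Correct each batch separately; cell names stay unique and unchanged.
corrected = pd.concat([correct(hvg[batch == b]) for b in ["A", "B"]])

# Re-sort to the original cell order instead of overwriting the index.
corrected = corrected.loc[raw.index]

# Rows of the corrected matrix and of the "raw" matrix now refer to the same cells.
assert corrected.index.equals(raw.index)
```

Because the names are re-sorted rather than overwritten, genes outside the HVG set can be looked up in the raw matrix for exactly the same cells.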

M0hammadL commented 4 years ago


Hi @ktpolanski,

Yes, this can easily be added. At the moment I am busy with something else, but if you are interested in fixing it, I will happily merge your pull requests.

M0hammadL commented 4 years ago

Thanks to @ktpolanski, the corrected data now also carries the `.raw` of the original adata. @cartal I also updated the example [here](https://nbviewer.jupyter.org/github/M0hammadL/scGen_notebooks/blob/master/notebooks/scgen_batch_removal.ipynb) to show that you can use `.raw` after correction.