yezhengSTAT / scVI-3D

GNU General Public License v3.0

Question regarding batch effect removal step #5

Open sdontsay opened 1 year ago

sdontsay commented 1 year ago

Hi Ye,

Thanks for this great tool. I am trying to run it on the Kim et al. 2020 dataset, though not the copy you provided: I downloaded it from elsewhere, where the cells are concatenated together, 16707 cells in total. That shouldn't be a problem: the resolution is still 500k, I isolated the cells by their IDs, and I converted the data to the format that scVI-3D can process. To account for batch effects, I also included a cell summary file as follows (sampled from the file):

name        batch          cell_type
cell_1.txt  IMR90-HAP1.R1  HAP1
cell_2.txt  IMR90-HAP1.R1  HAP1
cell_3.txt  IMR90-HAP1.R1  HAP1
cell_4.txt  IMR90-HAP1.R1  HAP1
cell_5.txt  IMR90-HAP1.R1  HAP1
cell_6.txt  IMR90-HAP1.R1  IMR90
cell_7.txt  IMR90-HAP1.R1  HAP1
cell_8.txt  IMR90-HAP1.R1  HAP1
cell_9.txt  IMR90-HAP1.R1  HAP1

As the cell summary above shows, my file lacks the depth and sparsity columns found in the example file, but I think it should be enough for batch removal.

However, when running the algorithm, I got the following error message after 400 epochs:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/python-3.8.6-2pmflf74yv3epdgoav5gykxzbrdxl37l/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/mmfs1/scratch/scVI-3D/scripts/scVI-3D.py", line 194, in normalize
    imputeTmp = imputeTmp + model.get_normalized_expression(library_size = bandDepth, transform_batch = batchName)
  File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/scvi/model/base/_rnamixin.py", line 100, in get_normalized_expression
    transform_batch = _get_batch_code_from_category(
  File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/scvi/model/_utils.py", line 243, in _get_batch_code_from_category
    raise ValueError(f'"{cat}" not a valid batch category.')
ValueError: "GM12878-IMR90.R1" not a valid batch category.
"""

I don't understand how to fix this problem, as I have included the batch information in the cell summary file. I also checked the scvi-tools source code, and it looks like it can only account for known batches. Is that correct? I previously used another tool that could handle the batch effect I attached, so I thought I could just pass in the batch information and have it removed. If not, please correct me. Thanks!

yezhengSTAT commented 1 year ago

Hello, I haven't come across this error before. Could you check whether there is anything special about the GM12878-IMR90.R1 batch category? Have you tried the demo data to see whether it runs successfully? Does the program finish successfully if the GM12878-IMR90.R1-related cells are excluded?
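One way to check the batch category, sketched under assumptions: the traceback shows that scvi-tools rejects a `transform_batch` label that is not among the model's batch categories, so a plausible (unconfirmed) cause is that the summary file names a batch for which no cells were actually loaded. The helper below is hypothetical and not part of scVI-3D; the column names `name` and `batch` follow the summary file quoted above, and `loaded_cells` stands for whatever list of cell file names ended up in the input.

```python
import pandas as pd


def find_missing_batches(summary, loaded_cells, name_col="name", batch_col="batch"):
    """Return batch labels listed in the summary that have no loaded cell.

    Hypothetical helper: `summary` is the cell summary table and
    `loaded_cells` is an iterable of cell file names that were read in.
    """
    loaded = summary[summary[name_col].isin(set(loaded_cells))]
    present = set(loaded[batch_col])
    # Batches named in the summary but absent from the loaded cells.
    return sorted(set(summary[batch_col]) - present)
```

If this returns a non-empty list, those labels would be the first place to look, for example for a mismatch between the cell names in the summary file and the input file names.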

Personally, I do not recommend removing the batch effect when batch and cell type are confounded. In the paper, we also show batch removal results (UMAP figures) for the case where batch and cell type are confounded; batch removal there tends to mess up the cell type separation.
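A quick way to see whether batch and cell type are confounded in a summary file is to count, for each cell type, how many batches it appears in. This is only a minimal sketch: the function name is hypothetical, the column names `batch` and `cell_type` follow the summary file quoted earlier in the thread, and "appears in exactly one batch" is a simple proxy for confounding rather than a criterion taken from the paper.

```python
import pandas as pd


def single_batch_cell_types(summary, batch_col="batch", type_col="cell_type"):
    """Return cell types observed in exactly one batch.

    For these cell types the batch effect cannot be separated from the
    cell-type signal using the summary labels alone.
    """
    n_batches = summary.groupby(type_col)[batch_col].nunique()
    return sorted(n_batches[n_batches == 1].index)
```

`pd.crosstab(summary["batch"], summary["cell_type"])` gives the full batch-by-cell-type count table if a closer look is needed.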

Best, Ye

sdontsay commented 1 year ago

Thank you, Ye, for your quick response! I have already run the demo data, and it worked well. I can try running the Kim et al. dataset without the GM12878-IMR90.R1-related cells, but even if that works, it would not help me much, since I still need those cells to be normalized.

I guess I can run without batch removal turned on, as you suggested above. Also, do you have a cell summary file for the Kim et al. dataset you provided in BandNorm? If so, perhaps I can try yours.

Additionally, when I ran your script, I found that some of the scvi-tools calls in scVI-3D.py have been deprecated: "scvi.data.setup_anndata(adata)" is now called as "scvi.model.SCVI.setup_anndata(adata)". You may need to update your code accordingly.
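A possible way to support both call styles is a small shim that prefers the newer classmethod and falls back to the older module-level function. This is a sketch only: `setup_anndata_compat` is a hypothetical name, not something scVI-3D or scvi-tools provides, and it assumes that older scvi-tools releases expose `scvi.data.setup_anndata` while lacking `setup_anndata` on the `SCVI` class. The imported `scvi` module is passed in explicitly so the function can be tried without a particular version installed.

```python
def setup_anndata_compat(scvi_module, adata, **kwargs):
    """Register `adata` using whichever setup_anndata entry point exists.

    Hypothetical shim: `scvi_module` is the imported `scvi` package;
    `kwargs` (for example batch_key="batch") are forwarded unchanged.
    """
    model_ns = getattr(scvi_module, "model", None)
    scvi_cls = getattr(model_ns, "SCVI", None)
    if scvi_cls is not None and hasattr(scvi_cls, "setup_anndata"):
        # Newer style: scvi.model.SCVI.setup_anndata(adata, ...)
        return scvi_cls.setup_anndata(adata, **kwargs)
    # Older style: scvi.data.setup_anndata(adata, ...)
    return scvi_module.data.setup_anndata(adata, **kwargs)
```

Usage would look like `import scvi` followed by `setup_anndata_compat(scvi, adata, batch_key="batch")`, in place of a direct call to either function.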

Thanks

yezhengSTAT commented 1 year ago

Yes, the summary file for Kim2020 is provided through the BandNorm package: https://sshen82.github.io/BandNorm/articles/BandNorm-tutorial.html#download-existing-single-cell-hi-c-data

More specifically: https://pages.stat.wisc.edu/~sshen82/bandnorm/Summary/Kim2020_Summary.txt

Yes, scvi-tools has been updated quite frequently since we launched scVI-3D. Thanks for pointing it out! We will make the code more robust across newer and older versions.

Thanks, Ye

sdontsay commented 1 year ago

Thanks for the information!

May I also ask a question regarding BandNorm? In the BandNorm tutorial you provided (https://sshen82.github.io/BandNorm/articles/BandNorm-tutorial.html), the same contact-region input files (format 1) can be passed to BandNorm for normalization. However, I don't see any mention of including the cell summary information when calling the main BandNorm function, "bandnorm_result = bandnorm(hic_df = hic_df, save = FALSE)", whereas scVI-3D has that option. Did I miss something, or is it simply not necessary?

Thanks

yezhengSTAT commented 1 year ago

Hello, yes, you are right. Summary information is not needed for BandNorm normalization. BandNorm itself does not remove the batch effect, so it only needs the contact counts as input. To remove the batch effect, you can run harmony after the BandNorm normalization, as indicated in https://sshen82.github.io/BandNorm/articles/BandNorm-tutorial.html#use-bandnorm.

Best, Ye

sdontsay commented 1 year ago

Thank you, Ye. I think I've got what I wanted to know.

Regards