poseidonchan / TAPE

Deep learning-based tissue compositions and cell-type-specific gene expression analysis with tissue-adaptive autoencoder (TAPE)
https://sctape.readthedocs.io/
GNU General Public License v3.0

unable to get expected results #4

Closed robinredX closed 2 years ago

robinredX commented 2 years ago

Hi, thanks for the nice tool. I was trying TAPE on SDY67. I took the following steps:

  1. As used in TAPE, I downloaded simulated PBMC data from Scaden (https://figshare.com/articles/dataset/PBMC_training_data/8052221).
  2. I separated the h5ad file into pbmc_data_sim.h5ad (which includes the four simulated datasets) and a text file containing the bulk dataset (bulk.txt).
  3. I chose the following parameters: datatype = "counts", as suggested in the README because the simulated dataset is from 10x; mode = "overall"; adaptive = True; and batch_size = 128. I chose 20 epochs based on the formula in the README.

I am using the following parameters in Deconvolution():

Deconvolution("pbmc_data_sim.h5ad", "bulk.txt", sep='\t', datatype='counts',
              genelenfile=None, mode='overall', adaptive=True,
              save_model_name="tape_sdy67",
              batch_size=128, epochs=20)

My results are as follows:

Monocytes
CCC: 0.14
r: 0.43
RMSE: 0.07
-----
CD4Tcells
CCC: nan
r: nan
RMSE: 0.33
-----
Bcells
CCC: nan
r: nan
RMSE: 0.05
-----
NK
CCC: 0.08
r: 0.66
RMSE: 0.07
-----
CD8Tcells
CCC: 0.02
r: 0.35
RMSE: 0.23
-----
Over all celltypes
CCC: 0.37
r: 0.4
RMSE: 0.19
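For reference, the three metrics reported above can be computed as in the minimal sketch below. The thread does not show the evaluation code that was actually used, so the function names (`pearson_r`, `ccc`, `rmse`) and the toy fractions are assumptions for illustration only. The CCC here is Lin's concordance correlation coefficient in its direct form.

```python
import math


def _is_constant(values):
    # True when every entry is identical, i.e. the vector has zero variance.
    return min(values) == max(values)


def pearson_r(truth, pred):
    # Pearson's r is undefined (nan) if either vector has zero variance.
    if _is_constant(truth) or _is_constant(pred):
        return float("nan")
    n = len(truth)
    mt, mp = sum(truth) / n, sum(pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, pred))
    ss_t = sum((t - mt) ** 2 for t in truth)
    ss_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(ss_t * ss_p)


def ccc(truth, pred):
    # Lin's CCC: 2*cov / (var_t + var_p + (mean_t - mean_p)^2),
    # using population (divide-by-n) moments.
    n = len(truth)
    mt, mp = sum(truth) / n, sum(pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, pred)) / n
    var_t = sum((t - mt) ** 2 for t in truth) / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    denom = var_t + var_p + (mt - mp) ** 2
    return 2 * cov / denom if denom > 0 else float("nan")


def rmse(truth, pred):
    # Root mean squared error between true and predicted fractions.
    n = len(truth)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / n)


# Toy fractions for one cell type across four samples (made-up values).
truth = [0.10, 0.20, 0.30, 0.40]
flat_pred = [0.25, 0.25, 0.25, 0.25]  # a predictor that never varies

print(pearson_r(truth, flat_pred), ccc(truth, flat_pred), rmse(truth, flat_pred))
```

A prediction (or ground truth) that is constant across samples makes r undefined, and an implementation that derives CCC from r would propagate the nan. Whether that explains the nan values for CD4Tcells and Bcells above is not established by the thread.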

I don't know what I am doing wrong. Is there some step that I am missing?

Thanks. robin.

poseidonchan commented 2 years ago

Hi, Robin:

Thanks for trying TAPE and pointing out the problem. I ran a quick test. I am not sure what the concrete bug is, but I can suggest some possible causes, and I have put together a reproducible notebook for you to test.

I checked the code and noticed that the network structure in https://github.com/poseidonchan/TAPE/blob/main/TAPE/model.py has an extra CELU() function. I will fix it in a future version.

On the other hand, I notice that you trained with all four simulated datasets for 20 epochs. In our paper, we used only data8k for training because, in my experience, using data8k alone gives better results.
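Restricting training to a single simulated dataset amounts to a boolean row mask over the samples. The sketch below illustrates the idea with NumPy only. It assumes each simulated sample carries a dataset label; the label array `ds`, the helper `select_dataset`, and every label other than data8k are hypothetical placeholders, not confirmed by this thread.

```python
import numpy as np


def select_dataset(X, ds_labels, name):
    """Return only the rows of X whose dataset label equals `name`."""
    mask = np.asarray(ds_labels) == name
    return X[mask]


# Toy stand-in: 5 simulated samples x 2 genes, with one label per sample.
# All labels except "data8k" are placeholders.
X = np.arange(10, dtype=float).reshape(5, 2)
ds = np.array(["set_a", "data8k", "data8k", "set_b", "set_c"])

X_8k = select_dataset(X, ds, "data8k")
print(X_8k.shape)
```

With an AnnData object, the same mask would typically be applied as `adata[adata.obs["ds"] == "data8k"]`, assuming the dataset label is stored in an `obs` column named `ds`; check the column names in your own h5ad file before relying on this.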

Further issues may be related to the choice of variance cutoff.
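To make the effect of a variance cutoff concrete, here is a generic sketch of variance-based gene filtering. It is not TAPE's implementation: the function `filter_genes_by_variance`, the gene names, and the cutoff values are all assumptions chosen for illustration.

```python
import numpy as np


def filter_genes_by_variance(expr, gene_names, cutoff):
    """Keep genes whose variance across samples exceeds `cutoff`.

    expr: array of shape (n_samples, n_genes).
    Returns the filtered matrix and the names of the genes that were kept.
    """
    variances = expr.var(axis=0)  # population variance, one value per gene
    keep = variances > cutoff
    kept_names = [g for g, k in zip(gene_names, keep) if k]
    return expr[:, keep], kept_names


# Toy matrix: 3 samples x 3 genes (made-up values).
expr = np.array([
    [1.0, 5.0, 0.0],
    [1.0, 7.0, 0.0],
    [1.0, 9.0, 1.0],
])
genes = ["GENE_A", "GENE_B", "GENE_C"]

loose, loose_names = filter_genes_by_variance(expr, genes, cutoff=0.1)
strict, strict_names = filter_genes_by_variance(expr, genes, cutoff=0.5)
print(loose_names, strict_names)
```

A stricter cutoff leaves fewer genes for the model to train on, so the same data can give noticeably different deconvolution results depending on this choice.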

In this repository, I keep the original notebook I used to test TAPE's performance on real bulk data (https://github.com/poseidonchan/TAPE/blob/main/Experiments/TAPE_realbulk.ipynb). You can check it. For your information, I also tested the performance a few minutes ago in the attached file. (Sorry for the inconvenience; I will also email you the original notebook file.) You can run the attached code to test it. TAPE_sdy67_response.pdf

In my test, the overall CCC is around 0.7 and the L1 error is around 0.065.

Although the manuscript is still under review, I must apologize for the immature public version and the inconvenience. I will fix it as soon as possible.

Good Evening, Yanshuo

robinredX commented 2 years ago

Thank you for the quick response and for sending the notebook. robin.