snehamitra / SCARlink


request for sample data objects #6

Open sid5427 opened 3 months ago

sid5427 commented 3 months ago

Hi Authors,

Could you please provide the PBMC sample scRNA-seq RDS object and scATAC-seq ArchR object used in the preprocessing_scRNA_scATAC.ipynb notebook?

Having these would let us run the tool to check that it is installed properly, and would give us a baseline example before we run it on our own data.

Thanks!

snehamitra commented 3 months ago

Thanks for the suggestion! We uploaded example Seurat (scRNA-seq) and ArchR (scATAC-seq) objects that can be accessed using the preprocessing_scRNA_scATAC.ipynb notebook.

The same objects can also be used now to run scarlink_processing inside tutorial.ipynb.

sid5427 commented 3 months ago

Awesome, thanks! I also have a couple of questions about using the tool:

  1. Is there any way to limit the search space to peaks only, instead of the adjacent 500 bp tiles?
  2. Can we use a GPU to run the tool? I am running it on our cluster, and the job log shows TensorFlow libraries being loaded and suggests using the GPU queue.

snehamitra commented 3 months ago

  1. We wanted to avoid a pre-defined peak set and rely on the multiome data to identify the regulatory regions by linking tiles to gene expression. The peak set can also be sensitive to clustering: re-clustering the data at a different resolution might yield a different peak set.
  2. The current version is not optimized to run on a GPU. We will make the modifications very soon!

sid5427 commented 3 months ago

Thanks, that makes sense! I would still suggest adding an option to use peak sets if possible, though. Many new cell atlases are being published and developed, and people use them to annotate their cells and skip the clustering step.

I do see that at a later step the tool reports which tiles are correlated with the genes. I assume it returns output with BED-file-like coordinates?

snehamitra commented 3 months ago

That makes sense! We will incorporate the use of peak sets into the model.

Regarding your question about gene-linked tiles, the model reports scores for each gene-tile pair when you run scarlink_tiles.
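
As a rough illustration of the BED-like output asked about above, gene-tile scores that carry genomic coordinates could be written out as tab-separated lines along these lines. This is only a sketch: the actual scarlink_tiles output format is not shown in this thread, so the row layout (chrom, start, end, gene, score), the example values, and the helper name `to_bed_lines` are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical row layout: (chrom, start, end, gene, score).
# This is an assumption for illustration, not the SCARlink output schema.
TileRow = Tuple[str, int, int, str, float]


def to_bed_lines(rows: Iterable[TileRow]) -> List[str]:
    """Format rows as tab-separated BED-like lines: chrom, start, end, name, score."""
    lines = []
    for chrom, start, end, gene, score in rows:
        # Reject empty or negative intervals before writing them out.
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval {chrom}:{start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{gene}\t{score:.3f}")
    return lines


# Made-up example rows with 500 bp tiles; not real results.
example_rows = [
    ("chr1", 1000, 1500, "GENE_A", 0.8123),
    ("chr1", 1500, 2000, "GENE_A", 0.1),
]

bed_lines = to_bed_lines(example_rows)
print("\n".join(bed_lines))
```

Note that strict BED expects an integer score in the fifth column, so a file meant for a genome browser may need the scores rescaled or moved to an extra column.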