jianhao2016 / SimiC

this is the github repo for simicLASSO

Preparing DGE matrix for input to SimiC #8

Open bpyenson opened 5 months ago

bpyenson commented 5 months ago

Hi,

I am having a hard time preparing my Seurat object for input to SimiC's pipeline. Is there any code you can share about how you created the .pickle file for the DGE matrix that has cell barcodes as rows and the genes (TF and targets) as the columns?

Thanks,

GuiSeSanz commented 5 months ago

Hello! The matrix needs to contain the imputed, normalized counts, preferably imputed with MAGIC. The number of rows needs to be the number of targets you want to use (for example, 1000) plus the number of TFs to use as heads of the regulons (for example, 100). So we have a matrix of size 1100 x Ncells.
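
As a rough illustration of how such a matrix could be assembled, here is a minimal Python sketch. It assumes that the `top_MAD_tfs` and `top_MAD_targets` variables in the R snippets of this thread refer to genes ranked by median absolute deviation (MAD); that reading, the helper `top_mad_genes`, and the toy gene and cell names are all assumptions made for illustration, not part of SimiC.

```python
import pandas as pd


def top_mad_genes(expr, candidates, n_top):
    """Return the n_top candidate genes with the largest MAD.

    expr is a cells x genes DataFrame of imputed, normalized values.
    Hypothetical helper, not part of SimiC.
    """
    present = [g for g in candidates if g in expr.columns]
    sub = expr[present]
    # MAD per gene: median of |x - median(x)| down each column
    mad = (sub - sub.median(axis=0)).abs().median(axis=0)
    return mad.sort_values(ascending=False).head(n_top).index.tolist()


# Toy data: 6 cells, 2 candidate TFs and 3 candidate targets
expr = pd.DataFrame(
    {
        "TF_A": [1.0, 1.1, 0.9, 1.0, 1.05, 0.95],
        "TF_B": [0.0, 5.0, 1.0, 4.0, 2.0, 3.0],
        "gene1": [2.0, 2.0, 2.1, 1.9, 2.0, 2.0],
        "gene2": [0.0, 3.0, 6.0, 9.0, 1.0, 8.0],
        "gene3": [1.0, 2.0, 3.0, 1.5, 2.5, 2.0],
    },
    index=[f"cell{i}" for i in range(6)],
)
all_tfs = ["TF_A", "TF_B"]
all_targets = ["gene1", "gene2", "gene3"]

top_tfs = top_mad_genes(expr, all_tfs, n_top=1)
top_targets = top_mad_genes(expr, all_targets, n_top=2)

# TFs first, then targets, matching the column order used in the R snippets
simic_input = expr[top_tfs + top_targets]
print(simic_input.shape)  # (6, 3): 6 cells, 1 TF + 2 targets
```

With real data, `n_top` would be the numbers quoted above (for example 100 TFs and 1000 targets) and `expr` would be the MAGIC-imputed matrix.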

As we use reticulate, we need to make sure to use the same Python version to create the pickles; otherwise it may raise compatibility errors. So I share two options:

  1. Create the pickles directly from R:
    
    library(reticulate)
    reticulate::use_python("/usr/bin/python3")
    py_discover_config("magic")
    library(Rmagic)

    data_MAGIC_df <- data_MAGIC_df[,c(top_MAD_tfs,top_MAD_targets)] # this is the matrix of 1100xNcells
    reticulate::py_save_object(as.data.frame(data_MAGIC_df), filename = paste0('./Data/SimiC/Organoids_RUN1',MAX_NUM_TARGETS, "_DF.pickle"))
    reticulate::py_save_object(TFs, filename = paste0('./Data/SimiC/Organoids_RUN1',MAX_NUM_TARGETS, "_TF.pickle")) # this is the list of the TFs

  2. Save the matrices as CSV, and create the pickles in Python:

R

    library(reticulate)
    reticulate::use_python("/usr/bin/python3")
    py_discover_config("magic")
    library(Rmagic)

    data_MAGIC_df <- data_MAGIC_df[,c(top_MAD_tfs,top_MAD_targets)] # this is the matrix of 1100xNcells
    write.table(data_MAGIC_df, file= paste0('./Data/SimiC/Organoids_RUN1',MAX_NUM_TARGETS, "_DF.csv"), sep='\t', row.names= TRUE, col.names=TRUE, quote=FALSE)
    write.table(TFs, file= paste0('./Data/SimiC/Organoids_RUN1',MAX_NUM_TARGETS, "_TF.csv"), sep='\t', row.names= FALSE, col.names=FALSE, quote=FALSE)

Python

    import os
    import pickle

    import pandas as pd

    print('Creating pickles!')
    analysis_dir = '/root/SimiC/Organoids'

    sample = 'Organoids_RUN11000'
    DF_p = os.path.join(analysis_dir, sample + '_DF.csv')
    TF_p = os.path.join(analysis_dir, sample + '_TF.csv')

    # Expression matrix: tab-separated file written by write.table in R
    if os.path.exists(DF_p):
        DF = pd.read_csv(DF_p, header=0, delimiter="\t")
        DF_pickle = os.path.join(analysis_dir, sample + '.DF.pickle')
        with open(DF_pickle, 'wb') as mypickle:
            pickle.dump(DF, mypickle)
    else:
        print("The file " + DF_p + " does not exist")

    # TF list: one gene name per line, no header
    if os.path.exists(TF_p):
        TF = pd.read_csv(TF_p, header=None)
        TF = list(TF.iloc[:, 0])
        TF_pickle = os.path.join(analysis_dir, sample + '.TF.pickle')
        with open(TF_pickle, 'wb') as mypickle:
            pickle.dump(TF, mypickle)
    else:
        print("The file " + TF_p + " does not exist")

    print('Done pickles!')
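
Before handing the pickles to SimiC, it may be worth reloading them and checking that they look as expected. The sketch below writes toy stand-ins to a temporary directory, reloads them, and checks that every TF appears among the matrix columns. The toy names, the temporary paths, and the choice to pin `protocol=4` (readable by any Python 3.4 or later) are assumptions for illustration, not documented SimiC requirements; pinning the protocol also does not guard against differences between pandas versions.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy stand-ins for the real inputs (hypothetical gene and cell names)
DF = pd.DataFrame(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    index=["cell1", "cell2"],
    columns=["TF_A", "gene1", "gene2"],
)
TF = ["TF_A"]

out_dir = tempfile.mkdtemp()
DF_pickle = os.path.join(out_dir, "toy.DF.pickle")
TF_pickle = os.path.join(out_dir, "toy.TF.pickle")

# Pin an explicit pickle protocol so older Python 3 interpreters can read it
with open(DF_pickle, "wb") as fh:
    pickle.dump(DF, fh, protocol=4)
with open(TF_pickle, "wb") as fh:
    pickle.dump(TF, fh, protocol=4)

# Reload both files and run basic sanity checks
with open(DF_pickle, "rb") as fh:
    DF_loaded = pickle.load(fh)
with open(TF_pickle, "rb") as fh:
    TF_loaded = pickle.load(fh)

missing = [tf for tf in TF_loaded if tf not in DF_loaded.columns]
assert isinstance(DF_loaded, pd.DataFrame)
assert isinstance(TF_loaded, list)
assert not missing, f"TFs missing from the matrix columns: {missing}"
print(DF_loaded.shape, len(TF_loaded))  # (2, 3) 1
```

For real data, replace the toy objects and paths with the files written above and keep only the reload-and-check half.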



I hope that helps!