brianherb closed this issue 3 years ago.
I think this is more a question for the scanpy people, since you're using the to_loompy() method. You can use loompy directly to set any attribute, e.g. something like:
import loompy
import numpy as np

with loompy.connect(filename) as ds:
    old_ids = ds.ca.CellID[:]
    # Take the first 20 characters of each filename as the cell ID
    new_ids = np.array([s[:20] for s in old_ids], dtype=object)
    ds.ca.CellID = new_ids
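One caveat with the prefix trick is that truncating file names can map two different files to the same ID. Below is a minimal, stdlib-only sketch that truncates and refuses to continue if the result is not unique. The helper name `truncate_ids` and its default width of 20 are illustrative assumptions, not part of loompy.

```python
from collections import Counter


def truncate_ids(old_ids, width=20):
    """Return the first `width` characters of each ID, refusing duplicates.

    Hypothetical helper: truncation can map two different file names to the
    same prefix, which would leave the loom file with ambiguous CellIDs.
    """
    new_ids = [str(s)[:width] for s in old_ids]
    # Collect every prefix that occurs more than once.
    dupes = sorted(name for name, n in Counter(new_ids).items() if n > 1)
    if dupes:
        raise ValueError(f"truncating to {width} characters gives duplicate IDs: {dupes}")
    return new_ids
```

Inside the `with` block, `new_ids = np.array(truncate_ids(old_ids), dtype=object)` could then replace the list comprehension.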
Thank you! That worked quite well. Here is my modified code, which uses the lookUp table instead (I like the trick with the first 20 characters, but unfortunately it doesn't work for all of my file names):
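import loompy

# lookUp: pandas DataFrame with 'file' (file name) and 'cell_id' columns
for lf in loomFiles:
    ds = loompy.connect(lf)
    old_ids = ds.ca.CellID[:]
    b = lookUp['file'].tolist()
    # position of each file name in the lookup table (None if absent)
    ind = [b.index(x) if x in b else None for x in old_ids]
    new_ids = lookUp['cell_id'][ind].tolist()
    ds.ca.cell_names = new_ids
    ds.ca.CellID = new_ids
    ds.close()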
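A note on the lookup step: `b.index(x)` rescans the whole list for every cell, and a file name missing from the table puts None into `ind`, which is likely to make the following `lookUp['cell_id'][ind]` step fail. Here is a minimal sketch of a dict-based alternative. It assumes the table is passed as two parallel lists such as `lookUp['file'].tolist()` and `lookUp['cell_id'].tolist()`. The helper `map_ids` and its `strict` flag are hypothetical names.

```python
def map_ids(old_ids, files, cell_ids, strict=True):
    """Translate file-name IDs into cell IDs via parallel lookup lists.

    Hypothetical helper. The lookup dict is built once, so each ID is
    resolved with a hash lookup instead of scanning a list with list.index.
    With strict=True, unmatched file names raise KeyError; with
    strict=False they are passed through unchanged.
    """
    if len(set(files)) != len(files):
        raise ValueError("lookup table contains duplicate file names")
    mapping = dict(zip(files, cell_ids))
    missing = [x for x in old_ids if x not in mapping]
    if missing and strict:
        raise KeyError(f"{len(missing)} file names not in lookup table, e.g. {missing[:3]}")
    return [mapping.get(x, x) for x in old_ids]
```

In the loop this would read `new_ids = map_ids(ds.ca.CellID[:], lookUp['file'].tolist(), lookUp['cell_id'].tolist())`. Failing loudly by default seemed safer than silently writing file names back as cell IDs.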
Hi all,
I'm systematically reprocessing a large amount of Smart-seq data. The pipeline I'm using processes multiple individual FASTQ files (one per cell) and produces a single loom file containing the counts for all cells. Unfortunately, because of the way I have to run this pipeline, the file names are used as cell IDs instead of the shorter cell IDs I actually want. I would like to swap each file name for a cell ID of my choosing. Is there a straightforward way to do this? I attempted a solution with the loompy module in Python, but I'm worried that the new loom file no longer contains any cell IDs.
For example, this is what I see in the original .obs slot:
and this is what I want to change it to:
but when I try to rebuild the loom file, I lose the cell IDs in the index:
Here is what I tried so far:
import copy
import re

import loompy
import scanpy

data = scanpy.read_loom(filename=lf)
sampleName = re.sub(r'\.loom$', '', lf)
Obs = data.obs
Gene = data.var
Mat = data.layers['intron_counts'].toarray().transpose()  ## only one layer in this dataset

Obs2 = copy.deepcopy(Obs)
a = Obs['input_id'].tolist()  # these are file names
b = lookUp['file'].tolist()  # lookUp contains the file name to cell ID mapping
ind = [b.index(x) if x in b else None for x in a]  ## index file names

newCell = lookUp['cell_id'][ind].tolist()
oldCell = Obs['input_id'].tolist()
cellDict = {oldCell[i]: newCell[i] for i in range(len(oldCell))}  ## swap in new cell IDs

Obs2.index = newCell
Obs2.index.name = 'CellID'
Obs2["cell_names"] = Obs2["cell_names"].replace(cellDict)

# Convert to a dictionary for loompy.create. Am I losing the cell IDs here? How should I construct this dict?
L1 = Obs2.columns.tolist()
L2 = Obs2.transpose().to_numpy()
Obsd = {k: v for k, v in zip(L1, L2)}

# Get the gene info back into the loom object. Am I also dropping the gene names here?
L1 = Gene.columns.tolist()
L2 = Gene.transpose().to_numpy()
Gened = {k: v for k, v in zip(L1, L2)}

loompy.create('test.loom', {'': Mat, 'intron_counts': Mat}, Gened, Obsd)
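On the two questions in the code comments: by default scanpy.read_loom uses the CellID column attribute as obs_names and Gene as var_names. Those values therefore live in the DataFrame index rather than in `.columns`, and a dict built from `.columns` alone leaves them out. The following minimal sketch copies the index back in under an explicit key. It assumes pandas and NumPy, and `frame_to_attrs` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def frame_to_attrs(df: pd.DataFrame, index_key: str) -> dict[str, np.ndarray]:
    """Turn a DataFrame into a {name: array} dict for loompy.create.

    Hypothetical helper. Each column becomes one attribute, and the index
    (where scanpy keeps CellID / Gene after read_loom) is stored under
    `index_key` so it survives the rebuild.
    """
    if index_key in df.columns:
        raise ValueError(f"column {index_key!r} would clash with the index")
    attrs = {index_key: df.index.to_numpy()}
    for col in df.columns:
        # Per-column arrays keep each column's own dtype, unlike
        # df.transpose().to_numpy(), which upcasts mixed columns to object.
        attrs[str(col)] = df[col].to_numpy()
    return attrs
```

The end of the script could then use `col_attrs = frame_to_attrs(Obs2, 'CellID')`, `row_attrs = frame_to_attrs(Gene, 'Gene')` and `loompy.create('test.loom', {'': Mat, 'intron_counts': Mat}, row_attrs, col_attrs)`. Reading the result back with scanpy.read_loom should restore both indices, but that is worth checking on one file before running the full batch.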