Open HenriettaHolze opened 2 years ago
I am having the same issue, with the same dataset.
Hello,
I am having trouble subsetting a publicly available loom dataset (https://console.cloud.google.com/storage/browser/linnarsson-lab-human;tab=objects?authuser=0&pli=1&prefix=&forceOnObjectsSortingFiltering=false) when trying to extract a single cluster:
library(dplyr)
library(hdf5r)
library(loomR)
loom <- connect(filename = "~/Downloads/adult_human_20221007.loom", mode = "r+", skip.validate = TRUE)
attr.df <- loom$get.attribute.df(MARGIN = 2, col.names = "CellID", row.names = "Gene")
subset(loom, m = attr.df$Clusters == "298", filename = "CBL.loom", chunk.size = 1000, verbose = TRUE, overwrite = TRUE)
I get the error below:
Writing new loom file to CBL.loom
Error in H5File.open(filename, mode, file_create_pl, file_access_pl) :
HDF5-API Errors:
error #000: ../../src/hdf5-1.12.1/src/H5F.c in H5Fcreate(): line 532: unable to create file
class: HDF5
major: File accessibility
minor: Unable to open file
error #001: ../../src/hdf5-1.12.1/src/H5VLcallback.c in H5VL_file_create(): line 3282: file create failed
class: HDF5
major: Virtual Object Layer
minor: Unable to create file
error #002: ../../src/hdf5-1.12.1/src/H5VLcallback.c in H5VL__file_create(): line 3248: file create failed
class: HDF5
major: Virtual Object Layer
minor: Unable to create file
error #003: ../../src/hdf5-1.12.1/src/H5VLnative_file.c in H5VL__native_file_create(): line 63: unable to create file
class: HDF5
major: File accessibility
minor: Unable to open file
error #004: ../../src/hdf5-1.12.1/src/H5Fint.c in H5F_open(): line 1858: unable to truncate a file which is already open
class: HDF5
major: File ac
I am able to see things in the loom dataset, the output of
attr.df %>% colnames
is
[1] "Age" "CellCycle" "CellID" "Chemistry" "Clusters"
[6] "Donor" "DoubletFinderFlag" "DoubletFinderScore" "MT_ratio" "NGenes"
[11] "ROIGroupCoarse" "ROIGroupFine" "Roi" "SampleID" "Subclusters"
[16] "Tissue" "TotalUMI" "unspliced_ratio"
which I know to be correct from viewing the file with HDFView.
Any help is appreciated.
@zamlerd I gave up on subsetting the loom object in R and switched to loompy, which is also used by the authors of the data.
The loompy tutorial describes subsetting with loompy.new() and scan(), but that threw an error for me.
A simple downsampling worked for me this way:
import loompy
import numpy as np
input_file = "adult_human_20221007.loom"
out_file = "adult_human_20221007_downsampled.loom"
with loompy.connect(input_file) as ds:
    # getting 50k random indices
    ind_oi = np.random.choice(list(range(ds.shape[1])), 50000, replace=False)
    ind_oi.sort()
    # initiate the output file with 2 cells
    ds_subset = ds[:, ind_oi[:2]]
    loompy.create(filename=out_file, layers=ds[:, ind_oi[:2]], file_attrs=ds.attrs, col_attrs=ds.ca[ind_oi[:2]], row_attrs=ds.ra)
    ind_oi = ind_oi[2:]
    # connect to the output file
    with loompy.connect(out_file) as dsout:
        # subset the input file in batches and write the subset of cells to the output file
        for (ix, selection, view) in ds.scan(items=ind_oi, axis=1, batch_size=50000):
            dsout.add_columns(view.layers, col_attrs=view.ca, row_attrs=view.ra)
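One detail of the random draw above: np.random.choice with replace=False raises a ValueError when asked for more items than exist, so a hard-coded 50000 fails on a file with fewer cells. A minimal sketch of a guarded draw follows; the helper name sample_sorted_indices and the seed argument are illustrative assumptions, not part of loompy.

```python
import numpy as np

def sample_sorted_indices(n_total, n_sample, seed=None):
    """Return up to n_sample distinct indices in [0, n_total), sorted ascending.

    Hypothetical helper for illustration; it is not a loompy function.
    """
    rng = np.random.default_rng(seed)
    # Cap the sample size so a small dataset does not raise a ValueError.
    k = min(n_sample, n_total)
    idx = rng.choice(n_total, size=k, replace=False)
    idx.sort()
    return idx
```

With an open connection this would be called as sample_sorted_indices(ds.shape[1], 50000); passing a fixed seed makes the draw reproducible.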
I guess selecting your cluster would work like this:
ind_oi = np.where(ds.ca["Clusters"] == "298")[0]
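Whether the comparison with "298" matches anything depends on how the Clusters attribute is stored, which is not confirmed in this thread. If it holds integers, comparing against a string will not select any cells. A small sketch that sidesteps the question by comparing string representations; the helper name indices_for_label is hypothetical, and the arrays below are synthetic stand-ins for ds.ca["Clusters"].

```python
import numpy as np

def indices_for_label(values, label):
    """Return positions where values equal label, comparing as strings.

    Hypothetical helper; works whether values are stored as ints or strings.
    """
    values = np.asarray(values)
    return np.where(values.astype(str) == str(label))[0]

# Synthetic stand-in for a cluster attribute stored as integers.
clusters = np.array([297, 298, 5, 298])
print(indices_for_label(clusters, "298"))  # positions of cluster 298
```

Checking ds.ca["Clusters"].dtype first shows which case applies to the actual file.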
@HenriettaHolze Thank you so much!
I was attempting the same and have been banging my head against a wall.
For some reason, when trying your code I get the error below:
ind_oi = np.where(ds.ca["Clusters"] == "298")[0]
<loompy.attribute_manager.AttributeManager object at 0x177d2c4c0>
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [24], in <cell line: 3>()
9 ds_subset = ds[:, ind_oi[:2]]
10 print(ds.ca)
---> 11 loompy.create(filename=out_file, layers=ds[:, ind_oi[:2]], file_attrs=ds.attrs, col_attrs=ds.ca[ind_oi[:2]], row_attrs=ds.ra)
13 ind_oi = ind_oi[2:]
15 # connect to the output file
File ~/opt/anaconda3/lib/python3.9/site-packages/loompy/attribute_manager.py:83, in AttributeManager.__getitem__(self, thing)
81 am = AttributeManager(None, axis=self.axis)
82 for key, val in self.items():
---> 83 am[key] = val[thing]
84 return am
85 elif type(thing) is tuple:
86 # A tuple of strings giving alternative names for attributes
TypeError: 'NoneType' object is not subscriptable
Any further hints?
I also tried it with just the random sampling as you did and got the same error.
I'm not entirely sure what happened there. I had issues with the encoding and had to run these lines in the terminal before starting Python or running the script:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
If that does not solve the error, maybe take it to the loompy repo: https://github.com/linnarsson-lab/loompy/issues
Thank you so much @HenriettaHolze. I am relying on you for other people's packages, haha, so sorry for that.
I tried running those lines and rebooting and still have the same issue.
I will take it over to them.
Thanks again!
Hi,
I would like to work with the recently published adult human brain atlas, which contains 3M cells: https://www.biorxiv.org/content/10.1101/2022.10.12.511898v1.full
When I connect to the loom file and try to convert it to a Seurat object using SeuratDisk's
as.Seurat()
function, I run into memory issues (1.5T required). Do you have tips on how to work with such huge datasets in Seurat?
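For scale, a figure of that size is roughly what a dense matrix over all cells would need. The back-of-the-envelope sketch below assumes about 60,000 genes and 8-byte values; both numbers are assumptions, and so is the idea that the conversion allocates a dense matrix at all.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory needed to hold a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * bytes_per_value

n_cells = 3_000_000  # from the post: about 3M cells
n_genes = 60_000     # assumption: the gene count is not stated in the post

estimate_tb = dense_matrix_bytes(n_genes, n_cells) / 1e12
print(f"{estimate_tb:.2f} TB")  # prints "1.44 TB"
```

The same function gives about 24 GB for a 50,000-cell subset under the same assumptions, which is why downsampling before conversion is attractive.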
I tried to use the subset function from loomR to get a random subset of cells and save it as a new loom file, so that I could then convert that one to Seurat.
I got the following error:
The loom object looks as follows:
I use SeuratDisk v0.0.0.9019 and loomR v0.2.1.9000.