linnarsson-lab / loompy

Python implementation of the Loom file format - http://loompy.org
BSD 2-Clause "Simplified" License

Could `loompy` use xarray/netCDF under the hood? #3

Closed olgabot closed 6 years ago

olgabot commented 6 years ago

NetCDF is a flavor of HDF5 that is used in the geological/astro communities and is implemented in Python with the xarray library (http://xarray.pydata.org/en/stable/) and in R with the RNetCDF package (https://cran.r-project.org/web/packages/RNetCDF/index.html), so you wouldn't lose the R people.

I've used it for single cell datasets (https://github.com/singlecell-batches/data/blob/master/01_notebooks/004_make_10percent_subset_fibroblasts.ipynb), and while it has a bit of a learning curve, it's still useful and very fast for large datasets.

I ask because it looks like, as of now, loom doesn't support selecting by gene id or cell id, while xarray has already implemented label-based indexing (http://xarray.pydata.org/en/stable/indexing.html). This way you wouldn't have to implement it!

slinnarsson commented 6 years ago

You can definitely select by gene (or any attribute). Use numpy fancy indexing with a bool array:

ds = loompy.connect(...)
x = ds[ds.Gene == "Actb"]

Assuming you have a column attribute "Gene".
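
For readers unfamiliar with boolean-mask ("fancy") indexing, here is a minimal sketch of the idea using plain NumPy arrays. It does not call loompy at all: `matrix` and `gene` are hypothetical stand-ins for a loom file's main matrix and a "Gene" attribute, and the choice of which axis the labels run along is an assumption made only for this illustration.

```python
import numpy as np

# Hypothetical stand-ins, NOT the loompy API:
# `matrix` plays the role of the main data matrix and `gene` plays the
# role of a "Gene" attribute holding one label per row of `matrix`.
matrix = np.arange(12).reshape(4, 3)
gene = np.array(["Gapdh", "Actb", "Sox2", "Actb"])

# Comparing the label array to a string yields a boolean array (the mask).
mask = gene == "Actb"

# Indexing with the mask keeps only the rows where the mask is True.
selected = matrix[mask]
```

The same pattern works for any attribute that has one value per row: build the mask from a comparison, then index with it.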

Sten


olgabot commented 6 years ago

Ah, okay, great. I didn't see this in the documentation. Thank you!

