tatami-inc / archive-beachmat

An archived version of the beachmat repository, see https://github.com/LTLA/beachmat for the active version.

BOF application #1

Closed LTLA closed 7 years ago

LTLA commented 7 years ago

Assuming Davide is the BOF session leader, I'll fill everyone else in, in the order of my memory:

Additional collaborators -- First Name, Last Name (Affiliation)
Aaron Lun (CRUK Cambridge Institute)
Davis McCarthy (EMBL-EBI)
Peter Hickey (Johns Hopkins University)
Stephanie Hicks (Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health)
Andrew McDavid (University of Rochester Medical Center)

Topic
Infrastructure for efficient storage and processing of large-scale single-cell genomics data

Abstract (250 word maximum)
Emerging high-throughput technologies in single-cell transcriptomics have allowed expression profiles to be rapidly generated for each of thousands of cells in a sample. This provides unparalleled resolution to investigate cellular heterogeneity within complex populations, for studying biological processes such as cell fate choice, immune activation and tumour diversity. Rigorous analysis of these data requires the application of appropriate statistical methodologies, many of which are available in packages from the Bioconductor project. However, the computational analyses are often complicated by a number of factors. The first is the suboptimal interoperability between packages that are currently available for single-cell RNA-seq data analysis, as each package defines its own S4 classes to be used for further processing. Another problem is the size of the data sets involved - even a simple experiment contains expression values for each of thousands of genes in each of thousands of cells. Finally, there is little support for multi-omics analyses of single-cell data, relevant to situations where multiple types of data (e.g., transcriptomics, genomics and methylation) are available for each cell. This birds-of-a-feather session will address these issues by proposing a common S4 class for storing single-cell transcriptomics data, which extends existing Bioconductor classes with slots specific to single-cell studies; developing a C++ API for efficient handling of large single-cell data sets, using sparse and disk-backed matrices; and investigating avenues through which multi-omics data can be handled for integrative analyses.

Up to five relevant R / Bioconductor packages
scater, scran, MAST, scone, monocle

Target audience
Developers and users of single-cell genomics packages.

drisso commented 7 years ago

I've made a couple of minor edits to the abstract.

This is the version to be submitted.

Emerging high-throughput technologies in single-cell transcriptomics have allowed expression profiles to be rapidly generated for each of thousands of cells in a sample. This provides unprecedented resolution to investigate cellular heterogeneity within complex populations, for studying biological processes such as cell fate choice, immune activation and tumour diversity. Rigorous analysis of these data requires the application of appropriate statistical methodologies, many of which are available in packages from the Bioconductor project. However, the computational analyses are often complicated by a number of factors. The first is the suboptimal interoperability between packages that are currently available for single-cell RNA-seq data analysis, as each package defines its own S4 classes to be used for further processing. Another problem is the size of the data sets involved - even a simple experiment contains expression values for thousands of genes in thousands of cells. Finally, there is little support for multi-omics analyses of single-cell data, relevant to situations where multiple types of data (e.g., transcriptomics, genomics and methylation) are available for each cell. This birds-of-a-feather session will address these issues by (i) proposing a common S4 class for storing single-cell transcriptomics data, which extends existing Bioconductor classes with slots specific to single-cell studies; (ii) developing a C++ API for efficient handling of large single-cell data sets, using sparse and disk-backed matrices; and (iii) investigating avenues through which multi-omics data can be handled for integrative analyses.

LTLA commented 7 years ago

Looks good to me.

davismcc commented 7 years ago

Looks great. Thanks, Aaron, for getting that going, Davide for edits and submission. Fingers crossed.

amcdavid commented 7 years ago

Thanks Aaron and Davide for taking the lead.

stephaniehicks commented 7 years ago

Looks great! Thanks Aaron & Davide!

LTLA commented 7 years ago

Great, closing this.