jmbreda / Sanity

Filtering of Poisson noise on a single-cell RNA-seq UMI count matrix
GNU General Public License v3.0

Sanity

Sampling Noise based Inference of Transcription ActivitY: Filtering of Poisson noise on a single-cell RNA-seq UMI count matrix

Single-cell RNA sequencing normalization algorithm presented in the publication "Bayesian inference of gene expression states from single-cell RNA-seq data", J Breda, M Zavolan, E van Nimwegen, Nature Biotechnology, 2021.

Sanity infers the log expression level x_gc of gene g in cell c by filtering out the Poisson noise on the UMI counts n_gc of gene g in cell c.
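To see why Poisson noise matters most for lowly expressed genes, recall that a Poisson-distributed count with mean m has variance m, so its relative sampling noise is 1/sqrt(m). The sketch below only illustrates that property. It is not part of Sanity and does not reproduce its inference. The function name `poisson_relative_noise` and the example counts are hypothetical.

```python
import math


def poisson_relative_noise(mean_count: float) -> float:
    """Coefficient of variation (std / mean) of a Poisson count.

    For a Poisson distribution the variance equals the mean, so
    std / mean = sqrt(mean) / mean = 1 / sqrt(mean).
    """
    if mean_count <= 0:
        raise ValueError("mean_count must be positive")
    return 1.0 / math.sqrt(mean_count)


# Hypothetical expected UMI counts for lowly and highly expressed genes.
for mean_count in (1, 4, 100):
    print(mean_count, poisson_relative_noise(mean_count))
```

At an expected count of 1 the sampling noise is as large as the signal itself, while at an expected count of 100 it falls to 10%.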

Reproducibility

The raw UMI count and normalized datasets mentioned in the benchmarking of the associated publication are available at the project DOI. Files are named [dataset name]_UMI_counts.txt.gz and [dataset name]_[tool name]_normalization.txt.gz.

The scripts used to run the benchmarked normalization methods and to make the figures of the preprint are in the reproducibility folder.

Input

GeneID    Cell 1    Cell 2    Cell 3    ...
Gene 1    1.0       2.0       0.0
Gene 2    6.0       3.0       1.0
...
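A count table with this layout can be produced from in-memory data with a few lines of Python. This is a minimal sketch, and it assumes a tab-separated file, which the layout above does not confirm. The helper name `format_count_table` is hypothetical and not part of Sanity.

```python
def format_count_table(gene_ids, cell_ids, counts, sep="\t"):
    """Return a count table as text: a header row, then one row per gene.

    counts[g][c] is the UMI count of gene g in cell c.
    The tab separator is an assumption; adjust it to the format you use.
    """
    if len(counts) != len(gene_ids):
        raise ValueError("need one row of counts per gene")
    lines = [sep.join(["GeneID", *cell_ids])]
    for gene_id, row in zip(gene_ids, counts):
        if len(row) != len(cell_ids):
            raise ValueError(f"row for {gene_id} does not match the number of cells")
        lines.append(sep.join([gene_id, *(str(float(n)) for n in row)]))
    return "\n".join(lines) + "\n"


# Same toy values as in the layout above.
table = format_count_table(
    ["Gene 1", "Gene 2"],
    ["Cell 1", "Cell 2", "Cell 3"],
    [[1, 2, 0], [6, 3, 1]],
)
print(table, end="")
```

Writing `table` to a text file gives something that can be passed to the `-f` option.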

Extended output (optional)

Usage

  ./Sanity <option(s)> SOURCES
  Options:
    -h,--help       Show this help message
    -v,--version        Show the current version
    -f,--file       Specify the input transcript count text file (.mtx for Matrix Market File Format)
    -mtx_genes,--mtx_gene_name_file Specify the gene name text file (only needed if .mtx input file)
    -mtx_cells,--mtx_cell_name_file Specify the cell name text file (only needed if .mtx input file)
    -d,--destination    Specify the destination path (default: pwd)
    -n,--n_threads      Specify the number of threads to be used (default: 4)
    -e,--extended_output    Option to print extended output (default: false, choice: false,0,true,1)
    -vmin,--variance_min    Minimal value of variance in log transcription quotient (default: 0.001)
    -vmax,--variance_max    Maximal value of variance in log transcription quotient (default: 50)
    -nbin,--number_of_bins  Number of bins for the variance in log transcription quotient  (default: 160)
    -no_norm,--no_cell_size_normalization   Option to skip cell size normalization (default: false, choice: false,0,true,1)
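When Sanity is called from a pipeline, the options above can be assembled programmatically. The sketch below builds an argument list using only flags from the help text. It assumes each flag is followed by its value as a separate argument. The helper name `build_sanity_command` and the file names are placeholders.

```python
def build_sanity_command(count_file, destination=None, n_threads=None,
                         extended_output=False, executable="./Sanity"):
    """Assemble an argument list for Sanity from the options listed above.

    Only flags from the help text are used. Options left at None are
    omitted, so Sanity falls back to its own defaults.
    """
    cmd = [executable, "-f", str(count_file)]
    if destination is not None:
        cmd += ["-d", str(destination)]
    if n_threads is not None:
        cmd += ["-n", str(int(n_threads))]
    if extended_output:
        cmd += ["-e", "1"]
    return cmd


# Placeholder input file and output folder.
cmd = build_sanity_command("counts.txt", destination="sanity_out",
                           n_threads=8, extended_output=True)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces. Setting `extended_output=True` adds `-e 1`, which Sanity_distance requires.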

Installation

Sanity_distance

Compute cell-cell distances from Sanity output files. Needs extended outputs of Sanity (-e 1 option).

Input