guigolab / sqtlseeker2-nf

An automated sQTL mapping pipeline using Nextflow
GNU General Public License v3.0

sqtlseeker2-nf

A pipeline for splicing quantitative trait loci (sQTL) mapping.

The pipeline performs the following analysis steps:

For details on each step, please read the sQTLseekeR2 documentation.

The pipeline uses Nextflow as the execution backend. Please check the Nextflow documentation for more information.

Requirements

Quickstart (~2 min)

  1. Install Nextflow:

    curl -fsSL get.nextflow.io | bash
  2. Make a test run:

    ./nextflow run guigolab/sqtlseeker2-nf -with-docker

    Note: set -with-singularity to use Singularity instead of Docker.

    Important: Since release 22.12.0-edge, Nextflow no longer supports DSL1. Until sqtlseeker2-nf is migrated to DSL2, run the pipeline with an older Nextflow release by setting the NXF_VER environment variable before the Nextflow command, e.g. NXF_VER=22.04.0 ./nextflow run guigolab/sqtlseeker2-nf -with-docker.
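
    If several commands are run in the same terminal, the release can also be pinned once for the whole shell session rather than on every command line. This is a minimal sketch that reuses the 22.04.0 release from the note above; any other release that still supports DSL1 should work the same way.

```shell
# Pin the Nextflow release for every command run in this shell session.
export NXF_VER=22.04.0

# Later launches pick up the pinned release automatically, e.g.:
#   ./nextflow run guigolab/sqtlseeker2-nf -with-docker
echo "Nextflow release pinned to ${NXF_VER}"
```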

Pipeline usage

Launching the pipeline with the --help parameter shows the help message:

nextflow run sqtlseeker2-nf --help
N E X T F L O W  ~  version 0.27.2
Launching `sqtlseeker2.nf` [admiring_lichterman] - revision: 28c86caf1c

sqtlseeker2-nf ~ A pipeline for splicing QTL mapping
----------------------------------------------------
Run sQTLseekeR2 on a set of data.

Usage: 
    sqtlseeker2-nf [options]

Options:
--genotype GENOTYPE_FILE    the genotype file
--trexp EXPRESSION_FILE     the transcript expression file
--metadata METADATA_FILE    the metadata file
--genes GENES_FILE          the gene location file
--dir DIRECTORY             the output directory
--mode MODE                 the run mode: nominal or permuted (default: nominal)
--win WINDOW                the cis window in bp (default: 5000)
--covariates COVARIATES     include covariates in the model (default: false)
--fdr FDR                   false discovery rate level (default: 0.05)
--min_md MIN_MD             minimum effect size reported (default: 0.05)
--svqtl SVQTLS              report svQTLs (default: false)

Additional parameters for mode = nominal:
--ld LD                     threshold for LD-based variant clustering (default: 0, no clustering)
--kn KN                     number of genes per batch in nominal pass (default: 10)

Additional parameters for mode = permuted:
--kp KP                     number of genes per batch in permuted pass (default: 10)
--max_perm MAX_PERM         maximum number of permutations (default: 1000)
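
As an illustration of how these options combine, the sketch below assembles a permuted-mode run and prints it as a dry run. The four input paths are hypothetical placeholders, not files shipped with the pipeline, and the option values are chosen only for the example; -with-docker is taken from the Quickstart.

```shell
# Hypothetical input locations: replace with the paths to your own files.
GENOTYPE=input/genotype_file
TREXP=input/expression_file
METADATA=input/metadata_file
GENES=input/genes_file

# Assemble the command from the options listed in the help message.
set -- nextflow run guigolab/sqtlseeker2-nf \
    --genotype "$GENOTYPE" \
    --trexp "$TREXP" \
    --metadata "$METADATA" \
    --genes "$GENES" \
    --dir result \
    --mode permuted \
    --fdr 0.05 \
    -with-docker

# Dry run: print the assembled command for inspection.
echo "$@"

# To launch the pipeline for real, uncomment the next line.
# "$@"
```

Keeping the command in the positional parameters preserves the quoting of each argument, so paths containing spaces survive when the command is finally executed.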

Input files and format

sqtlseeker2-nf takes the following input files:

Example data is available for the test run.

Pipeline results

sQTL mapping results are saved into the folder specified with the --dir parameter. By default, this is the result folder within the current working directory.

Output files are organized into subfolders corresponding to the different groups specified in the metadata file:

result
└── groups
    ├── group1                            
    │   ├── all-tests.nominal.tsv          
    │   ├── all-tests.permuted.tsv         
    │   ├── sqtls-${level}fdr.nominal.tsv      
    │   └── sqtls-${level}fdr.permuted.tsv     
    ├── group2
   ...

Note: if only a nominal pass was run, files *.permuted.tsv will not be present.
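
The layout above makes it easy to check which passes produced output for each group. The following is a minimal sketch, not part of the pipeline: summarise_groups is a hypothetical helper name, and it only tests for the all-tests.*.tsv files shown in the tree.

```shell
# Report, for each group folder, which passes left an all-tests file behind.
# The argument is the folder given to --dir (default: result).
summarise_groups() {
    result_dir=${1:-result}
    for group in "$result_dir"/groups/*/; do
        [ -d "$group" ] || continue    # skip if no group folders exist
        name=$(basename "$group")
        if [ -f "$group/all-tests.permuted.tsv" ]; then
            echo "$name: nominal + permuted"
        elif [ -f "$group/all-tests.nominal.tsv" ]; then
            echo "$name: nominal only"
        else
            echo "$name: no results"
        fi
    done
}

summarise_groups result
```

A group reported as "nominal only" is expected after a run with --mode nominal, since the *.permuted.tsv files are only written by the permuted pass.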

Output files contain the following information:

all-tests.nominal.tsv

if --svqtl true

if --ld ${r2}

sqtls-${level}fdr.nominal.tsv (in addition to the previous)

all-tests.permuted.tsv

sqtls-${level}fdr.permuted.tsv (in addition to the previous)

Cite sqtlseeker2-nf

If you find sqtlseeker2-nf useful in your research, please cite the related publication:

Garrido-Martín, D., Borsari, B., Calvo, M., Reverter, F., Guigó, R. Identification and analysis of splicing quantitative trait loci across multiple tissues in the human genome. Nat Commun 12, 727 (2021). https://doi.org/10.1038/s41467-020-20578-2