Closed nick-youngblut closed 7 months ago
The pipeline functions now take the arguments minimap2 and k8, and the paftools.js script is included in FLAMES. Hopefully this will make it easier to run on Ubuntu.
If you would like to install the latest commit, you will need to either have Bioconductor in development mode (which requires R 4.4) or apply the following patch with git apply patch.txt to remove the changes specific to the next Bioconductor release:
diff --git a/DESCRIPTION b/DESCRIPTION
index fcb769f..cfc1578 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -39,7 +39,6 @@ Imports:
DropletUtils,
GenomicRanges,
GenomicFeatures,
- txdbmaker,
GenomicAlignments,
GenomeInfoDb,
ggplot2,
diff --git a/NAMESPACE b/NAMESPACE
index 561d2c6..2fc6eec 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -46,6 +46,7 @@ importFrom(GenomeInfoDb,seqlengths)
importFrom(GenomicAlignments,readGAlignments)
importFrom(GenomicAlignments,seqnames)
importFrom(GenomicFeatures,extractTranscriptSeqs)
+importFrom(GenomicFeatures,makeTxDbFromGFF)
importFrom(GenomicFeatures,transcripts)
importFrom(GenomicRanges,GRanges)
importFrom(GenomicRanges,GRangesList)
@@ -167,8 +168,6 @@ importFrom(tidyr,as_tibble)
importFrom(tidyr,gather)
importFrom(tidyr,pivot_longer)
importFrom(tidyr,pivot_wider)
-importFrom(txdbmaker,makeTxDbFromGFF)
-importFrom(txdbmaker,makeTxDbFromGRanges)
importFrom(utils,file_test)
importFrom(utils,modifyList)
importFrom(utils,read.csv)
diff --git a/R/find_isoform.R b/R/find_isoform.R
index bc5db79..9f7f877 100644
--- a/R/find_isoform.R
+++ b/R/find_isoform.R
@@ -141,8 +141,7 @@ find_isoform_flames <- function(annotation, genome_fa, genome_bam, outdir, confi
#' @return Path to the outputted transcriptome assembly
#'
#' @importFrom Biostrings readDNAStringSet writeXStringSet
-#' @importFrom GenomicFeatures extractTranscriptSeqs
-#' @importFrom txdbmaker makeTxDbFromGFF
+#' @importFrom GenomicFeatures extractTranscriptSeqs makeTxDbFromGFF
#' @importFrom Rsamtools indexFa
#' @importFrom utils write.table
#'
@@ -172,7 +171,7 @@ annotation_to_fasta <- function(isoform_annotation, genome_fa, outdir, extract_f
dna_string_set <- Biostrings::readDNAStringSet(genome_fa)
names(dna_string_set) <- gsub(" .*$", "", names(dna_string_set))
- txdb <- txdbmaker::makeTxDbFromGFF(isoform_annotation)
+ txdb <- GenomicFeatures::makeTxDbFromGFF(isoform_annotation)
if (missing(extract_fn)) {
tr_string_set <- GenomicFeatures::extractTranscriptSeqs(dna_string_set, txdb,
use.names = TRUE)
diff --git a/R/model_decay.R b/R/model_decay.R
index 075cee0..d797339 100644
--- a/R/model_decay.R
+++ b/R/model_decay.R
@@ -4,7 +4,6 @@
#' that only differ by the 5' / 3' end. This could be useful for plotting average
#' coverage plots.
#'
-#' @importFrom txdbmaker makeTxDbFromGFF makeTxDbFromGRanges
#' @importFrom rtracklayer import
#' @importFrom S4Vectors split
#' @importFrom GenomicRanges strand
@@ -27,11 +26,11 @@
filter_annotation <- function(annotation, keep = "tss_differ") {
if (is.character(annotation)) {
annotation <- annotation |>
- txdbmaker::makeTxDbFromGFF() |>
+ GenomicFeatures::makeTxDbFromGFF() |>
GenomicFeatures::transcripts()
} else {
annotation <- annotation |>
- txdbmaker::makeTxDbFromGRanges() |>
+ GenomicFeatures::makeTxDbFromGRanges() |>
GenomicFeatures::transcripts()
}
@@ -55,7 +54,7 @@ filter_annotation <- function(annotation, keep = "tss_differ") {
#' @description Plot the average read coverages for each length bin or a
#' perticular isoform
#'
-#' @importFrom GenomicFeatures transcripts
+#' @importFrom GenomicFeatures makeTxDbFromGFF transcripts
#' @importFrom GenomicAlignments readGAlignments seqnames
#' @importFrom GenomicRanges width strand granges coverage
#' @importFrom Rsamtools ScanBamParam
Thanks @ChangqingW for making the updates!
bioconductor in development mode (which requires R 4.4)
This is a good example of how Bioconductor just adds unneeded complexity to package management in R.
I'm using R 4.3.1, and I have no plans on recreating my Docker environment with R 4.4 (the build for rocker/rstudio + Seurat + FLAMES takes nearly an hour). When do you plan on submitting a new release to bioconductor?
Bioconductor's next release is scheduled on May 1st: https://bioconductor.org/developers/release-schedule/
Thanks for letting me know. I'm guessing that I'll have to update R just to use the updated version of bioconductor.
I tried to just use BiocManager::install("mritchielab/FLAMES") with R 4.3.1, which resulted in the failed install of txdbmaker, which is only available for Bioconductor 3.19. I'm guessing this is why you state that R 4.4 is needed.
The convoluted dependency trees and releases separate to CRAN can really make bioconductor a pain (what's wrong with good-old CRAN?).
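The version constraint discussed above (txdbmaker being available only for Bioconductor 3.19, which per this thread requires R 4.4) can be checked before attempting an install. The following is a minimal shell sketch, not part of FLAMES: the function names are made up for illustration, and the comparison relies on `sort -V`, which is assumed to be available (it is in GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available.
version_at_least() {
  have=$1
  need=$2
  # After a version sort, the smaller of the two versions comes first.
  lowest=$(printf '%s\n%s\n' "$have" "$need" | sort -V | head -n 1)
  [ "$lowest" = "$need" ]
}

# Succeeds when the given R version meets the R 4.4 requirement that this
# thread states for Bioconductor 3.19.
r_supports_bioc_3_19() {
  version_at_least "$1" "4.4"
}
```

The installed R version can be printed with `Rscript -e 'cat(as.character(getRversion()))'` and passed as the argument; with 4.3.1 the check fails, which matches the failed txdbmaker install described above.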
You can try cloning to a local folder, applying the diff I posted to remove the txdbmaker stuff, and installing from the local folder.
git clone https://github.com/mritchielab/FLAMES.git && cd FLAMES
(save the patch file somewhere)
git apply path/to/patch/file
Then, in R: remotes::install_local("path/to/cloned/FLAMES", force = T)
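Before installing from the patched folder, it may help to confirm that the patch still applies to the current checkout. Below is a minimal shell sketch under the assumption that the diff was saved to a file such as patch.txt; the function name `apply_flames_patch` is hypothetical. `git apply --check` performs a dry run and modifies nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of FLAMES): apply a saved patch to a git
# checkout only if git reports that it applies cleanly.
# Usage: apply_flames_patch <repo_dir> <patch_file>
apply_flames_patch() {
  repo_dir=$1
  patch_file=$2

  if [ ! -d "$repo_dir" ]; then
    echo "repository directory not found: $repo_dir" >&2
    return 1
  fi
  if [ ! -f "$patch_file" ]; then
    echo "patch file not found: $patch_file" >&2
    return 1
  fi

  # Resolve the patch to an absolute path, because git runs inside repo_dir.
  patch_abs=$(cd "$(dirname "$patch_file")" && pwd)/$(basename "$patch_file")

  # Dry run first; nothing is modified if the patch does not apply.
  if ! git -C "$repo_dir" apply --check "$patch_abs"; then
    echo "patch does not apply cleanly to $repo_dir" >&2
    return 1
  fi
  git -C "$repo_dir" apply "$patch_abs"
}
```

After `apply_flames_patch FLAMES patch.txt` succeeds, the `remotes::install_local()` call from the steps above can be run against the cloned folder. If the dry run fails, the devel branch has probably moved on since the patch was posted.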
Thanks @ChangqingW for the suggestion! remotes doesn't always work well for bioconductor packages, but I'll give it a try.
Does txdbmaker really have to be a required dependency? It appears to be available only for Bioconductor 3.19, but that version hasn't even been fully released yet. The same goes for R 4.4. Not everyone is on the bleeding edge of R & Bioconductor. For instance, rocker doesn't even have any Docker containers for R 4.4.
Would it be possible to remove txdbmaker from Imports:? Note: including txdbmaker in Imports: negates the inclusion of txdbmaker in Suggests:.
You can indeed, that is what the git patch file is doing. We can't do it for this branch as it is the devel branch, which is built and checked with Bioc devel. As for release branches, we cannot introduce API changes in minor version bumps as per bioc guidelines, and the next major version bump would be May 1st according to Bioc's schedule, where the devel updates will be merged.
You can also try my own fork BiocManager::install("ChangqingW/FLAMES-R", ref = "devel_R_4_3", force = T), which has the patch applied and is not synced with any Bioc branches.
Patching and running remotes::install_local("path/to/cloned/FLAMES", force = T) worked. Thanks 👍
I've switched to a different computing infrastructure in which software must be installed in conda environments.
Installing FLAMES via:
mamba install r-renv r-biocmanager
R
> BiocManager::install("FLAMES")
git clone https://github.com/mritchielab/FLAMES.git
git apply patch.txt
R
> remotes::install_local("../FLAMES", force = T)
Results in the following error:
── R CMD build ────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/tmp/Rtmp3pht5z/file15a958461b6c75/FLAMES/DESCRIPTION’ ...
─ preparing ‘FLAMES’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘FLAMES_1.9.2.tar.gz’
ERROR: dependency ‘scater’ is not available for package ‘FLAMES’
Even with the patch, scater is still a dependency:
Imports:
basilisk,
bambu,
Biostrings,
BiocGenerics,
circlize,
ComplexHeatmap,
cowplot,
dplyr,
DropletUtils,
GenomicRanges,
GenomicFeatures,
GenomicAlignments,
GenomeInfoDb,
ggplot2,
ggbio,
grid,
gridExtra,
igraph,
jsonlite,
magrittr,
Matrix,
parallel,
reticulate,
Rsamtools,
rtracklayer,
RColorBrewer,
SingleCellExperiment,
SummarizedExperiment,
scater,
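To see which packages a patched checkout still requires, the Imports field can be read straight from the DESCRIPTION file. This is a minimal shell sketch, assuming the usual layout where continuation lines of a field are indented; `list_imports` and `has_import` are hypothetical names, not FLAMES or R tooling.

```shell
#!/bin/sh
# Hypothetical helper: print one package name per line from the Imports
# field of an R package DESCRIPTION file.
list_imports() {
  awk '
    {
      if ($0 ~ /^Imports:/) {
        in_field = 1
        sub(/^Imports:/, "")
      } else if ($0 ~ /^[^ \t]/) {
        # Any other unindented line starts a new field.
        in_field = 0
      }
      if (in_field) {
        n = split($0, parts, ",")
        for (i = 1; i <= n; i++) {
          gsub(/^[ \t]+|[ \t]+$/, "", parts[i])   # trim whitespace
          sub(/[ \t]*\(.*$/, "", parts[i])        # drop version constraints
          if (parts[i] != "") print parts[i]
        }
      }
    }
  ' "$1"
}

# Succeed if package $2 is listed in the Imports field of file $1.
has_import() {
  list_imports "$1" | grep -qxF -- "$2"
}
```

For example, `has_import FLAMES/DESCRIPTION txdbmaker` should fail once the patch is applied, while `has_import FLAMES/DESCRIPTION scater` would still succeed given the list above.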
In another attempt with less usage of conda:
mamba create -n flames r-base
R
> install.packages('remotes')
> remotes::install_local("FLAMES", force = T)
The error:
ERROR: dependencies ‘rtracklayer’, ‘biomaRt’ are not available for package ‘GenomicFeatures’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/GenomicFeatures’
ERROR: dependencies ‘ggplot2’, ‘patchwork’ are not available for package ‘ggstats’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/ggstats’
ERROR: dependencies ‘GenomicFeatures’, ‘rtracklayer’ are not available for package ‘ensembldb’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/ensembldb’
ERROR: dependencies ‘SummarizedExperiment’, ‘rtracklayer’, ‘BSgenome’, ‘GenomicFeatures’ are not available for package ‘VariantAnnotation’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/VariantAnnotation’
ERROR: dependencies ‘ggplot2’, ‘ggstats’ are not available for package ‘GGally’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/GGally’
ERROR: dependency ‘GenomicFeatures’ is not available for package ‘OrganismDbi’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/OrganismDbi’
ERROR: dependencies ‘SummarizedExperiment’, ‘BSgenome’, ‘GenomicAlignments’, ‘GenomicFeatures’, ‘xgboost’ are not available for package ‘bambu’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/bambu’
ERROR: dependencies ‘Hmisc’, ‘SummarizedExperiment’, ‘GenomicAlignments’, ‘GenomicFeatures’, ‘VariantAnnotation’, ‘ensembldb’ are not available for package ‘biovizBase’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/biovizBase’
ERROR: dependencies ‘ggplot2’, ‘Hmisc’, ‘biovizBase’, ‘SummarizedExperiment’, ‘GenomicAlignments’, ‘BSgenome’, ‘VariantAnnotation’, ‘rtracklayer’, ‘GenomicFeatures’, ‘OrganismDbi’, ‘GGally’, ‘ensembldb’ are not available for package ‘ggbio’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/ggbio’
The downloaded source packages are in
‘/tmp/RtmpsIf7lY/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
── R CMD build ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/tmp/RtmpsIf7lY/file15bbd932199be9/FLAMES/DESCRIPTION’ ...
─ preparing ‘FLAMES’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘FLAMES_1.9.2.tar.gz’
ERROR: dependencies ‘basilisk’, ‘bambu’, ‘cowplot’, ‘DropletUtils’, ‘GenomicFeatures’, ‘GenomicAlignments’, ‘ggplot2’, ‘ggbio’, ‘igraph’, ‘Matrix’, ‘reticulate’, ‘rtracklayer’, ‘SingleCellExperiment’, ‘SummarizedExperiment’, ‘scater’, ‘scuttle’, ‘scran’, ‘MultiAssayExperiment’ are not available for package ‘FLAMES’
* removing ‘/home/nickyoungblut/miniforge3/envs/flames/lib/R/library/FLAMES’
There were 50 or more warnings (use warnings() to see the first 50)
Given that there are nearly 250 total R packages as dependencies for FLAMES, I can see why it can be tricky to install all of them without any errors.
I hadn't seen that there is a FLAMES bioconda recipe; with it, the install becomes quite simple and much faster:
mamba create -n flames bioconductor-flames minimap2 k8
Even with the patch, I'm getting:
07:09:56 PM Thu Jun 06 2024 Start running
Running BLAZE to generate barcode list from long reads...
$`output-prefix`
[1] "/large_experiments/multiomics/SspArc0008_10x_cDNA_longRead//FLAMES/sc_3end//"
$`output-fastq`
[1] "matched_reads.fastq"
$threads
[1] 8
$`max-edit-distance`
[1] 2
$overwrite
[1] TRUE
+ /home/nickyoungblut/.cache/R/basilisk/1.14.1/0/bin/conda create --yes --prefix /home/nickyoungblut/.cache/R/basilisk/1.14.1/FLAMES/1.9.2/flames_env 'python=3.10' --quiet -c conda-forge -c bioconda -c defaults
+ /home/nickyoungblut/.cache/R/basilisk/1.14.1/0/bin/conda install --yes --prefix /home/nickyoungblut/.cache/R/basilisk/1.14.1/FLAMES/1.9.2/flames_env 'python=3.10' -c conda-forge -c bioconda -c defaults
+ /home/nickyoungblut/.cache/R/basilisk/1.14.1/0/bin/conda install --yes --prefix /home/nickyoungblut/.cache/R/basilisk/1.14.1/FLAMES/1.9.2/flames_env -c conda-forge -c bioconda -c defaults 'python=3.10' 'python=3.10' 'numpy=1.25.0' 'scipy=1.11.1' 'pysam=0.21.0' 'cutadapt=4.4' 'tqdm=4.64.1' 'pandas=1.3.5'
Running BLAZE...
Argument: --expect-cells 8000 --overwrite --minimal_stdout --output-prefix /large_experiments/multiomics/SspArc0008_10x_cDNA_longRead//FLAMES/sc_3end// --output-fastq matched_reads.fastq --threads 8 --max-edit-distance 2 /large_experiments/multiomics/SspArc0008_10x_cDNA_longRead//ont-proc_output/final/fastq_test_10k
07:15:12 PM Thu Jun 06 2024 Demultiplex done
Running FLAMES pipeline...
#### Input parameters:
{
"pipeline_parameters": {
"seed": [2022],
"threads": [8],
"do_barcode_demultiplex": [true],
"do_gene_quantification": [true],
"do_genome_alignment": [true],
"do_isoform_identification": [true],
"bambu_isoform_identification": [false],
"multithread_isoform_identification": [true],
"do_read_realignment": [true],
"do_transcript_quantification": [true]
},
"barcode_parameters": {
"max_bc_editdistance": [2],
"max_flank_editdistance": [8],
"pattern": {
"primer": ["CTACACGACGCTCTTCCGATCT"],
"BC": ["NNNNNNNNNNNNNNNN"],
"UMI": ["NNNNNNNNNNNN"],
"polyT": ["TTTTTTTTT"]
},
"TSO_seq": ["CCCATGTACTCTGCGTTGATACCACTGCTT"],
"TSO_prime": [3],
"full_length_only": [false]
},
"isoform_parameters": {
"generate_raw_isoform": [false],
"max_dist": [10],
"max_ts_dist": [100],
"max_splice_match_dist": [10],
"min_fl_exon_len": [40],
"max_site_per_splice": [3],
"min_sup_cnt": [5],
"min_cnt_pct": [0.001],
"min_sup_pct": [0.2],
"bambu_trust_reference": [true],
"strand_specific": [0],
"remove_incomp_reads": [4],
"downsample_ratio": [1]
},
"alignment_parameters": {
"use_junctions": [true],
"no_flank": [false]
},
"realign_parameters": {
"use_annotation": [true]
},
"transcript_counting": {
"min_tr_coverage": [0.4],
"min_read_coverage": [0.4]
}
}
gene annotation: /large_experiments/multiomics/references/FLAMES/refdata-gex-GRCm39-2024-A/genes/genes.gtf
genome fasta: /large_experiments/multiomics/references/FLAMES/refdata-gex-GRCm39-2024-A/fasta/genome.fa
input fastq: /large_experiments/multiomics/SspArc0008_10x_cDNA_longRead//FLAMES/sc_3end//matched_reads.fastq
output directory: /large_experiments/multiomics/SspArc0008_10x_cDNA_longRead//FLAMES/sc_3end/
minimap2 path:
k8 path:
#### Aligning reads to genome using minimap2
07:15:12 PM Thu Jun 06 2024 minimap2_align
Error in minimap2_align(config, genome_fa, infq, annotation, outdir, minimap2, : k8 not found, please make sure it is installed and provide its path as the k8 argument
Traceback:
1. sc_long_pipeline(fastq = fastq_dir, annotation = ref_annot_file,
. genome_fa = ref_genome_file, outdir = outdir, config_file = config_file,
. expect_cell_number = 8000)
2. minimap2_align(config, genome_fa, infq, annotation, outdir, minimap2,
. k8, prefix = NULL, threads = config$pipeline_parameters$threads)
3. stop("k8 not found, please make sure it is installed and provide its path as the k8 argument")
k8 is in my PATH for my conda env. See https://github.com/mritchielab/FLAMES/issues/34#issuecomment-2153219282 for how I'm currently conducting the install.
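Since the error message asks for the k8 path to be provided as the k8 argument, and the log above prints empty "minimap2 path:" and "k8 path:" lines, one thing to try is resolving both executables to absolute paths from inside the activated conda env. A minimal shell sketch follows; `resolve_tool` is a hypothetical name, and whether passing explicit paths resolves the error in this environment is untested here.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of an executable found on
# PATH, or fail with a message on stderr.
resolve_tool() {
  tool_path=$(command -v "$1" 2>/dev/null) || {
    echo "$1 not found on PATH" >&2
    return 1
  }
  printf '%s\n' "$tool_path"
}
```

Running `resolve_tool k8` and `resolve_tool minimap2` in the same shell that launches R shows what that shell can see. The printed paths could then be supplied as the `k8` and `minimap2` arguments that the pipeline functions take, instead of relying on lookup from within the R session.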
If minimap2 is installed on a Linux OS via apt-get, the executable is located at /usr/bin/minimap2, and so "k8 and/or paftools.js" are not located in that directory. It appears that the Linux package does not include paftools or k8.
So, it would be helpful to warn users that minimap2 installed via apt-get does not work with FLAMES, at least unless k8 or paftools.js is installed separately. I'm using FLAMES 1.8.0.
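For FLAMES versions that look for k8 and paftools.js next to the minimap2 executable (as the quoted message suggests for 1.8.0), the situation described above can be checked directly. This is a minimal shell sketch with a hypothetical function name; it only inspects the directory containing the given minimap2 path and makes no claim about how FLAMES itself performs the lookup.

```shell
#!/bin/sh
# Hypothetical helper: report whether k8 and paftools.js sit in the same
# directory as the given minimap2 executable.
# Usage: check_minimap2_companions /usr/bin/minimap2
check_minimap2_companions() {
  minimap2_path=$1
  bin_dir=$(dirname "$minimap2_path")
  missing=""
  for name in k8 paftools.js; do
    if [ ! -e "$bin_dir/$name" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing next to $minimap2_path:$missing" >&2
    return 1
  fi
}
```

On a system where minimap2 came from apt-get, `check_minimap2_companions /usr/bin/minimap2` would be expected to report both files as missing, consistent with the observation above.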