hbc / bcbioRNASeq

R package for bcbio RNA-seq analysis.
https://bioinformatics.sph.harvard.edu/bcbioRNASeq
GNU Affero General Public License v3.0

bcbio template? #139

Closed kokyriakidis closed 4 years ago

kokyriakidis commented 4 years ago

Hi! Can someone share a full template that you use to get the most out of the data? Is there a reason you use salmon instead of kallisto?

mjsteinbaugh commented 4 years ago

Great question. I'll work on adding this to the bcbioRNASeq documentation. Here are some example YAML configs: https://github.com/bcbio/bcbio-nextgen/tree/master/config/templates

You can currently run either HISAT2 or STAR for alignment, but can call both kallisto and salmon in a single bcbio-nextgen run.

Here's an example config for a stranded Homo sapiens run against hg38:

---
details:
  - analysis: RNA-seq
    genome_build: hg38
    algorithm:
      aligner: hisat2  # or star
      expression_caller: [salmon, kallisto]
      strandedness: firststrand
      trim_reads: False
upload:
  dir: ../final

Trimming isn't necessary and it is now disabled by default in bcbio.

If you're not sure about whether the library preparation was stranded, then run it without the strandedness set and it will default to unstranded. The salmon documentation has some great details on library types: https://salmon.readthedocs.io/en/latest/library_type.html

Best, Mike

mjsteinbaugh commented 4 years ago

Also, a fast mode is supported, where bcbio will run salmon without any alignment or QC steps. To enable this (not recommended by default), use: analysis: fastrna-seq.
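As a rough illustration of the fast mode described above, the earlier hg38 example might be reduced to something like the following. This is a minimal sketch, not a tested template: it assumes that only the analysis value changes and that the aligner and expression_caller keys can simply be left out because this mode skips alignment and runs salmon only.

```yaml
---
details:
  # Fast mode: salmon quantification only, no alignment or QC steps.
  - analysis: fastrna-seq
    genome_build: hg38
    # Assumption: aligner and expression_caller are omitted here,
    # since no alignment is performed in this mode.
upload:
  dir: ../final
```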

kokyriakidis commented 4 years ago

Hi @mjsteinbaugh!

1) bcbio documentation states that I can set these options. Do I need to?

bcbiornaseq: A dictionary of key-value pairs to be passed as options to bcbioRNASeq. Currently supports organism as a key, which takes the Latin name of the genome used (mus musculus, homo sapiens, etc.), and interesting_groups, which will be used to color quality control plots:

bcbiornaseq:
  organism: homo sapiens
  interesting_groups: [treatment, genotype, etc, etc]

You will also need to turn on bcbiornaseq via tools_on: [bcbiornaseq]

2) Do you use DEXSeq or DRIMSeq anywhere in the workflow?

3) Let's say I have 60 samples: 30 control and 30 with a feature. Do I create one config file that includes all 60 samples and change the batch name? Can you show me a config with 2 wildtype and 2 KO samples?

mjsteinbaugh commented 4 years ago

Here's some more detail on what's going on with this YAML:

bcbiornaseq:
  organism: homo sapiens
  interesting_groups: [treatment, genotype, etc, etc]

When this is set, bcbio-nextgen will load the bcbioRNASeq R package internally, which is managed by bioconda (see the r-bcbiornaseq recipe for details). It will write an R object named "bcb" to the final bcbio output. We're still testing this functionality and adding supported keys, so consider it experimental at the moment.
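Combined with the earlier hg38 example, enabling this might look roughly like the sketch below. The placement of tools_on and bcbiornaseq under algorithm is an assumption based on how other bcbio options are laid out, so check it against the bcbio-nextgen documentation before relying on it. The interesting_groups values are placeholders and must match metadata columns that actually exist for your samples.

```yaml
---
details:
  - analysis: RNA-seq
    genome_build: hg38
    algorithm:
      aligner: hisat2  # or star
      expression_caller: [salmon, kallisto]
      strandedness: firststrand
      # Assumed placement: both keys nested under algorithm.
      tools_on: [bcbiornaseq]
      bcbiornaseq:
        organism: homo sapiens
        # Placeholder group names; use your own metadata columns.
        interesting_groups: [treatment, genotype]
upload:
  dir: ../final
```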

What I typically do is run bcbio-nextgen without the bcbiornaseq: YAML and then load the data up manually in R instead. DEXSeq and/or DRIMSeq aren't used in the current workflow.
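On the question of a single config for several samples: one common layout is a single file with one details entry per sample, where each entry carries its own description, files, and metadata. The following is only a sketch for two wildtype and two KO samples. The sample names, FASTQ file names, and the genotype metadata key are all hypothetical, and the algorithm section is trimmed to a single key for brevity.

```yaml
---
details:
  - analysis: RNA-seq
    genome_build: hg38
    description: wt_rep1            # hypothetical sample name
    files: [wt_rep1_R1.fastq.gz, wt_rep1_R2.fastq.gz]  # hypothetical paths
    metadata:
      genotype: wildtype
    algorithm:
      aligner: hisat2
  - analysis: RNA-seq
    genome_build: hg38
    description: wt_rep2
    files: [wt_rep2_R1.fastq.gz, wt_rep2_R2.fastq.gz]
    metadata:
      genotype: wildtype
    algorithm:
      aligner: hisat2
  - analysis: RNA-seq
    genome_build: hg38
    description: ko_rep1
    files: [ko_rep1_R1.fastq.gz, ko_rep1_R2.fastq.gz]
    metadata:
      genotype: knockout
    algorithm:
      aligner: hisat2
  - analysis: RNA-seq
    genome_build: hg38
    description: ko_rep2
    files: [ko_rep2_R1.fastq.gz, ko_rep2_R2.fastq.gz]
    metadata:
      genotype: knockout
    algorithm:
      aligner: hisat2
upload:
  dir: ../final
```

With this layout, the same file scales to 60 samples by adding entries, and a metadata column such as genotype is what a grouping option like interesting_groups would refer to.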

mjsteinbaugh commented 4 years ago

@kokyriakidis I'm closing this issue due to inactivity. Feel free to comment if you have any follow-up questions.