shortRNAhub / shortRNA

short RNA-seq analysis package
GNU General Public License v3.0

Perform alignment from within R #4

Closed: plger closed this issue 1 year ago

plger commented 6 years ago

Use Rbowtie (and perhaps QuasR) to reproduce our bowtie alignment from within R. This should eventually replace the current content of alignment.R.

dktanwar commented 6 years ago

QuasR can perform the first 2 steps:

  1. bowtie1 indexes are created automatically when aligning with the qAlign function.

  2. For indexes, we can use the BSgenome package. If we specify a BSgenome, it is downloaded automatically from the Bioconductor website (bioconductor.org). Alternatively, we can provide the path to a .fa file to index; multiple .fa files are also accepted.

  3. Duplicating bowtie1 options:

    • -p (no. of threads)
    • -v (report end-to-end hits w/ <=v mismatches; ignore qualities)
    • -S (write SAM format)
    • -a (report all alignments per read)
    • --best (hits guaranteed best stratum; ties broken by quality)
    • --strata (hits in sub-optimal strata aren't reported (requires --best))
    • -m (suppress all alignments if > <int> exist (def: no limit))
    • -f (query input files are (multi-)FASTA .fa/.mfa)
    • --un (write unaligned reads/pairs to file(s) )
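A minimal sketch of how the options listed above might be passed through Rbowtie. It assumes Rbowtie's convention that extra named arguments to `bowtie()` are forwarded to the bowtie binary (single-letter names as `-x`, longer names as `--xxx`, flags as TRUE/FALSE). The helper names `bowtie1_options` and `align_with_rbowtie`, the file paths, and the default values for threads, mismatches and maximum hits are placeholders for illustration, not the settings used in alignment.R.

```r
# Sketch only: the alignment step needs the Bioconductor package Rbowtie.

# Collect the bowtie1 options from the list above as named R arguments.
# All default values here are placeholders.
bowtie1_options <- function(threads = 4, mismatches = 1, max_hits = 100,
                            unaligned = "unaligned.fa") {
  list(
    p = threads,      # -p: number of threads
    v = mismatches,   # -v: end-to-end hits with <= v mismatches
    S = TRUE,         # -S: write SAM format
    a = TRUE,         # -a: report all alignments per read
    best = TRUE,      # --best
    strata = TRUE,    # --strata (requires --best)
    m = max_hits,     # -m: suppress reads with more than max_hits alignments
    f = TRUE,         # -f: input reads are FASTA
    un = unaligned    # --un: file for unaligned reads
  )
}

# Build a bowtie1 index from FASTA file(s), then align single-end reads.
align_with_rbowtie <- function(reads, reference_fasta, index_dir, outfile,
                               options = bowtie1_options()) {
  Rbowtie::bowtie_build(references = reference_fasta, outdir = index_dir,
                        prefix = "index", force = TRUE)
  args <- c(list(sequences = reads,
                 index = file.path(index_dir, "index"),
                 type = "single", outfile = outfile, force = TRUE),
            options)
  do.call(Rbowtie::bowtie, args)
}
```

Keeping the options in a plain list makes it easy to compare them against the command line currently used outside R before switching over.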

qAlign: Alignments are generated using the parameters -m maxHits --best --strata. This aligns reads with up to “maxHits” best hits in the genome and selects one of them randomly.
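A rough sketch of what the QuasR route could look like, assuming the standard `qAlign()` interface (a tab-delimited sample file with `FileName` and `SampleName` columns, and a `genome` that is either a BSgenome package name or a FASTA path). The helper names `write_sample_file` and `align_with_quasr` and the default values are hypothetical; threads are supplied through a `parallel` cluster object rather than a `-p` flag.

```r
# Sketch only: the alignment step needs the Bioconductor package QuasR.

# qAlign() expects a tab-delimited sample file with columns
# FileName and SampleName; this helper writes one and returns its path.
write_sample_file <- function(fastq_files, sample_names,
                              path = tempfile(fileext = ".txt")) {
  samples <- data.frame(FileName = fastq_files, SampleName = sample_names)
  write.table(samples, file = path, sep = "\t",
              quote = FALSE, row.names = FALSE)
  path
}

# genome: a BSgenome package name or a path to a FASTA file.
# max_hits maps to qAlign's maxHits (bowtie -m maxHits --best --strata).
align_with_quasr <- function(sample_file, genome, max_hits = 1, threads = 1) {
  cl <- parallel::makeCluster(threads)
  on.exit(parallel::stopCluster(cl))
  QuasR::qAlign(sampleFile = sample_file, genome = genome,
                aligner = "Rbowtie", maxHits = max_hits, clObj = cl)
}

# Example call (file and genome names are illustrative only):
# sf <- write_sample_file("/path/to/sample1.fastq.gz", "sample1")
# proj <- align_with_quasr(sf, genome = "BSgenome.Mmusculus.UCSC.mm10",
#                          max_hits = 100, threads = 4)
```

Note that with these parameters only one randomly chosen hit is kept per read, which differs from running bowtie with -a.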

  4. STAR is also available for use from R:

    # install_github() comes from the remotes (or devtools) package
    remotes::install_github("flow-r/ngsflows")
    library(ngsflows)

plger commented 6 years ago

On Thu, Jun 21, 2018 at 3:47 PM, Deepak Tanwar notifications@github.com wrote:

dktanwar commented 4 years ago

Hi @plger

For alignment, we can use Rsubread with the following options:

https://bioconductor.org/packages/release/bioc/vignettes/Rsubread/inst/doc/SubreadUsersGuide.pdf

  1. -B: 1000?
  2. --multiMapping

Will this be enough to replace both bowtie1 and STAR?
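For reference, a hedged sketch of how these two options might translate to the R interface, assuming `Rsubread::align()` exposes -B as `nBestLocations` and --multiMapping as `unique = FALSE`. The wrapper name `align_with_rsubread`, the paths and the default values are placeholders. As far as I recall, the documented range for `nBestLocations` is 1 to 16, so a value like 1000 may not be accepted; that is worth checking against the users guide linked above.

```r
# Sketch only: the alignment step needs the Bioconductor package Rsubread.

# Build a Subread index from a reference FASTA, then align the reads while
# reporting multi-mapping reads at up to n_best equally-best locations.
align_with_rsubread <- function(reads, reference_fasta, index_basename,
                                outfile, n_best = 16, max_mismatches = 3,
                                threads = 4) {
  Rsubread::buildindex(basename = index_basename,
                       reference = reference_fasta)
  Rsubread::align(index = index_basename,
                  readfile1 = reads,
                  output_file = outfile,
                  unique = FALSE,                  # --multiMapping
                  nBestLocations = n_best,         # -B
                  maxMismatches = max_mismatches,  # -M
                  nthreads = threads)              # -T
}
```

The mismatch limit is exposed as a separate argument so it can be tuned independently of the multi-mapping settings.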

dktanwar commented 4 years ago

Another option to decide on:

  1. How many mismatches should be allowed?

dktanwar commented 4 years ago

Hi @plger

Are you aware of ShortStack?

What do you think about the alignment procedure?

dktanwar commented 3 years ago

@plger do we need this issue?

I think the alignment problem is solved with Rsubread.