knights-lab / BURST

An ultrafast optimal aligner for mapping large NGS data to large genome databases.
GNU Affero General Public License v3.0

What is the -s option? #28

Open ZeweiSong opened 4 years ago

ZeweiSong commented 4 years ago

I was wondering what the -s option does to the sequence. Does it shear the input into pieces of the specified length? If so, how does burst deal with the gaps between the sheared pieces?

In the examples, -s is used either alone or as -s 1, which is a bit confusing to me.

Zewei

GabeAl commented 4 years ago

Hi Zewei,

It's great to hear from you! The -s option concerns shearing the reference sequences during database generation. It controls the internal database chunk size. It should have no effect on the alignments to that database, but it may affect the database's size as well as alignment speed. Smaller shears result in better de-duplication, but in the absence of known small duplicated regions in the input sequences, it may be better to set the shear higher (e.g. 1000-4000).

Cheerio, Gabe
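
As a rough illustration of the de-duplication trade-off described above, here is a toy Python sketch. It assumes references are cut into consecutive, non-overlapping chunks and that identical chunks are stored only once. That model is an assumption made for illustration, not BURST's actual implementation, and `shear` and `stored_bases` are hypothetical helper names.

```python
def shear(seq, size):
    """Split seq into consecutive, non-overlapping chunks of at most `size` bases."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def stored_bases(seqs, size):
    """Total bases kept if identical chunks are stored only once (toy model)."""
    unique = {chunk for s in seqs for chunk in shear(s, size)}
    return sum(len(chunk) for chunk in unique)


# One made-up reference containing the same 8-base region twice.
repeat = "ACGTACGT"
refs = [repeat + "TTTTGGGG" + repeat]  # 24 bases in total

for size in (8, 12, 24):
    print(size, stored_bases(refs, size))
# 8 16   (the repeated chunk is stored once)
# 12 24  (the chunks no longer line up with the repeat)
# 24 24  (a single chunk, nothing to de-duplicate)
```

In this toy model the small shear only pays off because the input really does contain a short duplicated region. Without such regions, every chunk is unique at any size, which fits the suggestion to use a larger shear in that case.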

mikemc commented 4 years ago

@GabeAl To follow up on @ZeweiSong's question, there is currently a line in the README,

  1. Run burst -r MyDB.fasta -d DNA 320 -o MyDB.edx -a MyDB.acx -s 1 -i 0.97 to generate a database and accelerator.

where the option -s 1 is used. Is this a typo? Or is using -s 1 in fact recommended in some situations?