Open friedue opened 3 years ago
Following up on this, I've created, what I think, are BED files matching the description in your README file here (with the small caveat that your README file does not mention the "strand" field, which is part of the output by SmartMapPrep.
Just for documentation purposes, here's what I'm currently running:
$ SmartMap -m 100 -g chrLengths.txt -o myTest splits/
## here's what the one of the BED files looks like
$ cat splits/test_map-10.bed
chr1 48498420 48498419 SRR10195006.189 + AS:i:73 YS:i:73
chr10 3625016 3625015 SRR10195006.189 + AS:i:73 YS:i:73
chr10 106316624 106316623 SRR10195006.189 + AS:i:73 YS:i:73
chr14 92387163 92387162 SRR10195006.189 + AS:i:73 YS:i:73
chr18 90277345 90277344 SRR10195006.189 + AS:i:73 YS:i:73
chr6 109070828 109070827 SRR10195006.189 + AS:i:73 YS:i:73
chr17 54912751 54912750 SRR10195006.189 + AS:i:73 YS:i:73
chr7 59065216 59065215 SRR10195006.189 + AS:i:73 YS:i:73
chr5 63603854 63603853 SRR10195006.189 + AS:i:73 YS:i:73
chr6 56508593 56508592 SRR10195006.189 + AS:i:73 YS:i:73
chr16 6342380 6342379 SRR10195006.626 + AS:i:73 YS:i:73
chr4 55418228 55418227 SRR10195006.626 + AS:i:73 YS:i:73
chrY 50040361 50040360 SRR10195006.626 + AS:i:73 YS:i:73
chr4 55422936 55422935 SRR10195006.626 + AS:i:73 YS:i:73
chrY 77490698 77490697 SRR10195006.626 + AS:i:73 YS:i:73
chrY 70902504 70902503 SRR10195006.626 + AS:i:73 YS:i:73
chrY 62660096 62660095 SRR10195006.626 + AS:i:73 YS:i:73
chrY 70897877 70897876 SRR10195006.626 + AS:i:73 YS:i:73
chr14 91647339 91647338 SRR10195006.626 + AS:i:73 YS:i:73
chrY 77486070 77486069 SRR10195006.626 + AS:i:73 YS:i:73
chr1 102547294 102547293 SRR10195006.745 + AS:i:74 YS:i:74
chr4 26461528 26461527 SRR10195006.745 + AS:i:74 YS:i:74
chr4 59966076 59966075 SRR10195006.745 + AS:i:74 YS:i:74
chrX 46991250 46991249 SRR10195006.745 + AS:i:74 YS:i:74
chr10 6550281 6550280 SRR10195006.745 + AS:i:74 YS:i:74
chr3 135195591 135195590 SRR10195006.745 + AS:i:74 YS:i:74
chr13 106225578 106225577 SRR10195006.745 + AS:i:74 YS:i:74
chr14 38036760 38036759 SRR10195006.745 + AS:i:74 YS:i:74
chr1 45106498 45106497 SRR10195006.745 + AS:i:74 YS:i:74
chr4 70585296 70585295 SRR10195006.745 + AS:i:74 YS:i:74
chr4 13055720 13055719 SRR10195006.562 - AS:i:73 YS:i:73
chr5 60773017 60773016 SRR10195006.562 - AS:i:73 YS:i:73
chrX 114697080 114697079 SRR10195006.562 - AS:i:73 YS:i:73
chr9 81464996 81464995 SRR10195006.562 - AS:i:73 YS:i:73
chr2 16961773 16961772 SRR10195006.562 - AS:i:73 YS:i:73
chr17 16023600 16023599 SRR10195006.562 - AS:i:73 YS:i:73
chr14 60338890 60338889 SRR10195006.562 - AS:i:73 YS:i:73
chrX 143157135 143157134 SRR10195006.562 - AS:i:73 YS:i:73
chr14 36214192 36214191 SRR10195006.562 - AS:i:73 YS:i:73
chr1 80085965 80085964 SRR10195006.562 - AS:i:73 YS:i:73
Hi,
The scores here are the internal Bowtie2 scores used to measure the quality of an alignment; a higher (less negative) score is indicative of a better alignment. The AS:i:
score measures the quality of the alignment of the read and the YS:i:
score measures the quality of the mate.
There are not separate directories for each number of alignments per read -- just one output file!
In your file here, you have the strand in the fifth column. If you want to do a strand-specific analysis (like RNA-seq) where you analyze both strands separately and independently, you need to include the -S
flag in your SmartMap command, something like: SmartMap -S -m 100 -g chrLengths.txt -o myTest splits/*.bed
If, on the other hand, you don't want a strand-specific analysis (e.g. ChIP-seq), get rid of the fifth field from the BED files and use the command SmartMap -m 100 -g chrLengths.txt -o myTest [whatever your bed directory is]/*.bed
Hope that helps!
Ah, right, I was looking at the SmartMapRNAPrep script, which is where the strand was coming from.
Is the alignment score taken into account for the weighting down the road?
If you are using scored mode (where you set the minimum score with the -s
flag (recommended at -0.6*(total read length + 1)
), then this alignment score is taken into account for weighting. If you do not enable the -s
option, then it is not taken into account. We recommend using scored mode -- we find that it has lower error than does unscored mode.
and how would the score mode be affected by the absence of mates for single-end data?
I haven’t tested it with single-end data so I don’t know how it will influence the outcome – a priori, I can’t imagine it would be a positive impact relative to paired-end data, but I can’t say for certain.
In the absence of a mate, there won't be a YS column provided by the aligner. In this case, the YS column is required by SmartMap insofar as it has to exist, but if you just append a field YS:i:0 to each line of the BED file then it should be alright. The minimum score parameter (flag -s) will then be -0.6*(read length + 1)
where read length is the SE read length.
Let me rephrase: why is the YS:i field needed? How is it taken account for the scoring (or any other part of the algorithm)?
Hi,
I'm considering to use SmartMap, but would prefer to use STAR instead of bowtie for the alignment step. Can you delineate which types of scores you expect and (how you interpret them) for the
AS:i
tag? And do I understand correctly that you generate separate folders for reads that align to 2,3,4 and more regions, i.e. if we have reads with 5000 hits, that would mean 5000 directories?EDIT: Basically, I just want to get a better sense of how to re-create a BED file of alignments that will work with SmartMap.
Thank you!
Friederike