Open ohdongha opened 5 days ago
Dear Dong-Ha,
Try the -Q7 (or -Q4) option instead of -Q4:
$ spaln -Q7 -d gnm -A0 -O10 -t8 -o gnm_aa_1.sam aa_1.fa
Osamu,
Thanks a lot, @ogotoh! I missed the detailed instructions about -Q.
However, the SAM output still does not seem to work. When I tried again with -Q7, the SAM header was printed to stdout while "No sam records!" was repeatedly printed to stderr. In the end, the SAM output file (gnm_aa_1.sam below) was empty.
Below, I tried to capture stdout and stderr separately:
$ spaln -Q7 -d gnm -A0 -O10 -t8 -o gnm_aa_1.sam aa_1.fa 1> gnm_aa_1.stdout 2> gnm_aa_1.stderr
$ head gnm_aa_1.stderr
XP_058799706.1 < 0 333 NC_087241.1 10651 10651 14.02 0.00 137 210 6 5 12 10
No sam records !
XP_039314136.1 < 0 112 NC_087229.1 3657 3657 14.10 0.00 32 50 2 2 4 4
No sam records !
XP_034943909.1 > 0 254 NC_087230.1 4714 4714 0.00 13.30 103 140 5 5 10 10
NP_001177534.1 > 0 370 NC_087235.1 7662 7699 7.90 0.00 182 149 9 8 18 16
No sam records !
XP_014235745.1 < 0 683 NC_087229.1 3641 3641 0.00 14.07 316 412 13 12 26 24
No sam records !
No sam records !
$ (head -n5; echo "..."; tail -n5) < gnm_aa_1.stdout | column -t -s$'\t'
@HD VN:1.3 SO:unsorted
@SQ SN:NC_087225.1 LN:15008981
@SQ SN:NC_087226.1 LN:13116906
@SQ SN:NC_087227.1 LN:12333390
@SQ SN:NC_087228.1 LN:11548418
...
@SQ SN:NW_026974116.1 LN:7856
@SQ SN:NW_026974117.1 LN:7739
@SQ SN:NW_026974118.1 LN:7733
@SQ SN:NW_026974119.1 LN:7397
@SQ SN:NW_026974120.1 LN:6325
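A quick way to confirm that a SAM stream like the one above holds only header lines is to count lines that start with `@` separately from everything else. The following is a minimal sketch, not part of Spaln; the function name `count_sam_lines` is hypothetical, and the sample lines are the first two header lines from the output above.

```python
from typing import Iterable, Tuple


def count_sam_lines(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (header_lines, alignment_records) for SAM-formatted text.

    In SAM, header lines start with '@'; every other non-empty line
    is treated here as an alignment record.
    """
    headers = 0
    records = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            headers += 1
        else:
            records += 1
    return headers, records


# Two header lines taken from the stdout capture in this thread.
sample = [
    "@HD\tVN:1.3\tSO:unsorted\n",
    "@SQ\tSN:NC_087225.1\tLN:15008981\n",
]
print(count_sam_lines(sample))  # (2, 0): headers only, no alignment records
```

To apply it to a captured file, pass an open file handle, for example `count_sam_lines(open("gnm_aa_1.stdout"))`. A record count of zero matches the "No sam records !" messages on stderr.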
When I tried other output formats, it worked. For example, the command below printed two GFF and one BED format output files. The BED output (gnm_aa_1.O3) had 16,685 lines with one alignment per line:
$ spaln -Q7 -d gnm -A0 -O0,2,3 -t8 -pq -o gnm_aa_1 aa_1.fa
It would be great if the output could be in SAM (or, preferably, PAF) for compatibility with downstream pipelines.
Thanks again for your help, and please have a look. Cheers, Dong-Ha
Dear Dong-Ha,
Sorry, I had forgotten that Spaln does not support SAM output for protein queries because, in my opinion, the SAM format is suited to short-read mapping. I also do not recommend the BED format, because it does not distinguish an intron from an ordinary deletion in the query. I guess most people choose GFF3 output.
Osamu,
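For downstream pipelines that need explicit intron coordinates, one option is to derive them from the GFF3 output by grouping coding segments under their parent feature. The following is only a sketch under stated assumptions: it assumes the coding segments carry a `Parent` attribute and are split only at introns, which this thread does not show. The feature type `cds`, the function name `introns_from_gff3`, and the sample records are hypothetical and are not Spaln output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def introns_from_gff3(
    lines: Iterable[str], feature: str = "cds"
) -> Dict[str, List[Tuple[int, int]]]:
    """Map each Parent ID to the gaps between its consecutive segments.

    Coordinates are 1-based and inclusive, as in GFF3. Features with
    several comma-separated parents are not split; the raw Parent
    value is used as the key.
    """
    segments = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2].lower() != feature.lower():
            continue
        # Parse the 'key=value;key=value' attribute column.
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        segments[parent].append((int(cols[3]), int(cols[4])))

    introns = {}
    for parent, segs in segments.items():
        segs.sort()
        # A gap between the end of one segment and the start of the next
        # is reported as an intron (an assumption, see the lead-in).
        introns[parent] = [
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(segs, segs[1:])
            if next_start - prev_end > 1
        ]
    return introns


# Hypothetical records for illustration only.
example = [
    "##gff-version 3\n",
    "chr1\texample\tmRNA\t100\t400\t.\t+\t.\tID=mRNA1\n",
    "chr1\texample\tcds\t100\t200\t.\t+\t0\tID=cds1;Parent=mRNA1\n",
    "chr1\texample\tcds\t301\t400\t.\t+\t1\tID=cds2;Parent=mRNA1\n",
]
print(introns_from_gff3(example))  # {'mRNA1': [(201, 300)]}
```

The feature type and attribute names should be checked against the actual GFF3 file before relying on the result.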
Dear Author, Thank you for developing and updating an aligner capable of aligning proteins to a genome on a large scale.
I want to align protein sequences collected from species in the same order (file name aa_1.fa, with 25,000 protein sequences) to a genome sequence (file name gnm.gf) so that the spliced alignments can guide the genome annotation process.
I installed version 3.0.6a <240916>, and below is what I tried and the messages printed to the screen:
The run finished within a few minutes, and the output was much smaller than I expected.
I have these questions:
[...] gnm.bkp instead of the chromosome/scaffold names. The mapping position in the 4th column does not seem to be the genome coordinate either. Is there an additional step needed to interpret the SAM output above?
[...] protein.fa to the genome.
[...] -d gnm, -d gnm.gf, -d gnm.bkp, etc., but they all gave "Can't open query !" What is the correct way to map & align in this case?
Usage:
spaln -W[Genome.bkn] -KD [W_Options] Genome.mfa (to write block inf.)
spaln -W[Genome.bkp] -KP [W_Options] Genome.mfa (to write block inf.)
spaln -W[AAdb.bka] -KA [W_Options] AAdb.faa (to write aa db inf.)
spaln -W [Genome.mfa|AAdb.faa] (alternative to makdbs.)
spaln [R_options] genomic_segment cDNA.fa (to align)
spaln [R_options] genomic_segment protein.fa (to align)
spaln [R_options] -dGenome cDNA.fa (to map & align)
spaln [R_options] -dGenome protein.fa (to map & align)
spaln [R_options] -aAAdb genomic_segment.fa (to search aa database & align)
spaln [R_options] -aAAdb protein.fa (to search aa database)
(... shortened ...)