Closed yannickwurm closed 10 months ago
Hi @tadast - a bit more info here. Apologies for the delay.
No need to install anything special. Here I just downloaded Cdd.tar.gz and decompressed it. Note that it doesn't work if the path has unusual characters in it.
Example:
cat ~/.sequenceserver/minidb/SI_putativeTranscripts.fasta | seqtk seq -a | head -n 30 > test.fasta
rpstblastn -query test.fasta -db Cdd -outfmt 7 -num_threads 8 -evalue 1.0e-5 -max_target_seqs 10 > test.rpstblastn.cdd.tab
cat test.rpstblastn.cdd.tab
Output:
# RPSTBLASTN 2.14.0+
# Query: SiJWA01AAW.scf
# Database: Cdd
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
SiJWA01AAW.scf CDD:436463 22.785 158 116 2 14 478 100 254 4.63e-24 95.2
# RPSTBLASTN 2.14.0+
# Query: SiJWA01AAX.scf
# Database: Cdd
# 0 hits found
# RPSTBLASTN 2.14.0+
# Query: SiJWA01ACE.scf
# Database: Cdd
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 10 hits found
SiJWA01ACE.scf CDD:395170 53.968 63 29 0 243 431 1 63 1.25e-24 91.0
SiJWA01ACE.scf CDD:237664 43.243 74 42 0 231 452 1 74 1.31e-21 90.2
SiJWA01ACE.scf CDD:237660 48.649 74 38 0 231 452 2 75 1.50e-21 90.3
SiJWA01ACE.scf CDD:236757 50.794 63 31 0 243 431 5 67 5.32e-21 88.7
SiJWA01ACE.scf CDD:223560 50.769 65 32 0 237 431 3 67 1.95e-20 87.3
SiJWA01ACE.scf CDD:184599 47.826 69 36 0 243 449 6 74 2.73e-20 86.8
# and so on.
Just like with normal BLAST, we have different -outfmt options, including JSON.
The query start and query end coordinates (the "q. start" and "q. end" fields) mark the regions of the query sequence we want to highlight (e.g. on the first image above, those would be ~250 to 600).
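As a rough illustration of how those coordinates could be pulled out of the tabular output shown above, here is a minimal Python sketch. It assumes the default 12-column `-outfmt 7` layout from the example; the function name `parse_outfmt7` and the dictionary keys are made up for illustration and are not part of SequenceServer or BLAST. Start and end are normalised with min/max as a defensive step in case a hit is reported with start greater than end.

```python
from collections import defaultdict


def parse_outfmt7(text):
    """Group tabular hits (-outfmt 7) by query, keeping query coordinates.

    Hypothetical helper: assumes the default 12 columns shown in the
    '# Fields:' comment line of the example output.
    """
    regions = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment lines BLAST emits per query.
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # handles tabs or spaces
        if len(fields) != 12:
            continue  # not a default-format hit line
        q_start, q_end = int(fields[6]), int(fields[7])
        regions[fields[0]].append({
            "domain": fields[1],
            "start": min(q_start, q_end),
            "end": max(q_start, q_end),
            "evalue": float(fields[10]),
            "bit_score": float(fields[11]),
        })
    return dict(regions)


sample = """\
# RPSTBLASTN 2.14.0+
# Query: SiJWA01AAW.scf
# Database: Cdd
# 1 hits found
SiJWA01AAW.scf CDD:436463 22.785 158 116 2 14 478 100 254 4.63e-24 95.2
# RPSTBLASTN 2.14.0+
# Query: SiJWA01AAX.scf
# Database: Cdd
# 0 hits found
"""

hits = parse_outfmt7(sample)
for query, domains in hits.items():
    for d in domains:
        print(query, d["domain"], d["start"], d["end"], d["evalue"])
```

Queries with no hits (such as SiJWA01AAX.scf above) simply do not appear in the result, so the visualisation would have nothing to draw for them.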
The human-friendly description of the CDD domain is likely visible in the long table output, or in the JSON/XML outputs.
Cloud users now have this. 🙌
Example:
And in this BLAST output:
NCBI BLAST shows CDD hit domain analysis on protein queries. This is super useful and also biologically informative (e.g., "which functional part of my gene is conserved?").
Those pictures come from "rpsblast" alignment of precomputed protein domain matrices (README). The relevant output can be obtained using:
We could:
have the visualisation unfold above our current overview of where hits align to the query.
(On NCBI, if you click on it, you get extra info, including a dedicated E-value. I intuit that those extra details are less important.)