thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

generating a file like emale_prot_ava #63

Closed: aleuUH closed this issue 3 years ago

aleuUH commented 3 years ago

Hello again @thackl

I was wondering how the emale_prot_ava file was generated. I created a tabular blastp results file (outfmt 6) and read it in using read_blast(), like so:

FR_prot_ava <- read_blast("genes.blastp.tsv")

which looks like this:

    > FR_prot_ava
    # A tibble: 671 x 12
      qaccver                         saccver                         pident length mismatch gapopen qstart qend sstart send    evalue bitscore
    1 contig_26_pilon_925301_962441_1 contig_1560_pilon_1              100      147        0       0      1  147      5  151 9.58e-112    306
    2 contig_26_pilon_925301_962441_1 contig_26_pilon_925301_962441_1  100      147        0       0      1  147      1  147 1.21e-111    305
    3 contig_26_pilon_925301_962441_1 contig_1560_pilon_26              26.2     80       45       3     34  100    141  219 3.8e-1       21.9

I seem to be missing the file_id, feat_id and feat_id2 columns. Any tips would be great!

Thanks,
Andy
thackl commented 3 years ago

Ah, good question (and not well documented, I have to admit). read_blast() just reads BLAST results generically, as-is. To add the missing columns, import your BLAST results with read_sublinks() instead: this tells gggenomes to expect links between genomes, and the 'sub' tells it that IDs and coordinates are relative to features (proteins), not sequences (contigs).

This is how I generated emale_prot_ava:

    emale_prot_ava <- read_sublinks(ex("emales/emales-prot-ava.o6"))
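To make concrete what "IDs and coordinates relative to features" means for the missing columns, here is a minimal base-R sketch of the kind of table read_sublinks() produces. It is not gggenomes' implementation. It assumes standard 12-column outfmt 6 input and assumes the mapping qaccver to feat_id, qstart/qend to start/end, saccver to feat_id2 and sstart/send to start2/end2; the column order of the real emale_prot_ava may differ, and the file_id label "genes.blastp" is hypothetical.

```r
# Minimal base-R sketch, NOT gggenomes' implementation: it only mimics the
# feature-relative link columns (file_id, feat_id, feat_id2, ...).
# Assumption: input is standard 12-column BLAST tabular output (outfmt 6).
blast_cols <- c("qaccver", "saccver", "pident", "length", "mismatch",
                "gapopen", "qstart", "qend", "sstart", "send",
                "evalue", "bitscore")

# Two hits taken from the FR_prot_ava tibble in the question, as TSV text
o6 <- paste(
  "contig_26_pilon_925301_962441_1\tcontig_1560_pilon_1\t100\t147\t0\t0\t1\t147\t5\t151\t9.58e-112\t306",
  "contig_26_pilon_925301_962441_1\tcontig_26_pilon_925301_962441_1\t100\t147\t0\t0\t1\t147\t1\t147\t1.21e-111\t305",
  sep = "\n"
)

hits <- read.table(text = o6, sep = "\t", col.names = blast_cols,
                   stringsAsFactors = FALSE)

# Rename query/subject IDs and coordinates to feature-relative link columns
# (assumed mapping) and tag every row with the same file_id.
sublinks <- data.frame(
  file_id  = "genes.blastp",  # hypothetical file label, recycled to all rows
  feat_id  = hits$qaccver, start  = hits$qstart, end  = hits$qend,
  feat_id2 = hits$saccver, start2 = hits$sstart, end2 = hits$send,
  hits[, c("pident", "length", "mismatch", "gapopen", "evalue", "bitscore")],
  stringsAsFactors = FALSE
)

str(sublinks)
```

In practice there is no need to do this by hand: calling read_sublinks() on your own file (e.g. "genes.blastp.tsv") should add these columns for you.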

aleuUH commented 3 years ago

Working now.

Thanks again!