thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

Calculate coordinates from gff file with genes in different directions #186

Open cabraham03 opened 6 months ago

cabraham03 commented 6 months ago

Hi, I'm using the read_feats function to import GFF files generated with Prokka, and then using the coordinates of some genes to generate a plot. However, some of those genes are in different directions. Is it possible to recalculate the positions of the obtained coordinates for multiple genes using the data.frame generated by read_feats?

Here is an example with 2 genes and 2 genomes. I import the GFF files, combine them all in GFFdf (a data.frame), and filter the genes that I want to plot:

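library(dplyr)
# keep only the two genes of interest from the two genomes
df_genes <- filter(GFFdf, file_id %in% c("GENOME300", "GENOME320") & gene %in% c("geneA", "geneC"))
df_genes <- df_genes[, c("file_id", "seq_id", "start", "end", "strand", "type", "feat_id", "locus_tag", "gene")]
# turn the strand symbol into a numeric orientation: "+" -> 1, "-" -> -1
df_genes <- df_genes |> mutate(orientation = paste0(strand, 1) |> as.integer())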
df_genes
    file_id      seq_id  start    end strand type        feat_id      locus_tag  gene orientation
1 GENOME300 GENOME300_1 206268 206663      +  CDS IMEHDJCA_00193 IMEHDJCA_00193 geneC           1
2 GENOME300 GENOME300_1 206686 222306      +  CDS IMEHDJCA_00194 IMEHDJCA_00194 geneA           1
3 GENOME320 GENOME320_8 123699 139310      -  CDS ILCJGNBA_01570 ILCJGNBA_01570 geneA          -1
4 GENOME320 GENOME320_8 139333 139728      -  CDS ILCJGNBA_01571 ILCJGNBA_01571 geneC          -1

The plot:

[image: genex]

I just want to calculate the start and end positions of geneA and geneC in GENOME320 so that they run in the same direction as in GENOME300.

Any suggestions? Thanks.

thackl commented 6 months ago

I'm not sure I fully understand. Do you want to plot them in the same orientation? Or why do you want to calculate their start and end?

For plotting, you can use flip (https://thackl.github.io/gggenomes/reference/flip.html) to change the orientation of the contigs (and the genes with them).

Something like the following should work

gggenomes(df_genes) |>
  flip(GENOME320) +
  geom_gene()

Is that what you meant?
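
If the flipped coordinates themselves are needed rather than just the plot, the arithmetic can be sketched by hand. The following is only a minimal illustration in base R, not something gggenomes provides: flip_coords is a hypothetical helper, and it assumes it is acceptable to mirror the genes within the span they occupy on their contig (the contig length is not in the table above, so a true reverse-complement position on the whole contig cannot be computed from it).

```r
# Rebuild the four rows from the table above (only the columns needed here).
df_genes <- data.frame(
  file_id = c("GENOME300", "GENOME300", "GENOME320", "GENOME320"),
  seq_id  = c("GENOME300_1", "GENOME300_1", "GENOME320_8", "GENOME320_8"),
  start   = c(206268, 206686, 123699, 139333),
  end     = c(206663, 222306, 139310, 139728),
  strand  = c("+", "+", "-", "-"),
  gene    = c("geneC", "geneA", "geneA", "geneC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of gggenomes): mirror all features of the
# given contigs within the span they cover and swap their strand.
flip_coords <- function(df, flip_seq_ids) {
  out <- df
  for (id in flip_seq_ids) {
    idx <- which(df$seq_id == id)
    if (length(idx) == 0) next          # nothing to flip for this contig
    lo <- min(df$start[idx])            # leftmost position of the locus
    hi <- max(df$end[idx])              # rightmost position of the locus
    out$start[idx]  <- lo + hi - df$end[idx]
    out$end[idx]    <- lo + hi - df$start[idx]
    out$strand[idx] <- ifelse(df$strand[idx] == "+", "-", "+")
  }
  out
}

flipped <- flip_coords(df_genes, "GENOME320_8")
flipped[flipped$file_id == "GENOME320", c("gene", "start", "end", "strand")]
# geneA now spans 124117-139728 and geneC 123699-124094, both on "+"
```

After mirroring, geneC precedes geneA with the same 23 bp gap as in GENOME300, so the two loci read in the same direction. For plotting alone, flip() remains the simpler route.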