mskilab-org / gTrack

R package for visualizing complex and multi-track 1D and 2D genomic data in GenomicRanges framework

added optional "label" as a special metadata column for track.gencode #18

Closed ShaiberAlon closed 2 years ago

ShaiberAlon commented 2 years ago

Added the parameter gencode.cols to track.gencode. This is a character vector specifying additional columns to include in the gencode data used to generate the gTrack.

So for example if we run:

gngt = track.gencode(gencode, gencode.cols = c('score', 'type', 'label'))

Then, as long as 'score', 'type', and 'label' are actual column names in gencode, we expect them to be available in the GRangesList associated with gngt and accessible in the following way:

gngt@data[[1]][[i]][, c('score', 'type', 'label')]

Where i is an integer corresponding to the i'th gene (notice that gngt@data[[1]] is a GRangesList in which each GRanges item corresponds to a gene).

This means we can now use these fields, for example, in the following way:

gngt$gr.labelfield = 'score'
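To make the per-gene structure concrete, here is a minimal sketch that uses only GenomicRanges. It builds a toy GRangesList with one GRanges per gene and reads the extra columns from the first gene. The object names (gencode_toy, grl_toy) and all values are made up for illustration. It only mirrors the shape of gngt@data[[1]] and is not what track.gencode does internally.

```r
library(GenomicRanges)

# Toy stand-in for a gencode GRanges; every value here is hypothetical
gencode_toy = GRanges(
    seqnames = c('chr1', 'chr1', 'chr2'),
    ranges = IRanges(start = c(100, 300, 50), end = c(200, 400, 150)),
    strand = c('+', '+', '-'),
    gene_name = c('GENE_A', 'GENE_A', 'GENE_B'),
    score = c(1.5, 2.5, 3.0),
    type = c('exon', 'exon', 'exon'),
    label = c('GENE_A', 'GENE_A', NA)
)

# One GRanges per gene, analogous in shape to gngt@data[[1]]
grl_toy = split(gencode_toy, gencode_toy$gene_name)

# Extra metadata columns of the first gene, as a DataFrame
first_gene_cols = mcols(grl_toy[[1]])[, c('score', 'type', 'label')]
```

Here first_gene_cols has one row per range of the first gene, which is why a field such as 'score' can serve as a per-range label field.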

The special field: "label"

We treat "label" as a special case: if gencode includes a "label" field and "label" %in% gencode.cols, then the "label" column is used to annotate not just each GRanges, but also the metadata of the GRangesList. The value for each gene is assumed to be unique (since each row in the GRangesList metadata corresponds to a gene, only a single value can be assigned to each gene).

This allows for the following usage:

gngt$grl.labelfield = 'label'
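Because the "label" value is assumed to be unique per gene, it can be worth checking that assumption before building the track. The following is a minimal base R sketch on plain vectors standing in for gencode$gene_name and gencode$label. The values and the variable names (labels_per_gene, ambiguous_genes) are hypothetical, and this check is not part of track.gencode.

```r
# Hypothetical stand-ins for gencode$gene_name and gencode$label
gene_name = c('GENE_A', 'GENE_A', 'GENE_B', 'GENE_B', 'GENE_C')
label = c('GENE_A', 'GENE_A', NA, NA, 'GENE_C')

# Number of distinct label values per gene (NA counts as one value)
labels_per_gene = tapply(label, gene_name, function(x) length(unique(x)))

# Genes with more than one distinct label; empty if the assumption holds
ambiguous_genes = names(labels_per_gene)[labels_per_gene > 1]
```

If ambiguous_genes is non-empty, the labels for those genes would need to be reconciled before setting grl.labelfield to 'label'.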

By default gngt$grl.labelfield is set to id, which is hardcoded to inherit the values of gene_name in the gencode input. label allows us to provide an alternative. So, for example, if we want all genes to be included in gngt, but only some specific genes of interest to have labels, then we can do something like this:

gencode = rtracklayer::import(gencode.fn)
gene_labels = gencode$gene_name
gene_labels[which(!(gene_labels %in% genes_of_interest))] = NA
gencode$label = gene_labels
gngt = track.gencode(gencode)
gngt$grl.labelfield = 'label'

Where gencode.fn is the path to a gencode gff file and genes_of_interest is a character vector of gene names. Notice that gencode.cols is set to 'label' by default, and hence we don't need to specify it in the code above.