stianlagstad / chimeraviz

chimeraviz is an R package that automates the creation of chimeric RNA visualizations.

Better document how to create a file with protein domain coordinates #32

Open stianlagstad opened 6 years ago

stianlagstad commented 6 years ago

Although documented here, it should be easier to create the data needed for the protein domain plot. This issue will track my progress on this.

Initial ideas:

ahdee commented 4 years ago

Hi, I'm still a bit confused about this. For example, I have a BED file that I got from UCSC with a row like this:

chr11 | 114242180 | 114242249 | zinc finger | 1000 | + | 114242180 | 114242249 | 100,100,0 | 1 | 69 | 0 | Manually reviewed (Swiss-Prot) | zinc finger region | amino acids 490-512 on protein Q05516 | C2H2-type 4 | Q05516

So this is a region that is a known zinc finger. However, how do I then convert this so that I can use it with ?

stianlagstad commented 1 year ago

Since I received another request by email to better document this process, I'll just note here that I have not worked on this issue since it was created on Mar 6 2018, and I likely will not in the coming months. All the details I have at the moment are here: https://github.com/stianlagstad/chimeraviz/blob/80466b8fa6c7eb8fa21dc64f90a7a96d01f94e81/R/extdata.R#L192. That is:

#' protein_domains_5267 bed file
#'
#' Documentation for the protein_domains_5267.bed file containing protein
#' domains for the genes in the fusion with cluster_id=5267.
#'
#' @name raw_fusion5267proteindomains
#'
#' @section protein_domains_5267.bed:
#'
#' This file is an excerpt from a larger file that we created by:
#' - downloading domain name annotation from Pfam database (PfamA version 31)
#'   and domain region annotation from Ensembl database through BioMart API
#' - switching the domain coordinates in the protein level to these in
#'   transcript level.
NULL

Some things can also be learned by looking at the code behind the protein domain plot: https://github.com/stianlagstad/chimeraviz/blob/80466b8fa6c7eb8fa21dc64f90a7a96d01f94e81/R/plot_fusion_transcript_with_protein_domain.R.

If anyone does look into this then I would be very happy to approve a PR which adds further documentation of how to do this.
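
As a starting point for such documentation, here is a minimal sketch of the second step quoted above (moving domain coordinates from the protein level to the transcript level), using the zinc finger example from earlier in this thread. It is not chimeraviz code. The helper `aa_to_transcript()` and its `utr5_length` parameter are hypothetical. The column names are an assumption based on my reading of the plotting code linked above, and should be checked against the header of protein_domains_5267.bed. Whether the plot expects positions relative to the transcript start or to the CDS start is also an assumption to verify, and the transcript and Pfam ids are placeholders.

```r
# Map a 1-based, inclusive amino-acid range to nucleotide positions.
# Amino acid p covers coding bases 3 * (p - 1) + 1 through 3 * p.
# utr5_length (hypothetical parameter) is the number of transcript bases
# before the start codon; with the default of 0 the result is relative to
# the start of the coding sequence.
aa_to_transcript <- function(aa_start, aa_end, utr5_length = 0L) {
  stopifnot(aa_start >= 1L, aa_end >= aa_start, utr5_length >= 0L)
  list(
    start = utr5_length + 3L * (aa_start - 1L) + 1L,
    end = utr5_length + 3L * aa_end
  )
}

# Zinc finger from the comment above: amino acids 490-512 on protein Q05516.
zf <- aa_to_transcript(490L, 512L)

# One row in the assumed layout of the protein domain file.
# Column names are an assumption; ids are placeholders, not real accessions.
domains <- data.frame(
  Transcript_id = "TRANSCRIPT_ID_PLACEHOLDER",
  Pfam_id = "PFAM_ID_PLACEHOLDER",
  Start = zf$start,
  End = zf$end,
  Domain_name_abbreviation = "C2H2-type 4",
  Domain_name_full = "zinc finger region",
  stringsAsFactors = FALSE
)

# Write a tab-separated file with a header row.
out_file <- tempfile(fileext = ".bed")
write.table(domains, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

For this example the converted range spans 69 bases, which matches the length of the genomic interval in the UCSC row (114242249 - 114242180 = 69). That check only works when the domain lies within a single exon. The first step (fetching domain regions per transcript from Ensembl through BioMart and names from Pfam) is not covered here.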