ahdee closed 3 years ago
It contains Pfam protein domains.
@suhrig thanks, but do you also know which site it was downloaded from? I went to the Pfam site, but it looks like there are no GFF3 files. The reason I ask is that I get asked a lot where the domain annotations come from, and it would be nice to have more details.
The GFF3 file is not available for download anywhere other than in the Arriba release package. On the Pfam site you will find this file. To get the GFF3 file, you need to map the protein coordinates to genomic coordinates. I used the Bioconductor package ensembldb for this purpose. This is how the file was generated.
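For readers wondering what "mapping protein coordinates to genomic coordinates" involves, here is a minimal Python sketch of the arithmetic. It is not the ensembldb code used to build the Arriba file, only an illustration. The function name `protein_to_genome` and the input layout (a list of 1-based, inclusive coding-exon intervals plus a strand) are assumptions made for this example.

```python
def protein_to_genome(aa_start, aa_end, cds_exons, strand):
    """Map a protein interval (1-based, inclusive amino acid positions)
    to genomic intervals.

    cds_exons: list of (start, end) genomic coordinates of the coding
    parts of each exon, 1-based and inclusive (hypothetical input layout).
    strand: "+" or "-".
    """
    # Each amino acid spans three nucleotides of the coding sequence.
    nt_start = 3 * (aa_start - 1) + 1
    nt_end = 3 * aa_end

    # Walk exons in transcript order: ascending on "+", descending on "-".
    ordered = sorted(cds_exons, reverse=(strand == "-"))

    segments = []
    consumed = 0  # coding nucleotides covered by the exons seen so far
    for ex_start, ex_end in ordered:
        length = ex_end - ex_start + 1
        first, last = consumed + 1, consumed + length
        consumed = last

        # Overlap between the requested range and this exon, in CDS coordinates.
        lo, hi = max(nt_start, first), min(nt_end, last)
        if lo > hi:
            continue

        off_lo, off_hi = lo - first, hi - first
        if strand == "+":
            segments.append((ex_start + off_lo, ex_start + off_hi))
        else:
            # On the minus strand the CDS runs from high to low coordinates.
            segments.append((ex_end - off_hi, ex_end - off_lo))

    if nt_end > consumed:
        raise ValueError("protein interval extends beyond the coding sequence")
    return sorted(segments)


# Toy transcript: two coding exons, 30 nt in total (10 amino acids).
exons = [(101, 110), (201, 220)]
print(protein_to_genome(3, 5, exons, "+"))  # [(107, 110), (201, 205)]
print(protein_to_genome(3, 5, exons, "-"))  # [(206, 214)]
```

A domain that crosses an exon boundary comes back as several genomic intervals, which is why one Pfam domain can correspond to more than one line in a GFF3 file.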
@suhrig great thank you, this is very helpful.
Thanks, awesome fusion caller! I would like to know what the source is for the database for the protein_domains_hg38_GRCh38_v2.1.0.gff3? Thanks!