rannick opened 11 months ago
I was wondering about this change. The clear benefit is the shorter reference-building time, but are the CTAT genomes up to date? I don't see many updates coming from them. Is that a problem?
The Ensembl annotation is missing the IGH and IGL superloci, so including the following lines in the reference GTF might improve detection of fusions involving them with STAR-Fusion and Arriba:
chr14 SuperLocus-ext exon 105583731 106875071 . - . gene_id "IGH.g@-ext"; transcript_id "IGH.t@-ext"; gene_name "IGH@-ext";
chr14 SuperLocus-ext exon 105583731 106875071 . + . gene_id "IGH-.g@-ext"; transcript_id "IGH-.t@-ext"; gene_name "IGH-@-ext";
chr22 SuperLocus-ext exon 22030934 22923034 . + . gene_id "IGL.g@-ext"; transcript_id "IGL.t@-ext"; gene_name "IGL@-ext";
chr22 SuperLocus-ext exon 22030934 22923034 . - . gene_id "IGL-.g@-ext"; transcript_id "IGL-.t@-ext"; gene_name "IGL-@-ext";
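GTF columns must be tab-separated, and the lines as shown above may have lost their tabs when copied. Below is a minimal Python sketch of appending these four records to a reference GTF, assuming a plain uncompressed GTF at a path you supply. `append_superloci`, `superlocus_lines` and the `strip_chr` option are hypothetical names for illustration, not part of any pipeline. Ensembl GTFs name chromosomes `14` and `22` rather than `chr14` and `chr22`, so `strip_chr=True` rewrites the names to match; the coordinates are copied unchanged and are assumed to match the genome build of your reference.

```python
"""Append the IGH/IGL superlocus records from this issue to a GTF file (sketch)."""

# (chrom, start, end, strand, gene_id, transcript_id, gene_name), copied from the issue.
SUPERLOCUS_RECORDS = [
    ("chr14", "105583731", "106875071", "-", "IGH.g@-ext", "IGH.t@-ext", "IGH@-ext"),
    ("chr14", "105583731", "106875071", "+", "IGH-.g@-ext", "IGH-.t@-ext", "IGH-@-ext"),
    ("chr22", "22030934", "22923034", "+", "IGL.g@-ext", "IGL.t@-ext", "IGL@-ext"),
    ("chr22", "22030934", "22923034", "-", "IGL-.g@-ext", "IGL-.t@-ext", "IGL-@-ext"),
]


def superlocus_lines(strip_chr=False):
    """Yield (gene_id, line) pairs, with the nine GTF columns joined by tabs."""
    for chrom, start, end, strand, gene_id, transcript_id, gene_name in SUPERLOCUS_RECORDS:
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[len("chr"):]  # "chr14" -> "14" for Ensembl-style names
        attributes = (
            f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; '
            f'gene_name "{gene_name}";'
        )
        # seqname, source, feature, start, end, score, strand, frame, attributes
        fields = [chrom, "SuperLocus-ext", "exon", start, end, ".", strand, ".", attributes]
        yield gene_id, "\t".join(fields) + "\n"


def append_superloci(gtf_path, strip_chr=False):
    """Append the superlocus records to gtf_path, skipping gene_ids already present."""
    with open(gtf_path) as handle:
        existing = handle.read()
    with open(gtf_path, "a") as handle:
        if existing and not existing.endswith("\n"):
            handle.write("\n")  # keep the last existing record on its own line
        for gene_id, line in superlocus_lines(strip_chr):
            if f'gene_id "{gene_id}"' not in existing:
                handle.write(line)
```

For example, `append_superloci("genes.gtf", strip_chr=True)` would patch a hypothetical Ensembl-style `genes.gtf` in place. Checking for the `gene_id` first makes the step safe to rerun without duplicating records.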
Description of feature
Use the references from CTAT. This would shorten reference-building time and might let STAR-Fusion call a few more fusions, as it is optimised for these references.