Open nlfortier opened 2 weeks ago
Thanks for the report! At first glance, this appears to be related to the rollout of our new Fusion data model. Spot checking several of the Genes you listed, I see that they no longer have any variants associated with them.
For instance, the Gene RASGRF1 no longer has any direct variants under it, but there are new Fusion Features (OCLN::RASGRF1, IQGAP1::RASGRF1, SLC4A4::RASGRF1) which have associated variants and evidence items.
The default behavior of the TSV exports is to only export Genes that have at least one variant, molecular profile, and evidence item associated with them. In these cases, it appears that all of the associated variants were in fact fusion variants, which have since been moved to the new Fusion Features.
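To make the effect of that filter concrete, here is a minimal sketch of the rule as described above. It is not the actual export code: the `Feature` class, the `exportable` and `gene_summaries` helpers, the `GENE_A` placeholder, and all of the counts are hypothetical and only illustrate why a Gene whose variants all moved to Fusion Features drops out.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    # Hypothetical stand-in for a CIViC feature record; counts are illustrative.
    name: str
    feature_type: str  # "Gene", "Fusion", or "Factor"
    variants: int = 0
    molecular_profiles: int = 0
    evidence_items: int = 0


def exportable(feature: Feature) -> bool:
    """At least one variant, one molecular profile, and one evidence item."""
    return (
        feature.variants > 0
        and feature.molecular_profiles > 0
        and feature.evidence_items > 0
    )


def gene_summaries(features: list[Feature]) -> list[str]:
    """Names of the Genes that would pass the filter described above."""
    return [f.name for f in features if f.feature_type == "Gene" and exportable(f)]


features = [
    # Placeholder Gene that still has direct variants (made-up counts).
    Feature("GENE_A", "Gene", variants=2, molecular_profiles=2, evidence_items=3),
    # RASGRF1 as a Gene: its variants now live under the Fusion Features.
    Feature("RASGRF1", "Gene"),
    # One of the new Fusion Features (made-up counts).
    Feature("OCLN::RASGRF1", "Fusion", variants=1, molecular_profiles=1, evidence_items=1),
]

print(gene_summaries(features))
```

Under these assumptions RASGRF1 is filtered out because its own counts are zero, and the Fusion Feature is never considered because only Genes are exported.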
We probably need to introduce a new FeatureSummaries.tsv file that includes all feature types (Genes, Fusions, and Factors) so that it can be comprehensive. We may also be able to introduce some heuristic to include Genes in the GeneSummaries.tsv that have curated summaries, sources, etc., or that are included in Fusion Features. We will get a fix out for this in the next release and I'll follow up here!
Over the past few months, many genes have been dropped from the GeneSummaries.tsv file on the CIViC Data Releases page.
The following genes were in the September release, but were missing from the October release: BEND2, CBFA2T3, CBFB, CREB3L1, CREB3L2, DDIT3, DEK, DGKH, DUX4, FLI1, FUS, GLI1, HMGA2, IL2RB, MAML2, MAP3K8, MNX1, NCOA2, NFATC2, NUP214, NUP98, NUTM1, PDGFD, PRKACA, PTK2B, SH3PXD2A, SSX1, SSX2, SSX4, TLX3, WWTR1, ZFTA, ZNF384
Five additional genes were subsequently dropped in the November release: KAT6A, RASGRF1, RBM15, VGLL2, YWHAE
These genes can still be looked up on the website, which leads me to suspect that they were removed erroneously.
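For anyone who wants to repeat the comparison between releases, the following is a minimal sketch of one way to do it. It assumes the gene symbol lives in a column called `name`, which may not match the real GeneSummaries.tsv header, and the inline TSV strings are tiny stand-ins for the downloaded files rather than real release data. The `gene_names` and `dropped_genes` helpers are hypothetical.

```python
import csv
import io


def gene_names(tsv_text: str, column: str = "name") -> set[str]:
    """Collect the gene symbols from one release (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[column] for row in reader}


def dropped_genes(older: str, newer: str, column: str = "name") -> list[str]:
    """Genes present in the older release but absent from the newer one."""
    return sorted(gene_names(older, column) - gene_names(newer, column))


# Tiny stand-ins for the release files; real files have more columns and rows.
september = "name\nDEK\nRASGRF1\n"
october = "name\nRASGRF1\n"
november = "name\n"

print(dropped_genes(september, october))
print(dropped_genes(october, november))
```

With real files, the strings would be replaced by the contents of each downloaded GeneSummaries.tsv, and `column` adjusted to whatever the header actually uses.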