Closed by sammyjava 1 year ago
I suspect the issue with Tifrunner.gnm2.ann1.4K0L has to do with the way it was constructed, by liftover of genes from the older assembly to the newer one. Probably some genes did not lift over successfully, while the other files, which have no coordinate info, were just given the old s/gnm1/gnm2/ treatment and hence retained those genes. I will not be sorry when the new gnm2.ann2 annotations are ready for the datastore, but I suppose the thing to do for these would be to regenerate the fastas using gffread and re-run gfa.
Will have to look more closely at W05 as I don't know why that would have issues.
OK, so the reason these files have gfa entries referring to genes not in the gff is that there are proteins (and corresponding transcripts and CDS) that appear in the fasta files but correspond to nothing in the gff. Will it suffice to remove the gfa entries for the genes whose existence was inferred erroneously, or do we also need to clean up the fasta? I have some hesitation in doing so which isn't entirely due to being lazy; but if having extra entries in the fastas will cause errors in loading, I can go ahead with the further clean-up. It's really only one "gene" in the case of W05, but it impacts 133 in Tifrunner.gnm2.ann1 (which, again, is a consequence of the way these genes were lifted over from the older to the newer assembly).
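For anyone wanting to reproduce the check, here is a minimal sketch of how FASTA entries with no counterpart in the gff might be listed. It is not the script used in this thread. It assumes that the first token of each FASTA header matches the `ID` attribute of an `mRNA` feature in the gff, which may not hold for every collection; the function names are hypothetical.

```python
def gff_feature_ids(gff_lines, feature_types=("mRNA",)):
    """Collect ID attributes of the chosen feature types from GFF3 lines."""
    ids = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in feature_types:
            continue
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "ID":
                ids.add(value)
    return ids


def fasta_ids(fasta_lines):
    """Return the first whitespace-delimited token of each FASTA header."""
    return [
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and len(line.strip()) > 1
    ]


def orphan_fasta_ids(fasta_lines, gff_lines, feature_types=("mRNA",)):
    """List FASTA IDs that have no matching feature ID in the GFF.

    Assumption: FASTA IDs are expected to equal mRNA IDs in the GFF.
    """
    known = gff_feature_ids(gff_lines, feature_types)
    return [seq_id for seq_id in fasta_ids(fasta_lines) if seq_id not in known]
```

Passing open file handles for the protein FASTA and the gff would give the list of orphaned protein IDs; passing `feature_types=("gene",)` to `gff_feature_ids` gives the gene-level ID set instead.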
I think the simple and correct thing is to just remove those genes from the GFA. Those are not genes for which we can assign pathways, because they don't exist in the annotation. Non-gene-associated proteins existing in the annotation are their own business.
Which would you consider more correct, to suppress the GFA record altogether or to leave the gene id empty but indicate the assignment of the protein to the family?
Remove the entire record.
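Dropping whole records could be sketched roughly as below. This is an illustration under assumptions, not the procedure actually run here: it assumes the GFA file is tab-delimited with the gene ID in the first column and that header or comment lines start with `#`. The function name `partition_gfa` and the `gene_col` parameter are hypothetical.

```python
def partition_gfa(gfa_lines, gene_ids, gene_col=0):
    """Split GFA lines into (kept, removed) by whether the gene is annotated.

    Assumptions: tab-delimited records, gene ID in column `gene_col`,
    and header/comment lines beginning with '#'.
    """
    kept, removed = [], []
    for line in gfa_lines:
        if not line.strip() or line.startswith("#"):
            kept.append(line)  # preserve headers and blank lines as-is
            continue
        gene = line.rstrip("\n").split("\t")[gene_col]
        if gene in gene_ids:
            kept.append(line)
        else:
            removed.append(line)  # the entire record is dropped, not just the gene ID
    return kept, removed
```

Here `gene_ids` would be the set of gene IDs present in the gff. Returning the removed records alongside the kept ones makes it easy to review or set them aside rather than discarding them silently.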
Ok should be good to go. W05 is a classic datastory, with one gene having been moved into its own "bad_gene" gff file for reasons I'm not clear on. But I moved the gfa record into a parallel holding pen. Having a "bad_gene" file seems vaguely like eugenics...