sjteresi / TE_Density

Python script calculating transposable element density for all genes in a genome. Publication: https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-022-00264-4
GNU General Public License v3.0

Error when no. of TE annotation chromosomes != no. gene annotation chromosomes #108

Closed sjteresi closed 1 year ago

sjteresi commented 1 year ago

An error message already exists for the situation where the number of unique chromosomes is not equal between the annotations. However, I should probably make the ValueError message that is raised a little more verbose, because this will likely be a common error. This check occurs in the initial phase of preprocessing.
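For readers wondering what a more verbose check could look like, here is a minimal sketch. It is not TE Density's actual code: the function name `check_matching_chromosomes` and the message wording are hypothetical, and it compares the sets of chromosome IDs rather than only their counts, which is an assumption made so the message can name the offending scaffolds.

```python
def check_matching_chromosomes(gene_chroms, te_chroms):
    """Raise a descriptive ValueError if the two annotations cover different chromosomes.

    Hypothetical helper for illustration; not part of TE Density.
    """
    gene_set, te_set = set(gene_chroms), set(te_chroms)
    if gene_set == te_set:
        return  # every chromosome has both gene and TE data

    # Work out which IDs appear in only one of the two annotations.
    only_genes = sorted(gene_set - te_set)
    only_tes = sorted(te_set - gene_set)
    raise ValueError(
        f"Chromosome mismatch: gene annotation has {len(gene_set)} unique "
        f"chromosomes, TE annotation has {len(te_set)}. "
        f"Only in gene annotation: {only_genes}. "
        f"Only in TE annotation: {only_tes}. "
        "Remove these from both files so that each chromosome has both "
        "gene and TE data."
    )
```

Listing the unmatched IDs in the message lets the user see which scaffolds to remove without diffing the two files by hand.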

davidaray commented 1 year ago

I got this error 10 minutes ago and did find that there are several smaller scaffolds with no gene annotations. Would you recommend removing those scaffolds from both the TE annotations file and the gene annotations file?

sjteresi commented 1 year ago

Yes. I would remove those scaffolds/pseudomolecules from both files. TE Density is only computed between pseudomolecules/scaffolds of the same identity. I will fast-track this documentation/check later this week. I'll add a better warning and some recommendations to the documentation.

To summarize for others, the program wants one pseudomolecule of TE information for each pseudomolecule of gene information. A situation can arise where you may not have gene annotations on a small scaffold, but you do have TE annotations there (or vice versa). The program raises an error because it is missing half of the data to do a computation for that pseudomolecule.
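The cleanup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the annotations are assumed to be already loaded as lists of dicts, and the function name `keep_shared_chromosomes` and the `"chromosome"` key are hypothetical, not part of TE Density. Adapt the loading and writing steps to your own file format.

```python
def keep_shared_chromosomes(gene_rows, te_rows, chrom_key="chromosome"):
    """Keep only rows on chromosomes present in BOTH annotations.

    Hypothetical helper for illustration; returns the filtered gene rows,
    the filtered TE rows, and the sorted list of dropped chromosome IDs.
    """
    gene_chroms = {row[chrom_key] for row in gene_rows}
    te_chroms = {row[chrom_key] for row in te_rows}
    shared = gene_chroms & te_chroms

    genes_kept = [row for row in gene_rows if row[chrom_key] in shared]
    tes_kept = [row for row in te_rows if row[chrom_key] in shared]
    # Chromosomes found in exactly one of the two annotations.
    dropped = sorted(gene_chroms ^ te_chroms)
    return genes_kept, tes_kept, dropped


# Toy data, made up for this example.
genes = [{"chromosome": "chr1", "name": "geneA"}]
tes = [
    {"chromosome": "chr1", "name": "TE_1"},
    {"chromosome": "scaffold_9", "name": "TE_2"},  # no genes on this scaffold
]
genes_kept, tes_kept, dropped = keep_shared_chromosomes(genes, tes)
print(dropped)
```

After filtering, both annotations cover the same set of pseudomolecules, so the chromosome-count check should no longer fail for this reason.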