Closed zhangrengang closed 4 years ago
Hello @zhangrengang,
I think this is a very good point, and I agree that the classification of copia and gypsy in LTR_retriever is not the best scheme. I have been using copia- and gypsy-specific HMMs from rice to assign new LTR elements to these superfamilies. A better way would be to use GyDB to assign superfamilies, as you suggested. Another way I have been thinking of, but have not yet had the time to implement, is to classify by the order of the conserved domains, which is the fundamental difference between gypsy and copia.
If you can implement a better scheme, you are welcome to contribute! For benchmarking accuracy, I use the curated rice TE library.
Best, Shujun
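The domain-order idea mentioned above can be sketched roughly as follows. This is only an illustration, not code from LTR_retriever: the function name, the tuple layout, and the domain labels (`PROT`, `INT`, `RT`, `RH`) are assumptions, and the hits are assumed to come from some upstream HMM search. It relies on the fact that in Copia elements the integrase lies upstream of the reverse transcriptase, while in Gypsy elements it lies downstream of RT and RNase H.

```python
from typing import List, Tuple

# Hypothetical hit record: (domain label, start, end, strand) on one LTR element.
DomainHit = Tuple[str, int, int, str]


def classify_by_domain_order(hits: List[DomainHit]) -> str:
    """Return 'Copia', 'Gypsy', or 'unknown' from the relative order of INT and RT."""
    if not hits:
        return "unknown"

    # Read domains 5'->3' along the coding strand: if most hits are on the
    # minus strand, walk the element in descending coordinate order.
    minus = sum(1 for h in hits if h[3] == "-")
    reverse = minus > len(hits) - minus
    ordered = sorted(hits, key=lambda h: h[1], reverse=reverse)
    names = [h[0] for h in ordered]

    # Both domains are needed to decide; otherwise leave the element unclassified.
    if "INT" not in names or "RT" not in names:
        return "unknown"

    # Copia: PROT-INT-RT-RH; Gypsy: PROT-RT-RH-INT.
    return "Copia" if names.index("INT") < names.index("RT") else "Gypsy"


copia_like = [("PROT", 100, 400, "+"), ("INT", 500, 1200, "+"),
              ("RT", 1300, 2000, "+"), ("RH", 2100, 2500, "+")]
print(classify_by_domain_order(copia_like))
```

Elements with only one of the two domains stay `unknown` here; a real implementation would probably fall back to profile-based assignment for those.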
Hello Dr. Ou, here is a simple implementation. You may test it and/or integrate it.
Hello @zhangrengang,
Thank you so much for developing this code in such a short time. I will test it soon and let you know.
Best, Shujun
Thousands of LTRs in a plant genome are classified as unknown by LTR_retriever. However, most of them are classified as Copia on the basis of GyDB, as shown below:
I think there is an issue in `annotate_TE.pl`: Copia has the same weight (0.3) as Gypsy, but Copia has only 8 PFAMs, about 1/3 of Gypsy's 28 PFAMs.