oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

The inconsistency between the classification labels of the EDTA library and RepeatMasker #442

Closed CSU-KangHu closed 3 months ago

CSU-KangHu commented 3 months ago

Hi @oushujun, Thank you for developing such a useful tool like EDTA.

While running EDTA, I encountered an issue that I had previously overlooked. The classification labels in the TE library output by EDTA do not match those used by RepeatMasker. Because of this discrepancy, certain types of TEs are ignored when the EDTA-generated TE library is passed to RepeatMasker (via -lib) to generate the .tbl file.

For instance, EDTA labels Helitrons as DNA/Helitron, whereas RepeatMasker classifies them as RC/Helitron. As a result, the Rolling-circles line in the .tbl file shows 0 bp, and the proportion of DNA transposons is overestimated. Other labels such as MITE have similar issues: they seem to need DNA labels for RepeatMasker to properly categorize them as DNA transposons. Could you please let me know if EDTA provides a script to convert the classification labels of the TE library to match those of RepeatMasker?
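
For illustration, the kind of relabeling asked about here could be sketched in Python as below. This is not a script shipped with EDTA. It assumes library headers of the form `>name#class/family`; only the DNA/Helitron to RC/Helitron mapping comes from this thread, while the MITE-to-DNA rule and the names `LABEL_MAP`, `relabel_label`, `relabel_header`, and `relabel_library` are hypothetical and should be checked against the labels your RepeatMasker version actually recognizes.

```python
import re

# Exact label replacements. Only this one pair is taken from the thread;
# add further pairs after checking them against your RepeatMasker version.
LABEL_MAP = {"DNA/Helitron": "RC/Helitron"}


def relabel_label(label):
    """Map one EDTA-style label to a RepeatMasker-style label."""
    if label in LABEL_MAP:
        return LABEL_MAP[label]
    # Assumption: MITE/<family> should be reported under DNA/<family>.
    if label.startswith("MITE/"):
        return "DNA/" + label[len("MITE/"):]
    return label


def relabel_header(line):
    """Rewrite the label after the first '#' in a FASTA header line.

    Sequence lines and headers without '#' are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    return re.sub(r"#(\S+)", lambda m: "#" + relabel_label(m.group(1)), line, count=1)


def relabel_library(in_path, out_path):
    """Copy a FASTA library to out_path, relabeling header lines only."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(relabel_header(line))
```

Calling `relabel_library("input.fa", "relabeled.fa")` (both file names are placeholders) would write a copy of the library whose headers use the mapped labels, which could then be passed to RepeatMasker with -lib.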

oushujun commented 3 months ago

I haven’t figured out which names RepeatMasker does and does not accept. You should use the less picky buildSummary.pl script to generate the summary from the RepeatMasker .out file.

Shujun
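
To illustrate what a label-agnostic summary means, here is a minimal Python sketch that tallies masked base pairs per class/family straight from a RepeatMasker .out file, whatever the labels are. It is not buildSummary.pl and does not reproduce its output. The function name `tally_out` is hypothetical, the sketch assumes the standard whitespace-separated .out columns (query begin and end in columns 6 and 7, class/family in column 11), and it does not correct for overlapping annotations, so totals can be overestimated.

```python
from collections import defaultdict


def tally_out(lines):
    """Sum annotated bp per class/family label from RepeatMasker .out lines."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        totals[fields[10]] += end - begin + 1
    return dict(totals)
```

With a real file this could be called as `tally_out(open("genome.fa.out"))`, where the file name is a placeholder.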
