oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Inflated TE counts and masked bp in EDTA annotation after removal of part of the genome #434

Open Nyasita opened 4 months ago

Nyasita commented 4 months ago

Hi, I have used EDTA to annotate TEs in my plant genome. Initially I had 12 chromosomes and ran EDTA on them, but I then removed one chromosome that we suspect to be something else and re-ran EDTA. My concern is that I am getting inflated counts and inflated numbers of masked bp for the 'smaller' genome. With 11 chromosomes, I would expect the counts of each TE type and the masked bp to be lower than what was observed with 12 chromosomes.

I ran the code with --sensitive 1 --anno 1 --evaluate 1

Below is a table showing the counts. The counts in brackets are from the run with 12 chromosomes, and the highlighted values are where I observed inflated values in the smaller genome.

image

I also removed the 'suspect' chromosome from the GFF file I had obtained from the first run on the 12-chromosome genome and computed the results using the protocol here. This gave very different results from the actual 11-chromosome run (shown in the table below; again, the values in brackets are from the run with 12 chromosomes, and the difference is between the unbracketed values in the table below and those in the table above). What would explain this disparity?

image

oushujun commented 3 months ago

Hello,

Sorry for the delay. To do the manual computation, you may want to remove the extra chromosome from the fasta file, the gff, and the stat file. Rerunning EDTA directly may give you peace of mind. Please let me know if you have other thoughts.

Thanks, Shujun
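
For readers following along, the GFF part of the manual computation suggested above can be sketched roughly as follows. This is a minimal illustration, not the EDTA protocol and not EDTA's own summary code: the function name `summarize_gff` and its `drop_seqids` parameter are hypothetical, it assumes a standard tab-delimited GFF3 file with nine columns, and it groups features by the type in column 3. It reports, per type, the number of features and the number of bp covered after merging overlapping or adjacent intervals on the same sequence, so each base is counted once per type.

```python
from collections import defaultdict


def summarize_gff(lines, drop_seqids=()):
    """Return {feature_type: (count, covered_bp)} for GFF3 lines.

    Hypothetical helper: features on any sequence in drop_seqids are skipped.
    covered_bp merges overlapping/adjacent intervals per type and sequence.
    """
    drop = set(drop_seqids)
    counts = defaultdict(int)
    intervals = defaultdict(list)  # (type, seqid) -> [(start, end), ...]

    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GFF3 feature line
        seqid, _source, ftype, start, end = fields[:5]
        if seqid in drop:
            continue  # feature sits on a removed chromosome
        counts[ftype] += 1
        intervals[(ftype, seqid)].append((int(start), int(end)))

    covered = defaultdict(int)
    for (ftype, _seqid), ivs in intervals.items():
        ivs.sort()
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end + 1:  # overlapping or directly adjacent
                cur_end = max(cur_end, end)
            else:
                covered[ftype] += cur_end - cur_start + 1  # GFF is 1-based, inclusive
                cur_start, cur_end = start, end
        covered[ftype] += cur_end - cur_start + 1

    return {ftype: (counts[ftype], covered[ftype]) for ftype in counts}
```

Calling it on an open GFF file handle with `drop_seqids={"chr12"}` (a placeholder sequence name) gives the filtered tallies. Calling it without `drop_seqids` on the same file gives the full-genome tallies for comparison. Note that this only subsets an existing annotation; a fresh EDTA run on the reduced genome builds its TE library from different input, so the two sets of numbers are not guaranteed to match.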

Nyasita commented 3 months ago

I did remove the extra chromosome from the fasta, gff, and stat files, and the results are as I shared above. I then reran EDTA directly, and those results are also shown in my original issue. My worry is just why the results are different.

oushujun commented 3 months ago

I have no idea. Can you create a reproducible example for me to check on my end?