qiyunlab / HGTector

HGTector2: Genome-wide prediction of horizontal gene transfer based on distribution of sequence homology patterns.
BSD 3-Clause "New" or "Revised" License

The meaning and impact of genes with 'TaxID' of 0 in the HGTector2 prediction result #131

Open chenhuag opened 1 year ago

chenhuag commented 1 year ago

I used HGTector2 to predict HGTs in my genome, and it classified my genome as "phylum Proteobacteria". The prediction results showed that 318 genes were identified as HGTs, but only 40 of them had a 'TaxID', which pointed to eukaryotes and archaea. The 'TaxID' for the other 278 genes was 0. Can you help me understand the meaning of this result and its impact on the analysis?

qiyunzhu commented 1 year ago

Hello @chenhuag, thanks for your interest in this program! 318 genes being identified as HGT-derived means that they have an atypical homology search pattern, which is likely attributable to HGT. 278 genes having TaxID 0 as the potential donor means that the potential donor could not be identified. Since you were examining a very deep taxonomic level (phylum), it could be that the donor information was already lost or attenuated over the course of evolution. Hope it helps!
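To see how many predicted HGT genes fall into each case, the results can be tallied by donor TaxID. The snippet below is only a minimal sketch: the tab-separated layout, the column names (`gene`, `score`, `donor_taxid`), and the sample rows are assumptions made up for illustration, not HGTector2's documented output format, so adjust them to match your actual result file.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a results table. The column names and
# values are assumptions for illustration, not real HGTector2 output.
SAMPLE = (
    "gene\tscore\tdonor_taxid\n"
    "g1\t0.91\t2157\n"
    "g2\t0.85\t0\n"
    "g3\t0.77\t0\n"
    "g4\t0.66\t2759\n"
)


def summarize_donors(handle):
    """Count predicted HGT genes with and without an identified donor."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        # A donor TaxID of 0 is treated as "donor could not be identified".
        if row["donor_taxid"].strip() == "0":
            counts["unidentified"] += 1
        else:
            counts["identified"] += 1
    return counts


counts = summarize_donors(io.StringIO(SAMPLE))
print("identified donor:", counts["identified"])
print("unidentified donor (TaxID 0):", counts["unidentified"])
```

For a real file, replace `io.StringIO(SAMPLE)` with an open file handle and rename the column key to whatever your table actually uses.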

chenhuag commented 1 year ago

Thank you for your answer. I checked the log file and found the warning message "WARNING: Cannot cluster distal group using KDE. Use fixed threshold 25 instead". Is this related to this result?

qiyunzhu commented 1 year ago

Your understanding is correct.

hhj00123 commented 3 months ago

Hi, I have a question: when I set up the self and close taxa, I found that a large number of HGT events were annotated with the source as N/A. Can I assume that a significant portion of these HGT events are very similar to the close or self groups we defined, making it difficult to determine their origin?