skingan / HomolContigsByAnnotation

Python script to identify genes that are present on multiple contigs

Some advice about HomolContigsByAnnotation #1

Open: a-velt opened this issue 6 years ago

a-velt commented 6 years ago

Dear Sarah,

Thank you for this great tool for removing homologous contigs! I had tried Haplomerger2 to remove the remaining haplotigs from my primary contigs, but Haplomerger2 merges some contigs with haplotigs, so the phasing is not preserved, and that is not acceptable for me.

So I just discovered your tool, and it gives good results for me: I obtained 198 pairs of contigs sharing one or more genes. I then ran nucmer on each pair and visualised each alignment with Assemblytics.
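Running nucmer on every pair can be scripted instead of typed by hand. The sketch below only builds the command lines. It assumes each contig has already been written to its own FASTA file named `<contig>.fa` in a hypothetical `contigs/` directory, and the contig IDs in the example are made up; `-p` sets nucmer's output prefix.

```python
import shlex


def nucmer_commands(pairs, fasta_dir="contigs"):
    """Build one nucmer command per contig pair.

    Assumes (hypothetically) that each contig was extracted beforehand
    to <fasta_dir>/<contig>.fa.
    """
    commands = []
    for ref, qry in pairs:
        prefix = f"{ref}_vs_{qry}"  # nucmer writes <prefix>.delta
        commands.append([
            "nucmer",
            "-p", prefix,
            f"{fasta_dir}/{ref}.fa",
            f"{fasta_dir}/{qry}.fa",
        ])
    return commands


if __name__ == "__main__":
    # Made-up contig IDs, for illustration only.
    pairs = [("tig00000001", "tig00000042"), ("tig00000007", "tig00000013")]
    for cmd in nucmer_commands(pairs):
        print(shlex.join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)`, and the resulting `.delta` files loaded into Assemblytics or read with `show-coords`.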

I'm writing to ask for some advice.

If I understand correctly, when two contigs share one or more genes but do not align to each other, these are truly duplicated genes. Example: https://www.dropbox.com/s/zi6c50xc114wby7/not_homologs.PNG?dl=0

And if one contig is contained in the other, is the smaller contig a homolog that I can remove from my assembly? Example: https://www.dropbox.com/s/8j49nukjlcr638s/homologs.PNG?dl=0
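"One contig contained in the other" can be expressed as a number: the fraction of the shorter contig's bases covered by its alignments to the longer contig. A minimal sketch, assuming the alignment coordinates on the shorter contig have already been extracted (for example from `show-coords` output) as 1-based inclusive `(start, end)` pairs; `merged_length` and `containment` are hypothetical helper names.

```python
def merged_length(intervals):
    """Bases covered by (start, end) intervals, 1-based inclusive, overlaps counted once.

    start > end (reverse-strand coordinates) is accepted and normalised.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the current block
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def containment(intervals_on_short, short_len):
    """Fraction (0-1) of the shorter contig covered by alignments to the longer one."""
    return merged_length(intervals_on_short) / short_len


if __name__ == "__main__":
    print(containment([(1, 100), (50, 150), (300, 400)], 1000))  # 0.251
```

Merging the intervals first matters because nucmer often reports several overlapping alignments for the same region; summing their lengths directly would overstate coverage.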

And if the alignment is only partial, are there thresholds for deciding whether the contigs are homologous?
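The thread does not give any thresholds, so the rule below uses placeholder values only. It is a sketch of how the three cases described above (no alignment, containment, partial alignment) could be encoded once each pair has a coverage fraction for the shorter contig and a mean percent identity; `classify_pair` and its default cut-offs are hypothetical and would need tuning against pairs already checked by eye.

```python
def classify_pair(coverage_of_shorter, mean_identity,
                  min_cov=0.8, min_idy=90.0, max_cov_duplicate=0.1):
    """Hypothetical decision rule for two contigs that share genes.

    coverage_of_shorter: fraction (0-1) of the shorter contig covered by alignments.
    mean_identity: mean percent identity of those alignments.
    Default thresholds are illustrative placeholders, not values from the tool.
    """
    if coverage_of_shorter <= max_cov_duplicate:
        return "duplicated_genes"      # little or no alignment beyond the shared genes
    if coverage_of_shorter >= min_cov and mean_identity >= min_idy:
        return "candidate_haplotig"    # shorter contig largely contained in the longer one
    return "manual_review"             # partial alignment: inspect the dot plot


if __name__ == "__main__":
    print(classify_pair(0.95, 98.5))   # candidate_haplotig
    print(classify_pair(0.5, 97.0))    # manual_review
```

Keeping an explicit `manual_review` outcome means the rule only automates the clear-cut cases and leaves the ambiguous ones for visual inspection.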

I am doing this step by hand, because I don't think there is any other solution than inspecting each alignment manually, one by one. Is that right?

Have a nice day, Amandine

Timothy-WANG commented 6 years ago

Dear Amandine,

Your suggestion really makes sense. In your case, you checked the 198 pairs by visualising them one by one. I would just like to ask how you would filter the homologous contigs when the assembly has hundreds of contigs. Thanks for your attention!

Timothy
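For many pairs, one option is to summarise all alignments in a single pass and sort the pairs by how much of the shorter contig is covered, so that only the borderline pairs need a dot plot. The sketch below assumes the alignments have already been parsed into tuples with the hypothetical layout documented in the docstring (parsing `show-coords` output is left out), and it uses a simple unweighted mean for identity. It marks covered bases in a `bytearray`, which costs one byte per base of the shorter contig.

```python
from collections import defaultdict
from statistics import mean


def summarize_pairs(alignments, contig_lengths):
    """Containment of the shorter contig for every contig pair, in one pass.

    alignments: iterable of (ref, qry, ref_start, ref_end, qry_start, qry_end, pct_idy)
        with 1-based inclusive coordinates (hypothetical layout).
    contig_lengths: dict mapping contig name -> length in bp.
    Returns (ref, qry, coverage, mean_idy) tuples, highest coverage first.
    """
    masks = {}                  # (ref, qry) -> bytearray over the shorter contig
    idys = defaultdict(list)    # (ref, qry) -> percent identities of its alignments
    for ref, qry, rs, re_, qs, qe, idy in alignments:
        ref_is_short = contig_lengths[ref] <= contig_lengths[qry]
        short = ref if ref_is_short else qry
        start, end = (rs, re_) if ref_is_short else (qs, qe)
        key = (ref, qry)
        if key not in masks:
            masks[key] = bytearray(contig_lengths[short])
        mask = masks[key]
        # Normalise reverse-strand coordinates and clip to the contig length.
        lo, hi = min(start, end), min(max(start, end), len(mask))
        if lo <= hi:
            mask[lo - 1:hi] = b"\x01" * (hi - lo + 1)
        idys[key].append(idy)
    rows = [
        (ref, qry, mask.count(b"\x01") / len(mask), mean(idys[(ref, qry)]))
        for (ref, qry), mask in masks.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Pairs near the top of the sorted list are the strongest containment candidates and pairs near the bottom look like duplicated genes; the ones in between are the ones still worth checking visually.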