tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

How should I set cds names when I generate Allele.ctg.table? #19

Closed: LIZW2019 closed this issue 5 years ago

LIZW2019 commented 5 years ago

Hello,

Thank you so much for designing such excellent software for Hi-C assembly! I am new to genome assembly. I am now trying to generate Allele.ctg.table, but I am confused by the guidance that says "Please modify cds name before running BLAST. The cds name should be same with gene name present in GFF3". Do you mean that the header line (the line starting with ">") in cds.fasta should exactly match the gene name in the original GFF3 before running BLAST? Is the gene ID acceptable? Could you give me an example? Thanks a lot!
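For illustration, here is a minimal sketch of the renaming step the quoted guidance describes. It assumes (not confirmed in this thread) that the cds.fasta headers are mRNA/transcript IDs, that each GFF3 `mRNA` record carries `ID=` and `Parent=` attributes, and that "gene name" means the parent gene's `ID`. If your annotation uses the `Name=` attribute instead, map to that. The function names are hypothetical, and when a gene has several isoforms this sketch simply keeps the first one it sees.

```python
from typing import Dict, Iterable, List


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string such as 'ID=t1;Parent=g1' into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def transcript_to_gene(gff_lines: Iterable[str]) -> Dict[str, str]:
    """Map each mRNA ID to its parent gene ID (assumes ID/Parent attributes)."""
    mapping: Dict[str, str] = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "mRNA":
            continue  # only mRNA features link a transcript to its gene
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Parent" in attrs:
            # Parent may list several IDs separated by commas; take the first
            mapping[attrs["ID"]] = attrs["Parent"].split(",")[0]
    return mapping


def rename_fasta(fasta_lines: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Rewrite FASTA headers to gene IDs, keeping only the first record per gene."""
    out: List[str] = []
    seen = set()
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            gene = mapping.get(name, name)  # unmapped names are left unchanged
            keep = gene not in seen  # drop later isoforms of the same gene
            seen.add(gene)
            if keep:
                out.append(">" + gene)
        elif keep:
            out.append(line)  # sequence line of a record being kept
    return out


# Hypothetical example data
gff = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t1\t900\t.\t+\t.\tID=gene0001;Name=ABC1",
    "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=gene0001.t1;Parent=gene0001",
    "chr1\tmaker\tmRNA\t1\t800\t.\t+\t.\tID=gene0001.t2;Parent=gene0001",
]
fasta = [">gene0001.t1 some description", "ATGAAA", "TAA", ">gene0001.t2", "ATGCCC"]

mapping = transcript_to_gene(gff)
print("\n".join(rename_fasta(fasta, mapping)))
```

With this data the second isoform is dropped and the remaining header becomes `>gene0001`, so the BLAST query names line up with the gene IDs in the GFF3.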

tangerzhang commented 5 years ago

Hi @LIZW2019, if you are working on a diploid genome, you can ignore Allele.ctg.table and use the suggested pipeline: https://github.com/tangerzhang/ALLHiC/wiki/ALLHiC:-scaffolding-of-a-simple-diploid-genome

If you do need Allele.ctg.table for a polyploid assembly and are not familiar with coding, you can use a GMAP-based method to identify allelic contigs, which is described here: https://github.com/tangerzhang/ALLHiC/issues/16. This method is simple and does not require an annotation of your target genome.

LIZW2019 commented 5 years ago

Thank you for your reply; it really solved my problem!