calculate percent identity (%id)

hillerlab / ForwardGenomics

Methods for finding associations between phenotypic and genomic differences between species using the Forward Genomics framework

MIT License


calculate percent identity (%id) #19

Open. zuodabin opened this issue 1 year ago.

zuodabin commented 1 year ago

Dear Professor, I now have CNE sequences in a FASTA file, and I reconstructed the ancestral sequences with PRANK. Note that I did not use the Maf2SpanningSeq_PRANK.perl script for this; I ran PRANK directly:

prank -d=roast.bed_feature-1.fa -keep -showtree -showanc -prunetree -seed=10 -o=conserved_regions_ancestral.fa

Which script should I use to calculate percent identity (%id) after reconstructing the ancestral sequences? Can GetGlobalAndLocalPercentID.perl be used directly?

All the best!

MichaelHiller commented 1 year ago

Yes, GetGlobalAndLocalPercentID.perl is the downstream script to compute the %id values.
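For readers unfamiliar with the measure, here is a simplified illustration of percent identity between two rows of the same alignment (for example, an extant species and a reconstructed ancestor). It is a hypothetical Python sketch, not the logic of GetGlobalAndLocalPercentID.perl: the function name is made up, and skipping columns that have a gap in either sequence is an assumption. The script itself may treat gaps, N characters, and the denominator differently.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two aligned sequences (hypothetical helper).

    Assumption: only columns where neither sequence has a gap ('-') are
    compared; GetGlobalAndLocalPercentID.perl may use a different rule.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be rows of the same alignment (equal length)")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    # Avoid division by zero when every column contains a gap
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Four of the five gap-free columns match
    print(percent_identity("ACGT-A", "ACGTTT"))  # 80.0
```

Comparing each extant sequence with its reconstructed ancestor in this way gives one %id value per branch; the Perl script additionally uses the tree to decide which ancestor each species is compared with.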

zuodabin commented 1 year ago

Thank you very much, Professor. Have a nice day!


xinghua1001 commented 1 year ago

Hi @MichaelHiller and @zuodabin, I also have some CDS FASTA files that cannot be fed into Maf2SpanningSeq_PRANK.perl. I ran PRANK as follows:

prank -d=cds.fa -keep -showtree -showanc -prunetree -seed=10 -o=cds.fa

which produced cds.fa.anc.fas and cds.fa.anc.dnd. Then I ran GetGlobalAndLocalPercentID.perl:

./GetGlobalAndLocalPercentID.perl cds.fa.anc.fa test -treeFile ../cds.fa.anc.dnd -allowedAncestralNodes 1 -global

It fails with: ERROR: do not find human-mouse # in the alignment.

I suspect the input alignment file is not in the expected format. Could you describe the required file format, or share a script adapted to compute %id values from FASTA files?

Thank you!

MichaelHiller commented 1 year ago

Hi both,

The alignment file is a standard multi-FASTA file that contains both the sequences of the extant species (which MUST correspond to terminal nodes in the phylogeny) and the sequences of the reconstructed ancestors (which MUST correspond to internal, i.e. ancestral, nodes in the phylogeny). In other words, the node names in the tree (including the labels of the ancestral nodes) must match the FASTA headers.
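Since the tree node names and the FASTA headers must match, a quick consistency check before running the script can help locate errors like the one above. The following is a minimal Python sketch with hypothetical helper names. Its Newick reader is deliberately simple: it assumes unquoted labels and no bracketed comments, and it only reports names present in one file but missing from the other. It does not check that every internal node carries a label, and it does not reproduce whatever checks GetGlobalAndLocalPercentID.perl performs itself.

```python
def read_fasta_headers(text: str) -> set[str]:
    """Return sequence names (first word after '>') from a multi-FASTA string."""
    return {
        line[1:].split()[0]
        for line in text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def newick_labels(newick: str) -> tuple[set[str], set[str]]:
    """Return (leaf_labels, internal_labels) from a Newick string.

    Simplified parser (assumption): labels are unquoted and contain none of
    '(', ')', ',', ':' or ';'. A label right after ')' names an internal node.
    """
    leaves, internal = set(), set()
    prev = None  # last structural character seen
    i, n = 0, len(newick)
    while i < n:
        c = newick[i]
        if c in "(),;":
            prev = c
            i += 1
        elif c == ":":
            # Skip the branch length up to the next structural character
            i += 1
            while i < n and newick[i] not in "(),;":
                i += 1
        elif c.isspace():
            i += 1
        else:
            j = i
            while j < n and newick[j] not in "(),;:":
                j += 1
            label = newick[i:j].strip()
            if label:
                (internal if prev == ")" else leaves).add(label)
            i = j
    return leaves, internal


def check_consistency(fasta_text: str, newick: str) -> dict[str, list[str]]:
    """Report names found only in the tree or only in the FASTA headers."""
    headers = read_fasta_headers(fasta_text)
    leaves, internal = newick_labels(newick)
    tree_names = leaves | internal
    return {
        "in_tree_not_fasta": sorted(tree_names - headers),
        "in_fasta_not_tree": sorted(headers - tree_names),
    }


if __name__ == "__main__":
    # Hypothetical toy data: the ancestor "anc2" has no sequence in the FASTA
    fasta = ">human\nACGT\n>chimp\nACGA\n>mouse\nACTT\n>anc1\nACGT\n"
    tree = "((human:0.1,chimp:0.1)anc1:0.2,mouse:0.3)anc2;"
    print(check_consistency(fasta, tree))
```

To run it on real files, pass the file contents, e.g. check_consistency(open("cds.fa.anc.fas").read(), open("cds.fa.anc.dnd").read()). Empty lists in both entries mean every tree node has a matching FASTA record and vice versa.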

Hope that helps; otherwise, please share the alignment and tree.

Best Michael