veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Results issue and running error #1668

Closed fatima-akhtar113 closed 4 months ago

fatima-akhtar113 commented 7 months ago

I cannot run my orthologue file in MEME. There are 11 sequences, and the file runs fine in FEL and SLAC. Also, can we say a gene is under positive selection if there is selection on one or two codons? How should we interpret Datamonkey results?

spond commented 7 months ago

Dear @fatima-akhtar113,

  1. I am afraid I can't help you unless you provide more information about the MEME analysis. If you ran it in Datamonkey, please include the URL for the results page.
  2. No, you cannot conclude that a gene is under selection if one or two sites are under selection. See https://academic.oup.com/mbe/article/32/5/1365/1134918. Use BUSTED to look for gene-level selection. image.png: https://github.com/veg/hyphy/assets/1018513/1cf455e8-d1a6-40ec-9c3b-be1628aa9329

Best, Sergei

fatima-akhtar113 commented 7 months ago

Thank you for replying. Here is the URL: https://www.datamonkey.org/meme/656497ed1fdac30a835a1cd3. Can FEL and SLAC be used for gene-level selection? Thank you for suggesting BUSTED. What can I interpret from my FEL and SLAC results, though? Can they be significant? Also, is MEME a good option for analyzing gene-level selection?

fatima-akhtar113 commented 7 months ago

I missed the picture earlier; now I know MEME is not a good option. Please guide me on the other two.

spond commented 7 months ago

Dear @fatima-akhtar113,

Like I said previously, if the goal is to identify selection at the level of a gene, you should use BUSTED. However, the sequences you submitted to Datamonkey have not been properly aligned. Datamonkey will "pad" sequences of unequal lengths with ? at the end, and this is what happened here (https://www.datamonkey.org/meme/656497ed1fdac30a835a1cd3/fasta).

Datamonkey requires codon-aware multiple sequence alignments. If you are not familiar with how to obtain those, you may want to take a look elsewhere, e.g. https://github.com/veg/hyphy-analyses/blob/master/codon-msa/README.md and https://github.com/veg/hyphy/issues/1477
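As a quick sanity check before uploading, something like the following Python sketch could flag the padding problem described above. It is not part of HyPhy or Datamonkey: the function names (`read_fasta`, `check_codon_alignment`) and the toy sequences are made up for illustration. Passing this check does not by itself mean an alignment is codon-aware; it only catches unequal lengths and lengths that are not a multiple of 3.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def check_codon_alignment(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    lengths = {name: len(seq) for name, seq in records.items()}
    # Aligned sequences must all have the same length.
    if len(set(lengths.values())) > 1:
        problems.append(f"sequences have unequal lengths: {lengths}")
    # A codon alignment must consist of whole codons.
    for name, n in lengths.items():
        if n % 3 != 0:
            problems.append(f"{name}: length {n} is not a multiple of 3")
    return problems


# Hypothetical toy inputs: in the first, seq2 is shorter than seq1.
unaligned = ">seq1\nATGGCTAAA\n>seq2\nATGGCT\n"
aligned = ">seq1\nATGGCTAAA\n>seq2\nATGGCT---\n"

print(check_codon_alignment(read_fasta(unaligned)))
print(check_codon_alignment(read_fasta(aligned)))
```

If the first call reports unequal lengths, the file is a set of unaligned sequences rather than an alignment, which is the situation in which Datamonkey would pad the shorter sequences.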

I attach an aligned version of your data (using the codon-msa workflow I linked to above).

If you run it through BUSTED in HyPhy, like so

 hyphy busted --alignment /Users/sergei/Desktop/seqs.msa --tree neighbor-joining --starting-points 5 

You will get a significant result for positive selection (p ~ 0), but a very odd looking ω distribution

image.png: https://github.com/veg/hyphy/assets/1018513/d0612854-bdd5-49ff-a442-1c3d38c32cba

A dN/dS of 3000 is indicative of some pathologies with the data / model. For example here's one site which shows several multi-nucleotide substitutions

image.png: https://github.com/veg/hyphy/assets/1018513/a09a972c-9b0d-4c28-a3ee-dacdc8a8007b
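To make "multi-nucleotide substitution" concrete, here is a minimal Python sketch that counts how many positions differ between two codons observed at the same site. The helper names and the example codons are hypothetical and are not taken from the alignment in this thread. Comparing observed codons is only a crude proxy: the model infers changes along branches of the tree, which this sketch does not attempt.

```python
def nucleotide_differences(codon_a, codon_b):
    """Count the positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be exactly 3 nucleotides long")
    return sum(a != b for a, b in zip(codon_a.upper(), codon_b.upper()))


def classify_change(codon_a, codon_b):
    """Label a codon pair by the number of differing nucleotides."""
    labels = {
        0: "identical",
        1: "single-nucleotide",
        2: "double-nucleotide",
        3: "triple-nucleotide",
    }
    return labels[nucleotide_differences(codon_a, codon_b)]


# Hypothetical codon pairs, chosen only to illustrate each category.
print(classify_change("AGC", "AGT"))  # differs at position 3 only
print(classify_change("AGC", "TCC"))  # differs at positions 1 and 2
print(classify_change("AGC", "TCA"))  # differs at all three positions
```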

If you then run BUSTED with support for multiple hits (see https://academic.oup.com/mbe/article/40/7/msad150/7217158)

hyphy busted --alignment /Users/sergei/Desktop/seqs.msa.gz --starting-points 5 --tree neighbor-joining --multiple-hits Double+Triple 

a very odd result is obtained

### Partition-level rates for multiple-hit substitutions
* rate at which 2 nucleotides are changed instantly within a single codon :   1.9304
* Corresponding fraction of substitutions : 45.463%
* rate at which 3 nucleotides are changed instantly within a single codon :   1.9649
* Corresponding fraction of substitutions :  5.696%

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.967     |    0.000    |       Not supported by data       |
|        Negative selection         |     0.999     |    0.000    |       Not supported by data       |
|      Diversifying selection       |    244.639    |   100.000   |                                   |

Having more than 50% of the substitutions occur due to multiple hits is very odd.
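The "more than 50%" figure follows from adding the two fractions reported in the output above. A small Python check, using only those two numbers (the variable names are illustrative):

```python
# Values copied from the BUSTED output above (percent of substitutions).
double_hit_fraction = 45.463
triple_hit_fraction = 5.696

# Total share of substitutions attributed to multiple hits.
multiple_hit_fraction = double_hit_fraction + triple_hit_fraction
# Whatever is left over is not attributed to multiple hits.
remaining_fraction = 100.0 - multiple_hit_fraction

print(f"multiple-hit substitutions: {multiple_hit_fraction:.3f}%")
print(f"remaining substitutions:    {remaining_fraction:.3f}%")
```

This gives 51.159% for double and triple hits combined, leaving 48.841% for everything else.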

May I ask where these sequences come from? (unless they are simulated).

Best, Sergei

seqs.msa.gz https://github.com/veg/hyphy/files/13481434/seqs.msa.gz

fatima-akhtar113 commented 7 months ago

I took the coding sequences from BLAST orthologues of my protein, then converted them into DNA sequences using reverse translation. What do I do to get results? I can align my sequences using UGENE. Also, do I have to remove gaps?

fatima-akhtar113 commented 7 months ago

homosapien MDEPPFSEAALEQALGEPCDLDAALLTDIEGEVGAGRGRANGLDAPRAGADRGAMDCTFE DMLQLINNQDSDFPGLFDPPYAGSGAGGTDPASPDTSSPGSLSPPPATLSSSLEAFLSGP QAAPSPLSPPQPAPTPLKMYPSMPAFSPGPGIKEESVPLSILQTPTPQPLPGALLPQSFP APAPPQFSSTPVLGYPSPPGGFSTGSPPGNTQQPLPGLPLASPPGVPPVSLHTQVQSVVP QQLLTVTAAPTAAPVTTTVTSQIQQVPVLLQPHFIKADSLLLTAMKTDGATVKAAGLSPL VSGTTVQTGPLPTLVSGGTILATVPLVVDAEKLPINRLAAGSKAPASAQSRGEKRTAHNA IEKRYRSSINDKIIELKDLVVGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTA VHKSKSLKDLVSACGSGGNTDVLMEGVKTEVEDTLTPPPSDAGSPFQSSPLSLGSRGSGS GGSGSDSEPDSPVFEDSKAKPEQRPSLHSRGMLDRSRLALCTLVFLCLSCNPLASLLGAR GLPSPSDTTSVYHSPGRNVLGTESRDGPGWAQWLLPPVVWLLNGLLVLVSLVLLFVYGEP VTRPHSGPAVYFWRHRKQADLDLARGDFAQAAQQLWLALRALGRPLPTSHLDLACSLLWN LIRHLLQRLWVGRWLAGRAGGLQQDCALRVDASASARDAALVYHKLHQLHTMGKHTGGHL TATNLALSALNLAECAGDAVSVATLAEIYVAAALRVKTSLPRALHFLTRFFLSSARQACL AQSGSVPPAMQWLCHPVGHRFFVDGDWSVLSTPWESLYSLAGNPVDPLAQVTQLFREHLL ERALNCVTQPNPSPGSADGDKEFSDALGYLQLLNSCSDAAGAPAYSFSISSSMATTTGVD PVAKWWASLTAVVIHWLRRDEEAAERLCPLVEHLPRVLQESERPLPRAALHSFKAARALL GCAKAESGPASLTICEKASGYLQDSLATTPASSSIDKAVQLFLCDLLLVVRTSLWRQQQP PAPAPAAQGTSSRPQASALELRGFQRDLSSLRRLAQSFRPAMRRVFLHEATARLMAGASP TRTHQLLDRSLRRRAGPGGKGGAVAELEPRPTRREHAEALLLASCYLPPGFLSAPGQRVG MLAEAARTLEKLGDRRLLHDCQQMLMRLGGGTTVTSS Pantroglodytes MDEPPFSEAALEQALGEPCDLDAALLTDIEGEVGAGRGRANRLDAPRAGADHGAMDCTFEDMLQLINNQD SDFPGLFDPPYAGSGAGGTDPASPDTSSPGSLSPPPATLSSSLEAFLSGPQAAPSPLSPPQPAPTPLKMY PSVPTFSPGPGIKEESVPLSILQTPTPQPLPGALLPQSFPAPAPPQFSSTPVLGYPSPPGGFSTGSPPGS TQQPLPGLPLASPPGVPPISLHTQVQSVVPQQLLTVTAAPTAAPVTTTVTSQIQQVPVLLQPHFIKADSL LLTAMKTDGATVKAAGLSPLVSGTTVQTGPLPTLVSGGTILATVPLVVDAEKLPINRLAAGSKAPASAQS RGEKRTAHNAIEKRYRSSINDKIIELKDLVVGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTA VHKSKSLKDLVSACGSGGNTDVLMEGVKTEVEDTLTPPPSDAGSPFQSSPLSLGSRGSGSGGSGSDSEPD SPVFEDSKAKPEQRPSLHSRGMLDRSRLALCTLVFLCLSCNPLASLLGARGLPSPSDTTSIYHSPGRNVL GTESRDGPGWAQWLLPPVVWLLNGLLVLVSLVLLFVYGEPVTRPHSGPAVYFWRHRKQADLDLARGDFAQ AAQQLWLALRALGRPLPTSHLDLACSLLWNLIRHLLQRLWVGRWLAGRAGGLQQDCALRVDARASARDAA 
LVYHKLHQLHTMGKHTGGHLTATNLALSALNLAECAGDAVSVATLAEIYVAAALRVKTSLPRALHFLTRF FLSSARQACLAQSGSVPPAMQWLCHPVGHRFFVDGDWSVLSTPWESLYSLAGNPVDPLAQVTQLFREHLL ERALNCVTQPNPSPGSADGDKEFSDALGYLQLLNSCSDAAGAPACSFSISSSMATTTGVDPVAKWWASLT AVVIHWLRRDEEAAERLCPLVEHLPRVLQESERPLPRAALHSFKAARALLGCAKAESGPASLTICEKASG YLQDSLATTPASSSIDKAVQLFLCDLLLVVRTSLWRQQQPPAPAPAAQGTSSRPQASALELRGFQRDLSS LRRLAQSFRPAMRRVFLHEATARLMAGASPTRTHQLLDRSLRRRAGPGGKGGAVAELEPRPTRREHAEAL LLASCYLPPGFLSAPGQRVGMLAEAARTLEKLGDRRLLHDCQQMLMRLGGGTTVTSS Panpaniscus MDEPPFSEAALEQALGEPCDLDAALLTDIEGEVGAGRGRANRLDAPRAGADHGAMDCTFEDMLQLINNQD SDFPGLFDPPYAGSGAGGTDPASPDTSSPGSLSPPPATLSSSLEAFLSGPQAAPSPLSPPQPAPTPLKMY PSVPTFSPGPGIKEESVPLSILQTPTPQPLPGALLPQSFPAPAPPQFSSTPVLGYPSPPGGFSTGSPPGS TQQLLPGLPLASPPGVPPISLHTQVQSVVPQQLLTVTAAPTAAPVTTTVTSQIQQVPVLLQPHFIKADSL LLTAMKTDGATVKAAGLSPLVSGTTVQTGPLPTLVSGGTILATVPLVVDAEKLPINRLAAGSKAPASAQS RGEKRTAHNAIEKRYRSSINDKIIELKDLVVGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTA VHKSKSLKDLVSACGSGGNTDVLMEGVKTEVEDTLTPPPSDAGSPFQSSPLSLGSRGSGSGGSGSDSEPD SPVFEDSKAKPEQRPSLHSRGMLDRSRLALCTLVFLCLSCNPLASLLGARGLPSPSDTTSIYHSPGRNVL GTESRDGPGWAQWLLPPVVWLLNGLLVLVSLVLLFVYGEPVTRPHSGPAVYFWRHRKQADLDLARGDFDQ AAQQLWLALRALGRPLPTSHLDLACSLLWNLIRHLLQRLWVGRWLAGRAGGLQQDCALRVDARASARDAA LVYHKLHQLHTMGKHTGGHLTATNLALSALNLAECAGDAVSVATLAEIYVAAALRVKTSLPRALHFLTRF FLSSARQACLAQSGSVPPAMQWLCHPVGHRFFVDGDWSVLSTPWESLYSLAGNPVDPLAQVTQLFREHLL ERALNCVTQPNPSPGSADGDKEFSDALGYLQLLNSCSDAAGAPACSFSISSSMATTTGVDPVAKWWASLT AVVIHWLRRDEEAAERLCPLVEHLPRVLQESERPLPRAALHSFKAARALLGCAKAESGPASLTICEKASG YLQDSLATTPASSSIDKAVQLFLCDLLLVVRTSLWRQQQPPAPAPAAQGTSSRPQASALELRGFQRDLSS LRRLAQSFRPAMRRVFLHEATARLMAGASPTRTHQLLDRSLRRRAGPGGKGGAVAELEPRPTRREHAEAL LLASCYLPPGFLSAPGQRVGMLAEAARTLEKLGDRRLLHDCQQMLMRLGGGTTVTSS Pongoabelii MDEPPFSEAALEQALGEPCDLDLALLTDIEGEVGAGRGRANRLDAPRAGADRGAMDCTFEDMLQLINNQD SDFPGLFDPPYAGSGAGGTDPASPDTSSPGSLSPPPATLSSSLEAFLSGPKAAPSPLSPPQPAPTPLKMY PSMPAFSPGPGIKEESVPLSILQTPTPQPLPGALLPQSFPAPAPPQFSSTPVLGYPSPPGGFSTGSPPGS TQQPLPGLPLASPPGVPPVSLHTQAQSVVPQQLLTVTAAPTAAPVTTTVTSQIQQVPVLLQPHFIKADSL 
LLTAVKTDGATVKAAGLSPLVSGTTVQTGPLPTLVSGGTILATVPLVVDADKLPINRLAAGSKASGSAQS RGEKRTAHNAIEKRYRSSINDKIIELKDLVVGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTA VHKSKSLKDLVSACGSGGNTDVLMEGVKTEVEDTLTPPPSDAGSPFQSSPLSLGSRGSGSGGSGSDSEPD SPAFEDSKAKPEQRPSSHSRGMLDRSRLALCTLVFLCLSCNPLASLLGARGLPSPSDTTSVYHSPGRNVL GTESRDGPGWAQWLLPPVVWLLNGLLVLVSLVLLFVYGEPVTRPHSGPAVYFWRHRKQADLDLARGDFAQ AAQQLWLALRALGRPLPTSHLDLACSLLWNLIRHLLQRLWVGRWLAGRAGGLQQDCALRVDACASARDAA LVYHKLHQLHTMGKYTGGHLTATNLALSALNLAECAGDAVSVATLAEIYVAAALRVKTSLPRALHFLTRF FLSSARQACLAQSGSVPPAMQWLCHPVGHRFFVDGDWAVLSTPRESLYSLAGNPVDPLAQVTQLFREHLL ERALNCVTQPSPSPGSADGDKEFSDALGYLQLLNSCSDAAGAPACSFSISSSMATTTGVDPVAKWWASLT AVVIHWLRRDEEAAERLCPLVEHLPRVLQESERPLPRAALHSFKAAWALLGCAKAESGPASLTICEKASG YLQDSLATTPASSSIDKAVQLFLCDLLLVVRTSLWRQQQPPAPAPAAQGTSSRPHASALELRGFQRDLSS LRRLAQSFRPAMRRVFLHEATARLMAGASPTRTHQLLDRSLRRRAGPGGKGGAVAELEPRPTRREHAEAL LLASCYLPPGFLSAPGQRVGMLAEAARTLEKLGDRRLLHDCQQMLMRLGGGTTVTSS Macacafascicularis MDEPPFSEAALEQALGGPCDLDAALLTDIEGEVGAGRGRASRLDAPRAGADRGAMDCTFEDMLQLINNQD SDFPGLFDPPYAGSGAGGTDPASPDTSSPGSLSPPPTTLSSSLEDFLSGPKAAPSPLSPPQPAPTPLKMY PSVPTFSPGPGIKEESVPLSILQTPTPQPLPGALLPQSFPAPAPPQFSSTPVLGYPSPPGGFSTGSPPGS TQQPLPGLPLASPPGVPPVSLHTQVQSVAPQRLLTVTAAPTAAPATTTVTSQIQQVPVLLQPHFIKADSL LLTAMKTDGTTVKAAGLSPLVSGTTVQTGPLPTLVSGGTILATVPLVVDADKLPINRLAAGSKAPGSAQS RGEKRTAHNAIEKRYRSSINDKIIELKDLVVGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTA VHKSKSLKDLVSACGSEGNTDVLMEGVKTEVEDTLTPPPSDAGSPFQSSPLSLGSRGSGSGGSGSDSEPD SPVFEDSKAKPEQRPSPHSRGMLDRSRLALCTLVFLCLSCNPLASLLGARGLPGPSDITSVYHSPGRNVL GTESRDGPGWAQWLLPPVVWLLNGLLVLVSLVLLFVYGEPVTRPHSGPAVHFWRHRKQADLDLARGDFAQ AAQQLWLALRALGRPLPTSHLDLACSLLWNLIRHLLQRLWVGRWLAGRAGGLQRDCSLRVDARASARDAA LVYHKLHQLHTMGKYTGGHLTATNLALSALNLAECAGDAVSVATLAEIYVAAALRVKTSLPRTLHFLTRF FLSSARQACLAQSGSVPPAMQWLCHPVGHRFFVDGDWAVLSTPRETLYSLAGNPVDPLAQVTQLFREHLL ERALNCVTQPNPSPGSADGDKEFSDALGYLQLLNSCSDAAGAPACSFSISSSMATTTGIDPVAKWWASLT AVVIHWLRRDEEAAERLCPLVEHLPRVLQESERPLPRAALHSFKAARALLGCAKAESGPASLTICEKASG YLQDSLATTPASSSIDKAVQLFLCDLLLVVRTSLWRQQQPPAPAPAAQGTSSGPQASALELRGFQRDLSS 
LRRLAQSFRPAMRRVFLHEATARLMAGASPTRTHQLLDRSLRRRAGPGGKGGAVAELEPRPTRREHAEAL LLASCYLPPGFLSAPGQRVGMLAEAARTLEKLGDRRLLHDCQQMLMRLGGGTTVTSS Nomascusleucogenys MDEPPFSEAALEQALGEPCDLDAALLTDIEGARRGAGRGRANRLDAPRAGADRGAMDCTFEDMLQLINNQ DSDFPGLFDPPYAGSGAGGTDPASPDTSSPGSLSPPPATLSSSLEAFLSGPKAAPSPLSPPQPAPTPLKM YPSVPAFSPGPGIKEESVPLSILQTPTPHPLPGALLPQSFPAPAPPQFSSTPVLGYPSPPEGFSTGSPPG STQQPLPGLPLASPPGVPPVSLHTQVQSVVPQQLLTVTAAPTAAPVTTTVTSQIQQVPVLLQPHFIKADS LLLTAMKTDGATVKAAGLSPLVSGTTVQTGPLPTLVSGGTILATVPLVVDADKLPINRLAAGSKAPGSAQ SRGEKRTAHNAIEKRYRSSINDKIIELKDLVVGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRT AVHKSKSLKDLVSACGSGGNTDVLMEGVKTEVEDTLTPPPSDAGSPFQSSPLSLGSRGSGSGGSGSDLEP DSPVFEDSKAKPEQWPSPHSRGMLDRSRLALCTLVFLCLSCNPLASLLGARGLPSPSDTTSVYHSPGRNV LGTESRDGPGWAQWLLPPVVWLLNGLLVLVSLVLLFVYGEPVTRPHSGPAVYFWRHRKQADLDLARGDFA QAAQQLWLALRALGRPLPTSHLDLACSLLWNLIRHLLQRLWVGRWLAGRAGGLQQDCALRVDARASARDA ALVYHKLHQLHTMGKYTGGHLTATNLALSALNLAECAGDAVSVATLAEIYVAAALRVKTSLPRALHFLTR FFLSSARQACLAQSGSVPPAMQWLCHPVGHRFFVDGDWAVLSTPRESLYSLAGNPVDPLAQVTQLFREHL LERALNCVTQPNPSPGSADGDKEFSDALGYLQLLNSCSDAAGTPACSFSISSSMATTTGVDPVAKWWASL TAVVIHWLRRDEEAAERLCPLLEHLPRVLQESERPLPRAALHSFKAARALLGCAKAESGPASLTICEKAS GYLQDSLATTPTSSSIDKAVQLFLCDLLLVVRTSLWQQQQPLAPAPASQSASSRPQASALELRGFQRDLS SLRRLAQSFRPAMRRVFLHEATARLMAGASPTRTHQLLDRSLRRRAGPGGKGGAVAELEPRPTRREHAEA LLLASCYLPPGFLSAPGQRVGMLAEAARTLEKLGDRWLLHDCQQMLMRLGGGTTVTS Chlorocebussabaeus MDEPPFSKAALEQALGGPCDLDAALLTDIEGEVGAGRGRASRLDAPRAGADRGAMDCTFEDMLQLINNQD SDFPGLFDPPYAGSGAGGTDPASPDTSSPGSLSPPPTTLSSSLEDFLSGPKAAPSPLSPPQPAPTPLKMY PSVPTFSPGPGIKEESVPLSILQTPTPQPLPGALLPQSFPAPAPPQFSSTPVLGYPSPPGAFSTGSPPGS TQQPLPGLPLASPPGVPPVSLHTQVQSVAPQRLLTVTAAPTAAPATTTVTSQIQQVPVLLQPHFIKADSL LLTAMKTDGATVKAAGLSPLVSGTTVQTGPLPTLVSGGTILATVPLVVDADKLPINRLAAGSKAPGSAQS RGEKRTAHNAIEKRYRSSINDKIIELKDLVVGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTA VHKSKSLKDLVSACGSEGNTDVLMEGVKTEVEDTLTPPPSDAGSPFQSSPLSLGSRGSGSGGSGSDSEPD SPVFEDSKAKPEQRPSPHSRGMLDRSRLALCTLVFLCLSCNPLASLLGARGLPGPSDITSVYHSPGRNVL GTESRDGPGWAQWLLPPVVWLLNGLLVLVSLVLLFVYGEPVTRPHSGPAVHFWRHRKQADLDLARGDFAQ 
AAQQLWLALRALGRPLPTSHLDLACSLLWNLIRHLLQRLWVGRWLAGRAGGLQRDCSLRVDARASARDAA LVYHKLHQLHTMGKYTGGHLTATNLALSALNLAECAGDAVSVATLAEIYVAAALRVKTSLPRTLHFLTRF FLSSARQACLAQSGSVPPAMQWLCHPVGHRFFVDGDWAVLSTPRETLYSLAGNPVDPLAQVTQLFREHLL ERALNCVTQPNPSPGSADGDKEFSDALGYLQLLNSCSDAAGAPACSFSISSSMATTTGVDPVAKWWASLT AVVIHWLRRDEEAAERLCPLVEHLPRVLQESERPLPRAALHSFKAARALLGCAKAESGPASLTICEKASG YLQDSLTTTPASSSIDKAVQLFLCDLLLVVRTSLWRQQQPPAPAPAAQGTSSGPQASALELRGFQRDLSS LRRLAQSFRPAMRRVFLHEATARLMAGASPTRTHQLLDRSLRRRAGPSGKGGAVAELEPRPTRREHAEAL LLASCYLPPGFLSAPGQRVGMLAEAARTLEKLGDRRLLHDCQQMLMRLGGGTTVTSS Cercocebusatys MDEPPFSEAALEQALGGPCDLDAALLTDIEGEVGARRGRASRLDAPRAGADRGAMDCTFEDMLQLINNQD SDFPGLFDPPYAGSGAGGTDPASPDTSSPGSLSPPPTTLSSSLEDFLSGPKAAPSPLSPPQPAPTPLKMY PSVPTFSPGPGIKEESVPLSILQTPTPQPLPGALLPQSFPAPAPPQFSSTPVLGYPSPPGGFSTGSPPGS TQQPLPGLPLASPPGVPPVSLHTQVQSVAAQRLLTVTAAPTAAPATTTVTSQIQQVPVLLQPHFIKADSL LLTAMKTDGTTVKAAGLSPLVSGTTVQTGPLPTLVSGGTILATVPLVVDADKLPINRLAAGSKAQGSAQS RGEKRTAHNAIEKRYRSSINDKIIELKDLVVGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTA VHKSKSLKDLVSACGSEGNTDVLMEGVKTEVEDTLTPPPSDAGSPFQSSPLSLGSRGSGSGGSGSDSEPD SPVFEDSKAKPEQRPSPHSRGMLDRSRLALCTLVFLCLSCNPLASLLGARGLPGPSDITSVYHSPGRNVL GTESRDGPGWAQWLLPPVVWLLNGLLVLVSLVLLFVYGEPVTRPHSGPAVHFWRHRKQADLDLARGDFAQ AAQQLWLALRALGRPLPTSHLDLACSLLWNLIRHLLQRLWVGRWLAGRAGGLQRDCSLRVDARASARDAA LVYHKLHQLHTMGKYTGGHLTATNLALSALNLAECAGDAVSVATLAEIYVAAALRVKTSLPRTLHFLTRF FLSSARQACLAQSGSVPPAMQWLCHPVGHRFFVDGDWAVLSTPRETLYSLAGNPVDPLAQVTQLFREHLL ERALNCVTQPNPSPGSADGDKEFSDALGYLQLLNSCSDAAGAPACSFSISSSMATTTGIDPVAKWWASLT AVVIHWLRRDEEAAERLCPLVEHLPRVLQESERPLPRAALHSFKAARALLGCAKAESGPASLTICEKASG YLQDSLATTPASSSIDKAVQLFLCDLLLVVRTSLWRQQQPPAPAPAAQGTSSGPQASALELRGFQRDLSS LRRLAQSFRPAMRRVFLHEATARLMAGASPTRTHQLLDRSLRRRAGPGGKGGAVAELEPRPTRREHAEAL LLASCYLPPGFLSAPGQRVGMLAEAARTLEKLGDRRLLHDCQQMLMRLGGGTTVTSS Gorillagorillagorilla MDEPPFSEAALEQALGEPCDLDAALLTDIEDMLQLINNQDSDFPGLFDPPYAGSGAGGTDPASPDTSSPG SLSPPPATLSSSLEAFLSGPKAAPSPLSPPQPAPTPLKMYPSVPAFSPGPGIKEESVPLSILQTPTPQPL PGALLPQSFPAPAPPQFSSTPVLGYPSPPGGFSTGSPPGSTQQPLPGLPLASPPGVPPVSLHTQVQSVVP 
QQLLTVTAAPTAAPVTTTVTSQIQQVLLQPHFIKADSLLLTAMKTDGATVKAAGLSPLVSGTTVQTGPLP TLVSGGTILATVPLVVDAEKLPINRLAAGSKAPASAQSRGEKRTAHNAIEKRYRSSINDKIIELKDLVVG TEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTAVHKSKSLKDLVSACGSGGNTDMLMEGVKTEVE DTLTPPASDAGSPFQSSPLSLGSRGSGSGGSGSDSEPDSPVFEDSKAKPEQRPSLHSRGMLDRSRLALCT LVFLCLSCNPLASLLGARGLPSPSDTTSVYHSPGRNVLGTESRDGPGWAQWLLPPVVWLLNGLLVLVSLV LLFVYGEPVTRPHSGPAVYFWRHRKQADLDLARGDFAQAAQQLWLALRALGRPLPTSHLDLACSLLWNLI RHLLQRLWVGRWLAGRAGGLQQDCALRVDARASARDAALVYHKLHQLHTMGKHTGGHLTATNLALSALNL AECAGDAVSVATLAEIYVAAALRVKTSLPRALHFLTRFFLSSARQACLAQSGSVPPAMQWLCHPVGHRFF VDGDWSVLSTPWESLYSLAGNPVDPLAQVTQLFREHLLERALNCVTQPNPSPGSADGDKEFSDALGYLQL LNSCSDAAGAPACSFSISSSMATTTGVDPVAKWWASLTAVVIHWLRRDEEAAERLCPLVEHLPRVLQESE RPLPRAALHSFKAARALLGCAKAESGPASLTICEKASGYLQDSLATTPASSSIDKAVQLFLCDLLLVVRT SLWRQQQPPAPAPAAQGTSSRPQASALELRGFQRDLSSLRRLAQSFRPAMRRVFLHEATARLMAGASPTR THQLLDRSLRRRAGPGGKGGTVAELEPRPTRREHAEALLLASCYLPPGFLSAPGQRVGMLAEAARTLEKL GDRRLLHDCQQMLMRLGGGTTVTSS Rhinopithecusroxellana MDESPFSEAALEQALGGPCDLDAALLTDIEDMLQLINNQDSDFPGLFDPPYAGSGAGGTDPASPDTSSPG SLSPPPATLSSSLEDFLSGPKAAPSPLSPPQPAPTPLKMYPSVPTFSPGPGIKEESVPLSILQTPTPQPL PGALLPQSFPAPAPTQFSSTPVLGYPSPPGGFSTGSPPGSTQQPLPGLPLASPPGVPPVSLHTQVQSVAP QRLLTVTAAPTAAPATTTVTSQIQQVPVLLQPHFIKADSLLLTAMKTDGATVKAAGLSPLVSGTAVQTGP LPTLVSGGTILATVPLVVDADKLPINRLAAGSKAPVSAQSRGEKRTAHNAIEKRYRSSINDKIIELKDLV VGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTAVHKSKSLKDLVSACGSEGNTDVLMEGVKTE VEDTLTPPPSDAGSPFQSSPLSLGSRGSGSGGSGSDSEPDSPVFEDSKAKPEQRPSPHSRGMLDRSRLAL CTLVFLCLSCNPLASLLGARGLPGPSDITSVYHSPGRNVLGTESRDGPGWAQWLLPPVVWLLNGLLVLVS LVLLFVYGEPVTRPHSGPAVHFWRHRKQADLDLARGDFAQAAQQLWLALRALGRPLPTSHLDLACSLLWN LIRHLLQRLWVGRWLAGRAGGLQRDCSLRVDARASARDAALVYHKLHQLHTMGKYTGGHLTATNLALSAL NLAECAGDAVSVATLAEIYVAAALRVKTSLPRTLHFLTRFFLSSARQACLAQSGSVPPAMQWLCHPVGHR FFVDGDWVVLSTPRETLYSLAGNPVDPLAQVTQLFREHLLERALNCVTQPNPSPGSADGDKEFSDALGYL QLLNSCSDAAGAPACSFSISSSMATTTGVDPVAKWWASLTAVVIHWLRRDEEAAERLCPLVEHLPRVLQE SERPLPRAALHSFKAARALLGCAKAESGPASLTICEKASGYLQDSLATTPASSSIDKAVQLFLCDLLLVV 
RTSLWRQQQPPAPAPAAQGASSGPQASALELRGFQRDLSSLRRLAQSFRPAMRRVFLHEATARLMAGASP TRTHQLLDRSLRRRAGPGGKGGTVAELEPRPTRREHAEALLLASCYLPPGFLSAPGQRVGMLAEAARTLE KLGDRRLLHDCQQMLMRLGGGTTVTSS Aotusnancymaae MDELSFSEAVLEQALSEPCDLDAALLTDIEGEVGAGRGRASRLDALWAGADRGAMDCTFEDMLQLINNQD SDFPGLFDPPYAGGGAGGTDPASPDTSSPASLSPPPATLSSSLEGFLSGPEAAPSPLSPPQPAPAPLKMY PPLPTFSPGPGIKEESVPLSILQTPTPQPLPGALLPQSFPAPAPPQFSSTPVMGYASPAGGFSTGSPPAS TQQPLPGLPLASPPGVPPVSLHTQVQSAAPQQLLTVTAAPTAAPATTTVNSQIQQVPVLLQPHFIKADSL LLTTMKTDGATVKAASLGPLVSGATVQTGPLPTLVSGGTILATVPLVVDADKLPINRLAAGSKAPGSAQS RGEKRTAHNAIEKRYRSSINDKIIELKDLVVGTEAKLNKSAVLRKAIDYIRFLQHSNQKLKQENLSLRTA VHKSKSLKDLVSACGGGGNTDVLMEGVKTEVEDTLTPPPSDAGSPFQSSPLSLGSRGSGSGGSGSDSEPD SPVFEDSQAKPEQRPTAHSGGMPDRSRLALCTLVFLCLSCNPLASLLGARGLPSPSETTSIYHSPGRNVL GTESRDGPGWAQWLLPPVVWLLNGLLVLVSLVLLFVYGEPVTRPHSGPAVHFWRHRKQADLDLARGDFAQ AAQQLWLALRALGRPLPTSHLDLACSLLWNLIRHLLQRLWVGRWLASRAGGLQRDCALRVDARASARDAA LVYHKLHQLHTMGKYTGGHLTATNLALSALNLAECAGDAVSVATLAEIYVVAALRVKTSLPWALHFLTRF FLSSARQVCLAQSGSVPLAMQWLCHPVGHRFFVDGDWAVLSTPRESPYSLAGNPVDPLAQVTQLFREHLL ERALNCVAQPNPSPGSADGDKEFSDALGYLQLLNSCSDAAGAPTCSFSISSSMATTTGVDPVAKWWASLT AVVIHWLRRDEEAAEQLCPLVEHLPRVLQESERPLPRAALHSFKAARALLGCAKAESGPASLTICEKASG YLQDSLATTPAGSSLDKAVQLFLCDLLLVVRTSLWRQQQLPAPAPAGQGASSGPQASALELRGFQRDLSS LRRLAQSFRPAMRRVFLHEATARLMAGASPARTHQLLDRSLRRRAGPGGKGGVAPAELEPRPTRREHAEA LLLASCYLPPGFLSAPGQRMGMLAEAARTLEKLGDRRLLHDCQQMLMRLGGGTTVTSS

fatima-akhtar113 commented 7 months ago

I have attached the protein file.

fatima-akhtar113 commented 7 months ago

I have 49 genes; I used this alignment file for another gene.

fatima-akhtar113 commented 7 months ago

I got these results: https://www.datamonkey.org/busted/65656bb91fdac30a835a487a. I will attach the sequence file of the protein translated into DNA; I have already sent the alignment file that I submitted. Are my results right? Also, if I just have to report that there is no selection, how can I interpret the results tables in my analysis?

Also, thank you so much for helping. I am doing this for the first time and I am trying to get my head around all of it.

Regards, Fatima Akhtar.

On Tue, Nov 28, 2023 at 8:37 AM fatima khan @.***> wrote:

I have 49 genes; I used this alignment file for another gene.

On Tue, Nov 28, 2023 at 8:31 AM fatima khan @.***> wrote:

I have attached the protein file.

On Tue, Nov 28, 2023 at 8:30 AM fatima khan @.***> wrote:

On Tue, Nov 28, 2023 at 7:04 AM fatima khan @.***> wrote:

I took the coding sequences from the BLAST orthologs of my protein, then converted them into DNA sequences using reverse translation. What do I do to get results? I can align my sequences using UGENE; do I also have to remove gaps?


Homosapiens MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQIALAFGLAIGTLAQAL GPVSGGHINPAITLALLVGNQISLLRAFFYVAAQLVGAIAGAGILYGVAPLNARGNLAVN ALNNNTTQGQAMVVELILTFQLALCIFASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFT GCSMNPARSFGPAVVMNRFSPAHWVFWVGPIVGAVLAAILYFYLLFPNSLSLSERVAIIK GTYEPDEDWEEQREERKKTMELTTR Pantroglodytes MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQIALAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAFFYVAAQLVGAIAGASILYGVAPLNARGNLAVNALNNNTTQGQAMVVELILTF QLALCIFASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPAHWVFWVGP IVGAVLAAILYFYLLFPNSLSLSERVAIIKGTYEPDEDWEEQREERKKTMELTTR Macacanemestrina MRGVRDTGPVYTAAWSRGPDGSALSAGGAGPRGCAGRRGRARAAGTPPNPCAAAALSVWSRPPPPRRRPA PAPAPIESRPSRARRPGPARSGCWDRARRHPARPRPAASTSSAACDPTGAPRRGRRRRAPAATMKKEVCS VAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQIALAFGLAIGTLAQALGPVSGGHINPAITLALL VGNQISLLRAFFYVAAQLVGAIAGAGILYGVAPLNARGNLAVNALNNNTTQGQAMVVELILTFQLALCIF ASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPAHWVFWVGPIVGAVLA AILYFYLLFPNSLSLSERVDIIKGTYEPDEDWEEQREERKKTMELTAR Pongoabelii MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQIALAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAFFYVAAQLVGAIAGAGILYGVAPLNARGNLAVNALNNNTTQGQAMVVELILTF QLALCIFASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPAHWVFWVGP IVGAVLAAILYFYLLFPNSLSLSERVAIIKGTYEPDEDWEEQREERKKTMELTTH Papioanubis MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQIALAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAFFYVAAQLVGAIAGAGILYGVAPLNARGNLAINALNNNTTQGQAMVVELILTF QLALCIFASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPAHWVFWVGP IVGAVLAAILYFYLLFPNSLSLSERVDIIKGTYEPDEDWEEQREERKKTMELTAR Mandrillusleucophaeus MEGPQTQAWETESAAQFSRPRLTPPSRQVDKGNPAWERAPPGVHCLVQVCSVAFLKAVFAEFLATLIFVF FGLGSALKWPSALPTILQIALAFGLAIGTLAQALGPVSGGHINPAITLALLVGNQISLLRAFFYVAAQLV GAIAGAGILYGVAPLNARGNLAVNALNNNTTQGQAMVVELILTFQLALCIFASTDSRRTSPVGSPALSIG LSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPAHWVFWVGPIVGAVLAAILYFYLLFPNSLSLSERV DIIKGTYEPDEDWEEQREERKKTMELTAR Galeopterusvariegatus MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQISLAFGLAIGTLAQALGPVSGGHINP 
AITLALLVGNQISLLRAVFYVVAQLVGAIAGAGILYGLAPLNARGNLAVNALNNNTTQGQAMVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMKRFSPAHWVFWVGP IVGAVLAAILYFYLLFPNSLSLSERVAVFKGTYEPEEDWEEQREERKKTMELTAR Mustelaputoriusfuro MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPSILQISLAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAVFYVAAQLVGAIAGAGILYGLAPLNARGNLAINALNNNTTQGQAMVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSSAHWVFWVGP IVGAILAAILYFYLLFPNSLSVSERVAVIKGTYEPEEDWEEQREERKKTMELTAR Carlito syrichta MKKEVCSVAFVKAVFAEFLATLVFVFFGLGSALRWPSALPTILQIALAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRALFYVVAQLVGAIAGAGILYGLAPLNARGNLAVNALNNNTTPGQAMAVELILTF QLALCVFASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPAHWVFWVGP IVGAVLAAILYFYLLFPHSLSLSERVAIIKGTYEPDEDWEEQREERKKTMELTAR Ailuropodamelanoleuca MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPSILQISLAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAAFYVVAQLVGAIAGAGILYGLAPLNARGNLAINALNNNTTQGQAMVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSSAHWVFWVGP IVGAILAAVLYFYLLFPNSLSLSERVAVIKGTYEPEEDWEEQREERKKTMELTAR Saimiriboliviensisboliviensis MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQISLAFGLAIGTLVQALGPVSGGHINP AVTLALLVGNQISLLRALFYVVAQLVGAIAGAGILYGLAPLNARGNLAVNALNNNTTPGQATAVELILTF QLALCIFASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPVHWVFWVGP IVGAVLAAILYFYLLFPNSLSLSERVAIFKGTYEPDEDWEEQREERKKTMELTAR Cebusimitator MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQISLAFGLAIGTLVQALGPVSGGHINP AVTLALLVGNQISLLRALFYVVAQLVGAIAGAGILYGLAPLNARGNLAVNAVNKNTTPGQAMAVELILTF QLALCIFASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSRAHWVFWVGP IVGAVLAAILYFYLLFPNSLSLGERVAIFKGTYEPDEDWEEQREERKKTMELTAR Aotusnancymaae MKKEVCSVAFLKAVFAEFLATLIFIFFGLGSALKWPSALPTILQISLAFGLAIGTLVQALGPVSGGHINP AVTLALLVGNQISLLRALFYVVAQLVGAIAGAGILYGLAPLNARGNLAVNGINSNTTPGQAMAVELILTF QLALCIFASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSSAHWVFWVGP IVGAVLAAILYFYLLFPNSLSLGERVAIFKGTYEPDEDWEEQREERKKTMELTAR Rattusnorvegicus 
MKKEVCSLAFFKAVFAEFLATLIFVFFGLGSALKWPSALPTILQISIAFGLAIGTLAQALGPVSGGHINP AITLALLIGNQISLLRAVFYVAAQLVGAIAGAGILYWLAPLNARGNLAVNALNNNTTPGKAMVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPSHWVFWVGP IVGAMLAAILYFYLLFPSSLSLHDRVAVVKGTYEPEEDWEDHREERKKTIELTAH Musmusculus MKKEVCSVAFFKAVFAEFLATLIFVFFGLGSALKWPSALPTILQISIAFGLAIGILAQALGPVSGGHINP AITLALLIGNQISLLRAIFYVAAQLVGAIAGAGILYWLAPGNARGNLAVNALSNNTTPGKAVVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPSHWVFWVGP IVGAVLAAILYFYLLFPSSLSLHDRVAVVKGTYEPEEDWEDHREERKKTIELTAH Oryctolaguscuniculus MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPSILQIALAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAVFYVAAQLVGAIAGAGILYGLAPLNARGNLAVNALNNNTTPGQAVVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMKRFSPSHWVFWVGP IVGAILAAILYFYLLFPTSLSLSERVAVVKGSYEPEEDWEEHREKTLELTSR Myotislucifugus MKKEVCSVAFVKAVFTEFLATLIFVFFGLGSALQWPSALPSILQISLAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAVFYVVAQLVGAIAGAGILYGLAPLNARGSLAVNALNNNTTPGQAMVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMKRFSSAHWVFWVGP IVGAALAAILYFYLLFPNSLSLSERVAVVKGTYEPEEDWEEQREERKKTMELTAH Susscrofa MKKEVCSLAFLKAVFAEFLATLIFVFFGLASALKWPSALPTILQIALAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAVFYVVAQLVGAIAGAGILYGLAPGNARGNLAVNSLNNNTTPGQAVVVEMILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMNRFSPSHWVFWVGP IVGAAVAAILYFYLLFPNSLSLSERVAVVKGTYESEEDWEEQREERKKTMELTAH Heterocephalusglaber MKKEMCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPSILQISMAFGLAIGTLAQALGPVSGGHINP AVTLALLVGNQISLLRAVFYVAAQLVGAIAGAGILYGVAPTNARGNLAVNALNNNTTPGQAVVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGFSVALGHLVGIYFTGCSMNPARSFGPAVVMKRFSSSHWVFWVGP IVGAMLAAILYFYLLFPHSLSLSERMAIIKGTYEPEDDWEDQREERKKTIELTAH Caviaporcellus MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQISLAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAVFYVIAQLVGAIAGAGILYGVAPTNARGNLAVNALNSNITTGQAVVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMKRFSSTHWVFWVGP IVGAVLAAILYFYVLFPHSLSISDRVAIVKGTYEPEEDWEEQHEERKKTIELTAR Manisjavanica 
MKKEVCSVAFLKAVFAEFLATLIFVFLGLGSALKWPSALPSVLQISLAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAVFYVVAQLVGAIAGAGILYGLAPVNVRGNLAVNSLNNNTTPGQAMAVELILTF QLALCIFSSTDSRRTSPMGSPALSIGLSVTLGHFVGIYFTGCSMNPARSFGPAVVMKWFSPAHWVFWVGP IVGAALAAILYFYLLFPNSLSLSERVAVIKGTYEPEEDWEEQREERKKTMELTAH Chinchillalanigera MLRPAAAQPVYTAGWVTWPGQGGRRAGVGVGARPGARGARAAAAGSAPCAPCGPPSAGAAHCPPARAPRP GARPVYSAQCQLAGRPARAEPGARPAPQPACASAPTAAARRRRAPEATMKKEVCSVAFLKAVFAEFLATL IFVFFGLGSALKWPSALPTILQISLAFGLAIGTLVQALGPVSGGHINPAITLALLVGNQISLLRAVFYVI AQLVGAIAGAGILYGVAPTNARGNLAVNALNNNTTAGQAVVVELILTFQLALCIFASTDTRRSSPVGAPA LSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMKRFSSSHWVFWVGPIVGAVLASILYFYLLFPHSLSL SERVAIVKGTYEPEDDWEEQREERKKTIELTAH Bostaurus MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPSVLQISLAFGLAIGTMAQALGPVSGGHMNP AITLALLVGNQISLLRAVFYVVAQLVGAIAGAAILYGLAPYNARGNLAVNALNNNTTAGQAVVAEMILTF QLALCVFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPSVIMNRFSSAHWVFWVGP IVGAAVAAIIYFYLLFPHSLSLSDRAAILKGTYEPDEDWEESQEERKKTMELTAH Feliscatus MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPSILQISLAFGLAIGTLAQALGPVSGGHINP AITLALLVGNQISLLRAVFYVVAQLVGAIAGAGILYGLAPINARGNLAINALNNNTTQGQAMVVELILTF QLALCVFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFTGCSMNPARSFGPAVVMKRFSPAHWVFWVGP IVGAILAAILYFYLLFPNSLSLSERVAVVKGTYEPEEDWEEQREERKKTMELTAR Equuscaballus MKKEVCSVAFFKAVFAEFLATLIFVFFGLGSALQWPSALPSILQISMAFGLAIGTLAQALGPVSGGHINP AITLALFVGNQISLLRALFYVVAQLVGAIAGAAILYGLAPRNARGNLAINSLNSNTTPGQAMVVELILTF QLALCIFSSTDSRRTSPVGSPALSIGLSVTLGHLLGIHFTGCSMNPARSFGPAVIMKRFSSAHWVFWVGP IVGAALAAILYFYLLFPNSLSLSERVAIVKGTYEPEEDWEEQREERKKTMELTAH gorilla MKKEVCSVAFLKAVFAEFLATLIFVFFGLGSALKWPSALPTILQIALAFGLAIGTLAQAL GPVSGGHINPAITLALLVGNQISLLRAFFYVAAQLVGAIAGAGILYGVAPLNARGNLAVN ALNNNTTQGQAMVVELILTFQLALCIFASTDSRRTSPVGSPALSIGLSVTLGHLVGIYFT GCSMNPARSFGPAVVMNRFSPAHWVFWVGPIVGAVLAAILYFYLLFPNSLSLSERVAIIK GTYEPDEDWEEQREERKKTMELTTR pteropusvampyrus MKKEVCSVAFIKAVFTEFLATLIFVFFGLGSALQWPSALPSILQISLAFGLAIGTLAQAL GPVSGGHINPAITLALLVGNQISLLRATFYVVAQLLGAIAGAGILYGLAPTNARGNLAVN ALNNNTTPGQAVVVELILTFQLALCVFSSTDSRRTSPVGSPALSIGLSVTLGHLVGIYFT 
GCSMNPARSFGPAVVMKRFSPAHWVFWVGPIVGAALAAILYFYLLFPNSLSLSERVAVVK GTYEPEEDWEEQREERKKTMELTAR loxodontafricana MWELRSIAFSRAVFSEFLATLLFVFFGLGSALNWPQALPSVLQIAMAFGLAIGTLVQTLG HISGAHINPAVTVACLVGCHVSFLRATFYLAAQLLGAVAGAALLHELTPPDIRGDLAVNA LSNNTTVGQAVTVELFLTLQLVLCIFASTDDRRGDNLGTPALSIGFSVALGHLLGIHYTG CSMNPARSLAPAIITGKFDDHWVFWIGPLVGGILGSLLYNYVLFPHSKSLSERLAVLKGL EPDTDWEEREVRRRQSVELHSPQSLQRAARP

human atgaaaaaagaagtgtgcagcgtggcgtttctgaaagcggtgtttgcggaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgaaatggccgagcgcgctgccg accattctgcagattgcgctggcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcgtttttttatgtggcggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtatggcgtggcgccgctgaacgcgcgcggcaacctggcggtgaac gcgctgaacaacaacaccacccagggccaggcgatggtggtggaactgattctgaccttt cagctggcgctgtgcatttttgcgagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaccgctttagc ccggcgcattgggtgttttgggtgggcccgattgtgggcgcggtgctggcggcgattctg tatttttatctgctgtttccgaacagcctgagcctgagcgaacgcgtggcgattattaaa ggcacctatgaaccggatgaagattgggaagaacagcgcgaagaacgcaaaaaaaccatg gaactgaccacccgc chimp atgaaaaaagaagtgtgcagcgtggcgtttctgaaagcggtgtttgcggaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgaaatggccgagcgcgctgccg accattctgcagattgcgctggcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcgtttttttatgtggcggcgcagctggtgggcgcgattgcg ggcgcgagcattctgtatggcgtggcgccgctgaacgcgcgcggcaacctggcggtgaac gcgctgaacaacaacaccacccagggccaggcgatggtggtggaactgattctgaccttt cagctggcgctgtgcatttttgcgagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaccgctttagc ccggcgcattgggtgttt macaque atgcgcggcgtgcgcgataccggcccggtgtataccgcggcgtggagccgcggcccggat ggcagcgcgctgagcgcgggcggcgcgggcccgcgcggctgcgcgggccgccgcggccgc gcgcgcgcggcgggcaccccgccgaacccgtgcgcggcggcggcgctgagcgtgtggagc cgcccgccgccgccgcgccgccgcccggcgccggcgccggcgccgattgaaagccgcccg agccgcgcgcgccgcccgggcccggcgcgcagcggctgctgggatcgcgcgcgccgccat ccggcgcgcccgcgcccggcggcgagcaccagcagcgcggcgtgcgatccgaccggcgcg ccgcgccgcggccgccgccgccgcgcgccggcggcgaccatgaaaaaagaagtgtgcagc gtggcgtttctgaaagcggtgtttgcggaatttctggcgaccctgatttttgtgtttttt 
ggcctgggcagcgcgctgaaatggccgagcgcgctgccgaccattctgcagattgcgctg gcgtttggcctggcgattggcaccctggcgcaggcgctgggcccggtgagcggcggccat attaacccggcgattaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcg tttttttatgtggcggcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggc gtggcgccgctgaacgcgcgcggcaacctggcggtgaacgcgctgaacaacaacaccacc cagggccaggcgatggtggtggaactgattctgacctttcagctggcgctgtgcattttt gcgagcaccgatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctg agcgtgaccctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcg cgcagctttggcccggcggtggtgatgaaccgctttagcccggcgcattgggtgttttgg gtgggcccgattgtgggcgcggtgctggcggcgattctgtatttttatctgctgtttccg aacagcctgagcctgagcgaacgcgtggatattattaaaggcacctatgaaccggatgaa gattgggaagaacagcgcgaagaacgcaaaaaaaccatggaactgaccgcgcgc Pongoabelii atgaaaaaagaagtgtgcagcgtggcgtttctgaaagcggtgtttgcggaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgaaatggccgagcgcgctgccg accattctgcagattgcgctggcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcgtttttttatgtggcggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtatggcgtggcgccgctgaacgcgcgcggcaacctggcggtgaac gcgctgaacaacaacaccacccagggccaggcgatggtggtggaactgattctgaccttt cagctggcgctgtgcatttttgcgagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaccgctttagc ccggcgcattgggtgttttgggtgggcccgattgtgggcgcggtgctggcggcgattctg tatttttatctgctgtttccgaacagcctgagcctgagcgaacgcgtggcgattattaaa ggcacctatgaaccggatgaagattgggaagaacagcgcgaagaacgcaaaaaaaccatg gaactgaccacccat Papioanubis gcgattaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcgtttttttat gtggcggcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggcgtggcgccg ctgaacgcgcgcggcaacctggcgattaacgcgctgaacaacaacaccacccagggccag gcgatggtggtggaactgattctgacctttcagctggcgctgtgcatttttgcgagcacc gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagcttt ggcccggcggtggtgatgaaccgctttagcccggcgcattgggtgttttgggtgggcccg 
attgtgggcgcggtgctggcggcgattctgtatttttatctgctgtttccgaacagcctg agcctgagcgaacgcgtggatattattaaaggcacctatgaaccggatgaagattgggaa gaacagcgcgaagaacgcaaaaaaaccatggaactgaccgcgcgc Mandrillusleucophaeus atggaaggcccgcagacccaggcgtgggaaaccgaaagcgcggcgcagtttagccgcccg cgcctgaccccgccgagccgccaggtggataaaggcaacccggcgtgggaacgcgcgccg ccgggcgtgcattgcctggtgcaggtgtgcagcgtggcgtttctgaaagcggtgtttgcg gaatttctggcgaccctgatttttgtgttttttggcctgggcagcgcgctgaaatggccg agcgcgctgccgaccattctgcagattgcgctggcgtttggcctggcgattggcaccctg gcgcaggcgctgggcccggtgagcggcggccatattaacccggcgattaccctggcgctg ctggtgggcaaccagattagcctgctgcgcgcgtttttttatgtggcggcgcagctggtg ggcgcgattgcgggcgcgggcattctgtatggcgtggcgccgctgaacgcgcgcggcaac ctggcggtgaacgcgctgaacaacaacaccacccagggccaggcgatggtggtggaactg attctgacctttcagctggcgctgtgcatttttgcgagcaccgatagccgccgcaccagc ccggtgggcagcccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggc atttattttaccggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatg aaccgctttagcccggcgcattgggtgttttgggtgggcccgattgtgggcgcggtgctg gcggcgattctgtatttttatctgctgtttccgaacagcctgagcctgagcgaacgcgtg gatattattaaaggcacctatgaaccggatgaagattgggaagaacagcgcgaagaacgc aaaaaaaccatggaactgaccgcgcgc Galeopterusvariegatus atgaaaaaagaagtgtgcagcgtggcgtttctgaaagcggtgtttgcggaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgaaatggccgagcgcgctgccg accattctgcagattagcctggcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcggtgttttatgtggtggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtatggcctggcgccgctgaacgcgcgcggcaacctggcggtgaac gcgctgaacaacaacaccacccagggccaggcgatggtggtggaactgattctgaccttt cagctggcgctgtgcatttttagcagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaacgctttagc ccggcgcattgggtgttttgggtgggcccgattgtgggcgcggtgctggcggcgattctg tatttttatctgctgtttccgaacagcctgagcctgagcgaacgcgtggcggtgtttaaa ggcacctatgaaccggaagaagattgggaagaacagcgcgaagaacgcaaaaaaaccatg gaactgaccgcgcgc Mustelaputoriusfuro 
gcgattaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcggtgttttat gtggcggcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggcctggcgccg ctgaacgcgcgcggcaacctggcgattaacgcgctgaacaacaacaccacccagggccag gcgatggtggtggaactgattctgacctttcagctggcgctgtgcatttttagcagcacc gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagcttt ggcccggcggtggtgatgaaccgctttagcagcgcgcattgggtgttttgggtgggcccg attgtgggcgcgattctggcggcgattctgtatttttatctgctgtttccgaacagcctg agcgtgagcgaacgcgtggcggtgattaaaggcacctatgaaccggaagaagattgggaa gaacagcgcgaagaacgcaaaaaaaccatggaactgaccgcgcgc Carlito syrichta atgaaaaaagaagtgtgcagcgtggcgtttgtgaaagcggtgtttgcggaatttctggcg accctggtgtttgtgttttttggcctgggcagcgcgctgcgctggccgagcgcgctgccg accattctgcagattgcgctggcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcgctgttttatgtggtggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtatggcctggcgccgctgaacgcgcgcggcaacctggcggtgaac gcgctgaacaacaacaccaccccgggccaggcgatggcggtggaactgattctgaccttt cagctggcgctgtgcgtgtttgcgagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaccgctttagc ccggcgcattgggtgttttgggtgggcccgattgtgggcgcggtgctggcggcgattctg tatttttatctgctgtttccgcatagcctgagcctgagcgaacgcgtggcgattattaaa ggcacctatgaaccggatgaagattgggaagaacagcgcgaagaacgcaaaaaaaccatg gaactgaccgcgcgc Ailuropodamelanoleuca atgaaaaaagaagtgtgcagcgtggcgtttctgaaagcggtgtttgcggaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgaaatggccgagcgcgctgccg agcattctgcagattagcctggcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcggcgttttatgtggtggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtatggcctggcgccgctgaacgcgcgcggcaacctggcgattaac gcgctgaacaacaacaccacccagggccaggcgatggtggtggaactgattctgaccttt cagctggcgctgtgcatttttagcagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc 
ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaccgctttagc agcgcgcattgggtgttttgggtgggcccgattgtgggcgcgattctggcggcggtgctg tatttttatctgctgtttccgaacagcctgagcctgagcgaacgcgtggcggtgattaaa ggcacctatgaaccggaagaagattgggaagaacagcgcgaagaacgcaaaaaaaccatg gaactgaccgcgcgc Saimiriboliviensisboliviensis gcggtgaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcgctgttttat gtggtggcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggcctggcgccg ctgaacgcgcgcggcaacctggcggtgaacgcgctgaacaacaacaccaccccgggccag gcgaccgcggtggaactgattctgacctttcagctggcgctgtgcatttttgcgagcacc gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagcttt ggcccggcggtggtgatgaaccgctttagcccggtgcattgggtgttttgggtgggcccg attgtgggcgcggtgctggcggcgattctgtatttttatctgctgtttccgaacagcctg agcctgagcgaacgcgtggcgatttttaaaggcacctatgaaccggatgaagattgggaa gaacagcgcgaagaacgcaaaaaaaccatggaactgaccgcgcgc Cebusimitator atgaaaaaagaagtgtgcagcgtggcgtttctgaaagcggtgtttgcggaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgaaatggccgagcgcgctgccg accattctgcagattagcctggcgtttggcctggcgattggcaccctggtgcaggcgctg ggcccggtgagcggcggccatattaacccggcggtgaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcgctgttttatgtggtggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtatggcctggcgccgctgaacgcgcgcggcaacctggcggtgaac gcggtgaacaaaaacaccaccccgggccaggcgatggcggtggaactgattctgaccttt cagctggcgctgtgcatttttgcgagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaccgctttagc cgcgcgcattgggtgttttgggtgggcccgattgtgggcgcggtgctggcggcgattctg tatttttatctgctgtttccgaacagcctgagcctgggcgaacgcgtggcgatttttaaa ggcacctatgaaccggatgaagattgggaagaacagcgcgaagaacgcaaaaaaaccatg gaactgaccgcgcgc Aotusnancymaae gcggtgaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcgctgttttat gtggtggcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggcctggcgccg ctgaacgcgcgcggcaacctggcggtgaacggcattaacagcaacaccaccccgggccag gcgatggcggtggaactgattctgacctttcagctggcgctgtgcatttttgcgagcacc 
gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagcttt ggcccggcggtggtgatgaaccgctttagcagcgcgcattgggtgttttgggtgggcccg attgtgggcgcggtgctggcggcgattctgtatttttatctgctgtttccgaacagcctg agcctgggcgaacgcgtggcgatttttaaaggcacctatgaaccggatgaagattgggaa gaacagcgcgaagaacgcaaaaaaaccatggaactgaccgcgcgc Rattusnorvegicus atgaaaaaagaagtgtgcagcctggcgttttttaaagcggtgtttgcggaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgaaatggccgagcgcgctgccg accattctgcagattagcattgcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctgattggcaac cagattagcctgctgcgcgcggtgttttatgtggcggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtattggctggcgccgctgaacgcgcgcggcaacctggcggtgaac gcgctgaacaacaacaccaccccgggcaaagcgatggtggtggaactgattctgaccttt cagctggcgctgtgcatttttagcagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaccgctttagc ccgagccattgggtgttttgggtgggcccgattgtgggcgcgatgctggcggcgattctg tatttttatctgctgtttccgagcagcctgagcctgcatgatcgcgtggcggtggtgaaa ggcacctatgaaccggaagaagattgggaagatcatcgcgaagaacgcaaaaaaaccatt gaactgaccgcgcat Musmusculus atgaaaaaagaagtgtgcagcgtggcgttttttaaagcggtgtttgcggaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgaaatggccgagcgcgctgccg accattctgcagattagcattgcgtttggcctggcgattggcattctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctgattggcaac cagattagcctgctgcgcgcgattttttatgtggcggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtattggctggcgccgggcaacgcgcgcggcaacctggcggtgaac gcgctgagcaacaacaccaccccgggcaaagcggtggtggtggaactgattctgaccttt cagctggcgctgtgcatttttagcagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaccgctttagc ccgagccattgggtgttttgggtgggcccgattgtgggcgcggtgctggcggcgattctg tatttttatctgctgtttccgagcagcctgagcctgcatgatcgcgtggcggtggtgaaa ggcacctatgaaccggaagaagattgggaagatcatcgcgaagaacgcaaaaaaaccatt gaactgaccgcgcat 
Oryctolaguscuniculus gcgattaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcggtgttttat gtggcggcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggcctggcgccg ctgaacgcgcgcggcaacctggcggtgaacgcgctgaacaacaacaccaccccgggccag gcggtggtggtggaactgattctgacctttcagctggcgctgtgcatttttagcagcacc gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagcttt ggcccggcggtggtgatgaaacgctttagcccgagccattgggtgttttgggtgggcccg attgtgggcgcgattctggcggcgattctgtatttttatctgctgtttccgaccagcctg agcctgagcgaacgcgtggcggtggtgaaaggcagctatgaaccggaagaagattgggaa gaacatcgcgaaaaaaccctggaactgaccagccgc Myotislucifugus atgaaaaaagaagtgtgcagcgtggcgtttgtgaaagcggtgtttaccgaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgcagtggccgagcgcgctgccg agcattctgcagattagcctggcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcggtgttttatgtggtggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtatggcctggcgccgctgaacgcgcgcggcagcctggcggtgaac gcgctgaacaacaacaccaccccgggccaggcgatggtggtggaactgattctgaccttt cagctggcgctgtgcatttttagcagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaacgctttagc agcgcgcattgggtgttttgggtgggcccgattgtgggcgcggcgctggcggcgattctg tatttttatctgctgtttccgaacagcctgagcctgagcgaacgcgtggcggtggtgaaa ggcacctatgaaccggaagaagattgggaagaacagcgcgaagaacgcaaaaaaaccatg gaactgaccgcgcatattgtgggcgcgattctggcggcgattctgtatttttatctgctg tttccgaccagcctgagcctgagcgaacgcgtggcggtggtgaaaggcagctat Susscrofa gcgattaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcggtgttttat gtggtggcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggcctggcgccg ggcaacgcgcgcggcaacctggcggtgaacagcctgaacaacaacaccaccccgggccag gcggtggtggtggaaatgattctgacctttcagctggcgctgtgcatttttagcagcacc gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagcttt ggcccggcggtggtgatgaaccgctttagcccgagccattgggtgttttgggtgggcccg 
attgtgggcgcggcggtggcggcgattctgtatttttatctgctgtttccgaacagcctg agcctgagcgaacgcgtggcggtggtgaaaggcacctatgaaagcgaagaagattgggaa gaacagcgcgaagaacgcaaaaaaaccatggaactgaccgcgcat Heterocephalusglaber atgaaaaaagaaatgtgcagcgtggcgtttctgaaagcggtgtttgcggaatttctggcg accctgatttttgtgttttttggcctgggcagcgcgctgaaatggccgagcgcgctgccg agcattctgcagattagcatggcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcggtgaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcggtgttttatgtggcggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtatggcgtggcgccgaccaacgcgcgcggcaacctggcggtgaac gcgctgaacaacaacaccaccccgggccaggcggtggtggtggaactgattctgaccttt cagctggcgctgtgcatttttagcagcaccgatagccgccgcaccagcccggtgggcagc ccggcgctgagcattggctttagcgtggcgctgggccatctggtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaacgctttagc agcagccattgggtgttttgggtgggcccgattgtgggcgcgatgctggcggcgattctg tatttttatctgctgtttccgcatagcctgagcctgagcgaacgcatggcgattattaaa ggcacctatgaaccggaagatgattgggaagatcagcgcgaagaacgcaaaaaaaccatt gaactgaccgcgcat Caviaporcellus gcgattaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcggtgttttat gtgattgcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggcgtggcgccg accaacgcgcgcggcaacctggcggtgaacgcgctgaacagcaacattaccaccggccag gcggtggtggtggaactgattctgacctttcagctggcgctgtgcatttttagcagcacc gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagcttt ggcccggcggtggtgatgaaacgctttagcagcacccattgggtgttttgggtgggcccg attgtgggcgcggtgctggcggcgattctgtatttttatgtgctgtttccgcatagcctg agcattagcgatcgcgtggcgattgtgaaaggcacctatgaaccggaagaagattgggaa gaacagcatgaagaacgcaaaaaaaccattgaactgaccgcgcgc Manisjavanica atgaaaaaagaagtgtgcagcgtggcgtttctgaaagcggtgtttgcggaatttctggcg accctgatttttgtgtttctgggcctgggcagcgcgctgaaatggccgagcgcgctgccg agcgtgctgcagattagcctggcgtttggcctggcgattggcaccctggcgcaggcgctg ggcccggtgagcggcggccatattaacccggcgattaccctggcgctgctggtgggcaac cagattagcctgctgcgcgcggtgttttatgtggtggcgcagctggtgggcgcgattgcg ggcgcgggcattctgtatggcctggcgccggtgaacgtgcgcggcaacctggcggtgaac 
agcctgaacaacaacaccaccccgggccaggcgatggcggtggaactgattctgaccttt cagctggcgctgtgcatttttagcagcaccgatagccgccgcaccagcccgatgggcagc ccggcgctgagcattggcctgagcgtgaccctgggccattttgtgggcatttattttacc ggctgcagcatgaacccggcgcgcagctttggcccggcggtggtgatgaaatggtttagc ccggcgcattgggtgttttgggtgggcccgattgtgggcgcggcgctggcggcgattctg tatttttatctgctgtttccgaacagcctgagcctgagcgaacgcgtggcggtgattaaa ggcacctatgaaccggaagaagattgggaagaacagcgcgaagaacgcaaaaaaaccatg gaactgaccgcgcat Chinchillalanigera ggcgcgcgcccggtgtatagcgcgcagtgccagctggcgggccgcccggcgcgcgcggaa ccgggcgcgcgcccggcgccgcagccggcgtgcgcgagcgcgccgaccgcggcggcgcgc cgccgccgcgcgccggaagcgaccatgaaaaaagaagtgtgcagcgtggcgtttctgaaa gcggtgtttgcggaatttctggcgaccctgatttttgtgttttttggcctgggcagcgcg ctgaaatggccgagcgcgctgccgaccattctgcagattagcctggcgtttggcctggcg attggcaccctggtgcaggcgctgggcccggtgagcggcggccatattaacccggcgatt accctggcgctgctggtgggcaaccagattagcctgctgcgcgcggtgttttatgtgatt gcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggcgtggcgccgaccaac gcgcgcggcaacctggcggtgaacgcgctgaacaacaacaccaccgcgggccaggcggtg gtggtggaactgattctgacctttcagctggcgctgtgcatttttgcgagcaccgatacc cgccgcagcagcccggtgggcgcgccggcgctgagcattggcctgagcgtgaccctgggc catctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagctttggcccg gcggtggtgatgaaacgctttagcagcagccattgggtgttttgggtgggcccgattgtg ggcgcggtgctggcgagcattctgtatttttatctgctgtttccgcatagcctgagcctg agcgaacgcgtggcgattgtgaaaggcacctatgaaccggaagatgattgggaagaacag cgcgaagaacgcaaaaaaaccattgaactgaccgcgcat Bostaurus gcgattaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcggtgttttat gtggtggcgcagctggtgggcgcgattgcgggcgcggcgattctgtatggcctggcgccg tataacgcgcgcggcaacctggcggtgaacgcgctgaacaacaacaccaccgcgggccag gcggtggtggcggaaatgattctgacctttcagctggcgctgtgcgtgtttagcagcacc gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagcttt ggcccgagcgtgattatgaaccgctttagcagcgcgcattgggtgttttgggtgggcccg attgtgggcgcggcggtggcggcgattatttatttttatctgctgtttccgcatagcctg agcctgagcgatcgcgcggcgattctgaaaggcacctatgaaccggatgaagattgggaa 
gaaagccaggaagaacgcaaaaaaaccatggaactgaccgcgcat Feliscatus gcgattaccctggcgctgctggtgggcaaccagattagcctgctgcgcgcggtgttttat gtggtggcgcagctggtgggcgcgattgcgggcgcgggcattctgtatggcctggcgccg attaacgcgcgcggcaacctggcgattaacgcgctgaacaacaacaccacccagggccag gcgatggtggtggaactgattctgacctttcagctggcgctgtgcgtgtttagcagcacc gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctggtgggcatttattttaccggctgcagcatgaacccggcgcgcagcttt ggcccggcggtggtgatgaaacgctttagcccggcgcattgggtgttttgggtgggcccg attgtgggcgcgattctggcggcgattctgtatttttatctgctgtttccgaacagcctg agcctgagcgaacgcgtggcggtggtgaaaggcacctatgaaccggaagaagattgggaa gaacagcgcgaagaacgcaaaaaaaccatggaactgaccgcgcgc Equuscaballus gcgattaccctggcgctgtttgtgggcaaccagattagcctgctgcgcgcgctgttttat gtggtggcgcagctggtgggcgcgattgcgggcgcggcgattctgtatggcctggcgccg cgcaacgcgcgcggcaacctggcgattaacagcctgaacagcaacaccaccccgggccag gcgatggtggtggaactgattctgacctttcagctggcgctgtgcatttttagcagcacc gatagccgccgcaccagcccggtgggcagcccggcgctgagcattggcctgagcgtgacc ctgggccatctgctgggcattcattttaccggctgcagcatgaacccggcgcgcagcttt ggcccggcggtgattatgaaacgctttagcagcgcgcattgggtgttttgggtgggcccg attgtgggcgcggcgctggcggcgattctgtatttttatctgctgtttccgaacagcctg agcctgagcgaacgcgtggcgattgtgaaaggcacctatgaaccggaagaagattgggaa gaacagcgcgaagaacgcaaaaaaaccatggaactgaccgcgcat

spond commented 7 months ago

Dear @fatima-akhtar113,

I am not sure what you mean by "reverse-translate". There is no 1-1 way to reverse translate a protein sequence because of redundant codons. For example, you can use any of the 6 available codons for Serine. You need to find the underlying CDS sequences for each corresponding species.
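
To illustrate why reverse translation is not one-to-one, the sketch below counts how many distinct nucleotide sequences encode a given peptide under the standard genetic code. It is a minimal Python example, not something HyPhy or Datamonkey provides; the function name `count_encodings` is an illustrative assumption, and the peptide is the first seven residues of the human protein posted in this thread.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons available for each amino acid.
CODONS_PER_AA = Counter(CODON_TABLE.values())


def count_encodings(protein: str) -> int:
    """Number of distinct nucleotide sequences that translate to `protein`."""
    protein = protein.upper()
    unknown = set(protein) - set(CODONS_PER_AA)
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")
    return prod(CODONS_PER_AA[aa] for aa in protein)


print("Codons for serine:", CODONS_PER_AA["S"])
print("Encodings of MKKEVCS:", count_encodings("MKKEVCS"))
```

Even for a seven-residue peptide there are hundreds of candidate DNA sequences, so a reverse-translated sequence carries no information about which synonymous codons the real gene uses.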

If you take your protein sequence for Homo sapiens and run blastp on it, you get the following result:

image

However, if you take the nucleotide sequence you provided and run blastn on it, you get complete nonsense.

image

This nucleotide sequence does not exist in nature.

You should instead pull out the corresponding CDS for each sequence, for example https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=CCDS8793.1 will give you the human sequences.
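Once the CDS records are downloaded, a quick sanity check along these lines can confirm that each one really encodes the protein you started from. This is a sketch under stated assumptions: `translate` and `check_cds` are hypothetical helpers, the standard genetic code is assumed, and the toy sequences are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order; '*' is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate complete codons of `cds`; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def check_cds(cds, protein):
    """Return a list of problems found; an empty list means the CDS looks consistent."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    aa = translate(cds)
    if aa.endswith("*"):
        aa = aa[:-1]  # a single terminal stop codon is expected in a full CDS
    if "*" in aa:
        problems.append("internal stop codon")
    if aa != protein.upper():
        problems.append("translation does not match the protein")
    return problems


print(check_cds("ATGAGCCTGTAA", "MSL"))  # consistent toy example
print(check_cds("ATGTAACTG", "MSL"))     # toy example with an in-frame stop
```

A CDS that passes this check translates back to the same protein, and unlike a reverse-translated sequence it preserves the real codon choices on which dN/dS methods depend.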

Best, Sergei

fatima-akhtar113 commented 7 months ago

I will try to do it the way you described. I also used Perl to translate the protein transcript sequence of my gene into a DNA sequence: I BLASTed the human protein sequence, picked the orthologues for my species of interest, and then converted them into DNA. I will try to understand your approach. Thank you.

fatima-akhtar113 commented 7 months ago

Hey, can I use my protein sequence to BLAST against a nucleotide database and extract the nucleotide sequences of the orthologues of my transcript?

spond commented 7 months ago

Dear @fatima-akhtar113,

How you collect your data is really up to you, and depends on the problem at hand. But based on what you describe, this seems sensible. The database will have underlying CDS sequences for your proteins.

Best, Sergei

fatima-akhtar113 commented 6 months ago

Hey, I have a few queries. What does this p-value in BUSTED suggest? p = 8.692e-12. Also, when I run aBSREL the p-value is 0, which does not make sense. I took the CDS file from Ensembl; I will attach it.

fatima-akhtar113 commented 6 months ago

The methodology I followed was: I collected the CDS file, did the alignment, stripped all stop codons using HyPhy, and then ran it in Datamonkey.

spond commented 6 months ago

Dear @fatima-akhtar113,

I am afraid I don't fully understand what you are asking. If you are including attachments, you should do it via a web-browser (not e-mail), because otherwise the attachments will be stripped out.

Best, Sergei

fatima-akhtar113 commented 6 months ago

https://www.datamonkey.org/absrel/65978ad3ba6f2072cc42906e Are my results accurate? Is a p-value of 0.00 correct?

spond commented 6 months ago

Dear @fatima-akhtar113,

Yes, based on your alignment, aBSREL obtained a p-value of ~0.0 on the human branch (also see https://observablehq.com/@spond/absrel?url=https://www.datamonkey.org/absrel/65978ad3ba6f2072cc42906e/results for a newer visualization).

However, I would encourage you to check the alignment for robustness. Some of the "hotspots" for positive selection signal, e.g. codons around position 1150

image

seem to correspond to a gappy region which may have been misaligned

image

See https://www.ebi.ac.uk/Tools/services/web/toolresult.ebi?jobId=mview-I20240105-132233-0645-54922136-p1m
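One rough way to locate such regions before trusting site- or branch-level signal is to compute, for each codon column, the fraction of sequences that contain a gap. The sketch below assumes the alignment is already loaded as a dictionary of equal-length, in-frame nucleotide strings; the function names and the 0.5 threshold are arbitrary illustrative choices, not HyPhy or Datamonkey settings.

```python
def gap_fraction_per_codon(alignment):
    """For each codon column, return the fraction of sequences with a gap in it.

    `alignment` maps sequence names to aligned nucleotide strings of equal length.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    fractions = []
    for start in range(0, length - length % 3, 3):
        gapped = sum(1 for s in seqs if "-" in s[start:start + 3])
        fractions.append(gapped / len(seqs))
    return fractions


def flag_gappy_codons(alignment, threshold=0.5):
    """Return 1-based codon positions whose gap fraction is at least `threshold`."""
    fractions = gap_fraction_per_codon(alignment)
    return [i + 1 for i, f in enumerate(fractions) if f >= threshold]


# Toy alignment (made up): codon 2 is missing in two of three sequences.
toy = {
    "seqA": "ATG---CTG",
    "seqB": "ATGAGCCTG",
    "seqC": "ATG---CTG",
}
print(flag_gappy_codons(toy))
```

Codon positions flagged this way that coincide with reported selection hotspots are good candidates for manual inspection or realignment.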

Best, Sergei

github-actions[bot] commented 4 months ago

Stale issue message

fatima-akhtar113 commented 1 month ago

Intron(s) used as neutral proxy with HKY85 model. Non-branch specific test. NULL = negative selection and neutral evolution. ALTERNATE = Class 1: negative selection, Class 2: neutral evolution, and Class 3: positive selection.

Null model
inverse kappa: 0.3760629370048015

Alternate model
f0: 0.3091412697694165
f1: 0.01606826205589621
f2: 0.6747904681746874
f3: 0.03334068966119959
zeta0: 0.01182198580767017
zeta1: 1
zeta2: 0.0253951718952625

Lk null = -11802.92742646848
Lk alt = -10934.97392787508
LRT = 1735.906997186801
LRT p-value = 0.0000000000
calculate NEB
calculate BEB
discretization= 10

Please explain this.
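The reported LRT value follows directly from the two log-likelihoods, and the arithmetic also shows why the p-value prints as zero. The sketch below reproduces it. The chi-square reference distribution with 2 degrees of freedom is an assumption made purely for illustration (the output does not state which distribution the program uses); for 2 degrees of freedom the tail probability has the closed form exp(-LRT/2).

```python
import math

# Values copied from the output above.
lnl_null = -11802.92742646848
lnl_alt = -10934.97392787508

# Likelihood ratio test statistic: twice the log-likelihood difference.
lrt = 2.0 * (lnl_alt - lnl_null)

# ASSUMPTION: chi-square with 2 degrees of freedom, whose survival function
# is exp(-x / 2). The actual degrees of freedom used by the tool are not shown.
p_value = math.exp(-lrt / 2.0)

# The same quantity on a log10 scale, which does not underflow.
log10_p = -lrt / (2.0 * math.log(10.0))

print(f"LRT = {lrt:.6f}")
print(f"p-value as a double = {p_value}")
print(f"log10(p) = {log10_p:.2f}")
```

Under this assumption the p-value is far smaller than the smallest positive double-precision number (about 4.9e-324), so it underflows and is displayed as 0. A printed value of 0.0000000000 therefore means "too small to represent", not "exactly zero".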
