eliasprim closed this issue 3 years ago.
Dear @eliasprim,
Let me take a closer look at your results and get back to you. In the meantime
Best, Sergei
Dear @spond,
Thank you very much for your quick response. I look forward to the results of your investigation.
I just want to note that some of the sequences in the alignment of 50 sequences contain an insertion. Also, in this run I used slightly different parameters, because it was a trial run to better understand the GARD parameters.
Kind regards,
Elias Primetis
Dear @spond,
I would like to ask whether there is any update on my issue.
Thank you in advance.
Kind regards,
Elias Primetis
Dear @eliasprim,
I pushed a fix for your issue to the dev branch. It will be released with 2.5.29 and pushed to Datamonkey in the next few days. In the meantime, you can run GARD locally (follow the installation instructions at https://github.com/veg/hyphy-analyses).
Once installed, run:
hyphy gard --alignment /path/to/file --mode Faster
You can visualize GARD results at https://observablehq.com/@spond/plotting-gard-breakpoint-support
Example output for one of your analyses is attached (uncompress it before uploading), and its breakpoint support looks like this:
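For anyone who prefers to inspect the .json output programmatically rather than through the Observable notebook, here is a minimal sketch that loads a GARD result file and lists the sites whose breakpoint support reaches a threshold. It assumes the file stores per-site support under a top-level key named `siteBreakPointSupport`, mapping site positions to support values; that key name and the default 0.5 cutoff are assumptions, so please check them against your own output before relying on the numbers.

```python
import json
from typing import Dict, List, Tuple


def load_site_support(path: str) -> Dict[int, float]:
    """Read per-site breakpoint support from a GARD .json file.

    ASSUMPTION: support is stored under "siteBreakPointSupport" as a
    mapping from site (a string key in JSON) to a numeric support value.
    """
    with open(path) as handle:
        result = json.load(handle)
    raw = result.get("siteBreakPointSupport", {})
    return {int(site): float(value) for site, value in raw.items()}


def supported_sites(
    support: Dict[int, float], threshold: float = 0.5
) -> List[Tuple[int, float]]:
    """Return (site, support) pairs at or above the threshold, strongest first."""
    hits = [(site, value) for site, value in support.items() if value >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For example, `supported_sites(load_site_support("gard_result.json"))` would list the candidate breakpoint sites from a (hypothetically named) result file, ordered by support.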
Best, Sergei
Dear @spond,
Thank you very much for your help.
I had already installed and run GARD locally, but I did not know how to analyse the .json output; now I do. Thanks again.
Kind regards,
Elias Primetis
Dear @eliasprim,
Make sure you check out the develop branch to gain access to the fixes immediately.
Best, Sergei
Dear @spond,
Yes, I will check it out. Thank you.
Kind regards,
Elias
Dear @spond,
I ran the online GARD on 100 sequences and visualized the JSON output using the link you sent me. As you can see in the following picture, there are 3 inferred breakpoints. Can I consider them significant?
Kind regards,
Elias
Dear @eliasprim,
Yes. According to the Δ c-AIC values, the model with multiple different trees is preferred to both the null model (no recombination) and the "single tree, multiple partitions" model (same topology but different rates). Looking at Figure 1, you can also see that the breakpoint at ~800 has the strongest signal, followed by the one at ~375 and then the one at ~1000.
If you want to perform additional validation, you can run a Shimodaira-Hasegawa type test using RAxML or IQ-TREE.
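To illustrate the kind of comparison behind the Δ c-AIC values, here is a sketch using the standard small-sample corrected AIC, AICc = -2 log L + 2k·n/(n - k - 1), where k is the number of estimated parameters and n the sample size. It computes AICc for a set of candidate models and reports each model's difference from the best one, so the preferred model gets 0. This is a generic illustration, not GARD's own code; in particular, how GARD counts n and k for each model is not shown here and may differ.

```python
from typing import Dict, Tuple


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected AIC: -2 logL + 2k * n / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return -2.0 * log_likelihood + 2.0 * k * n / (n - k - 1)


def delta_aicc(models: Dict[str, Tuple[float, int]], n: int) -> Dict[str, float]:
    """Map each model name to its AICc minus the lowest AICc among the models.

    `models` maps a name to (log_likelihood, number_of_parameters).
    The preferred (lowest-AICc) model has a difference of 0.
    """
    scores = {name: aicc(logl, k, n) for name, (logl, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}
```

A more complex model (for example, one tree per segment) is preferred only if its gain in log-likelihood outweighs the extra penalty for its additional parameters.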
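A typical preparatory step for such a test is to cut the alignment at the inferred breakpoints and infer one tree per segment; the per-segment trees can then be compared with the topology test of the chosen program. The sketch below covers only the cutting. It assumes breakpoints are 1-based sites that each end a segment, which is an assumption you should check against how your GARD output reports breakpoint coordinates; reading and writing alignment files is left out.

```python
from typing import Dict, List


def split_at_breakpoints(
    alignment: Dict[str, str], breakpoints: List[int]
) -> List[Dict[str, str]]:
    """Cut an aligned set of sequences into consecutive segments.

    ASSUMPTION: each breakpoint is a 1-based site that ends a segment,
    so breakpoints [375, 800] give sites 1-375, 376-800 and 801-end.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same, non-zero count of aligned sequences and length")
    length = lengths.pop()
    cuts = sorted(set(breakpoints))
    if any(not 0 < cut < length for cut in cuts):
        raise ValueError("breakpoints must lie strictly inside the alignment")
    # Python slice bounds: a 1-based end site equals the 0-based exclusive end.
    bounds = [0] + cuts + [length]
    return [
        {name: seq[start:end] for name, seq in alignment.items()}
        for start, end in zip(bounds, bounds[1:])
    ]
```

With the breakpoints ~375, ~800 and ~1000 from the plot above, this would produce four segments, one tree to be inferred for each.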
Best, Sergei
Dear @spond,
Thank you, I just wanted to check that I understand the result correctly.
Thank you very much for all your help.
Kind regards,
Elias
Hello HyPhy team,
I am a new user of GARD and I am currently analysing the LTR sequences of LTR retrotransposons. Because LTRs are non-coding sequences, and based on other posts here about running GARD on non-coding sequences, I use the following parameters: Data Type: Nucleotide, Genetic Code: Universal, Site to site rate variation: General Discrete, and Rate Classes: 4. The following two links are the results of two analyses of 25 and 50 LTR sequences, respectively. Having looked at these results, I have some questions.
http://datamonkey.org/gard/60199269d83df369d29a9cff
http://datamonkey.org/gard/5fabfe828e3372615066c304
1) In every run, I observe that the best breakpoint model is typically the last one, with the most breakpoints and the lowest Δ AICc score. Is this correct?
2) For most runs I used a small number of sequences (10-25 LTRs). In all the GARD reports, including the first one above, I observe a single peak that is always at the very start of the GARD site graph. As far as I understand, this peak corresponds to the only breakpoint that is strongly supported by the analysis. I have tried LTRs from various families and this is always the case. Is this a true result, or an artefact possibly caused by the small number of sequences? The only time I used more LTRs (50), a second peak appeared (second link above). If a higher number of sequences is indeed needed, what is a reasonable number to get trustworthy results?
3) In papers that use GARD, authors state that there is a breakpoint at nucleotide x with p-value y. Do I have to analyse the JSON output in a specific way to get this p-value?
Thank you very much in advance for your time.
Kind regards,
Elias Primetis