hillerlab / ForwardGenomics

Methods for finding associations between phenotypic and genomic differences between species using the Forward Genomics framework
MIT License

Sub loss tree missing #11

Open binlu1981 opened 4 years ago

binlu1981 commented 4 years ago

Dear Dr. @MichaelHiller,

I got header-only output after forwardGenomics.R finished. The script reported the following messages for all elements:

Element Contig927.5721-5778 > Sub loss tree 5 missing
Sub loss tree 1 missing
Sub loss tree 2 missing

How can I fix this?

Thanks,
Best regards,

Bin

MichaelHiller commented 4 years ago

Hi Bin,

I am not sure, but I guess this refers to an analysis where all lineages that you labeled as trait loss have missing data. Please check whether you have %id values for at least 2 of the trait-loss lineages.
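A quick way to run this check before re-running the script is a small Python sketch like the following. It is hypothetical: the file layouts it parses (a phenotype file of `species label` pairs with 0 marking trait loss, and a tab-separated %id table of `element species pid` rows with `NA` for missing data) are assumptions, not the documented ForwardGenomics input formats, so adapt the parsing to your actual files. The threshold of 2 mirrors the advice above.

```python
import csv
import io


def read_loss_species(phenotype_text):
    """Return species labeled 0 (trait loss).

    Assumed (hypothetical) format: one 'species<whitespace>label' pair per line.
    """
    losses = set()
    for line in phenotype_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "0":
            losses.add(fields[0])
    return losses


def count_losses_with_data(pid_text, loss_species):
    """Count, per element, the trait-loss species that have a %id value.

    Assumed (hypothetical) format: tab-separated 'element species pid' rows,
    with missing %id written as NA or left empty.
    """
    counts = {}
    for row in csv.reader(io.StringIO(pid_text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        element, species, pid = row[:3]
        counts.setdefault(element, 0)  # keep elements with zero usable losses
        if species in loss_species and pid not in ("", "NA"):
            counts[element] += 1
    return counts


def too_few_losses(counts, min_losses=2):
    """Elements with fewer than min_losses trait-loss lineages carrying data."""
    return sorted(e for e, n in counts.items() if n < min_losses)


if __name__ == "__main__":
    phenotype = "spA 0\nspB 1\nspC 1\n"
    pid_table = "elem1\tspA\t0.85\nelem1\tspB\t0.97\nelem2\tspA\tNA\nelem2\tspB\t0.99\n"
    counts = count_losses_with_data(pid_table, read_loss_species(phenotype))
    print(counts, too_few_losses(counts))
```

With only one species labeled 0, every element is flagged, which matches the situation in this issue: no element can reach two trait-loss lineages with data.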

Otherwise, I would need more info and the data to debug this.

Michael

binlu1981 commented 4 years ago

Thanks @MichaelHiller. Only one species is believed to have lost the trait of interest, and it is labeled as 0. Is that the problem? Should I label at least two lineages as 0 in the phenotype list file? How can I make it work if there is no additional trait-loss lineage?

MichaelHiller commented 4 years ago

Yes, a correlation only really makes sense with at least 2 data points. You can, however, try setting --minLosses to 1. This should highlight elements where the single loss lineage is highly diverged, but the resulting P-values will not be useful. Please see https://www.nature.com/articles/s41467-018-07122-z (the snake limb loss part) for how to find elements diverged in a single lineage only.

binlu1981 commented 4 years ago

Great suggestion. If I need to find highly diverged elements in one species, should I set allowedAncestralNodes to all lineages and nodes of the tree in the previous step, or only to its most recent ancestor?

MichaelHiller commented 4 years ago

You can still specify the common ancestor of a set of species. In fact, if you have a single trait-loss species and you specified the direct ancestor of this species and its sister species, you would end up with a pairwise comparison, which is not very powerful.

binlu1981 commented 4 years ago

Hi @MichaelHiller,

The R script reported an error at the first element when running the GLS method. I checked my global input file, and its format is identical to your sample file. Any suggestions?

Loop over the genomic elements...
Element Chr01.100005377-100005568
Error in terms.formula(formula, data = data) :
  invalid model formula in ExtractVars
Calls: loopElements ... model.frame -> model.frame.default -> terms -> terms.formula
Execution halted

Best regards,
Bin

MichaelHiller commented 4 years ago

Hi Bin,

Sorry to hear that. The R script was developed by Xavier Prudent, so it is hard for me to debug. Can you please run the test example in the example folder and see if that works? If it does, you can check what differs in your input. Please check whether you use spaces or tabs as separators. Another idea is to change the element names to something simpler like element1 (avoiding dashes).
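The renaming suggestion can be scripted. The Python sketch that follows is hypothetical: it assumes the element name sits in the first whitespace-separated column and that no field contains spaces. It maps every element name to a simple ID (element1, element2, ...), rewrites lines with tab separators, and emits a mapping table so results can be translated back to the original coordinates. One plausible reason dashes break the GLS step is that the name ends up inside an R model formula, where `-` is an operator; this thread does not confirm the exact cause. Apply the same mapping to every input file that mentions elements so they stay consistent.

```python
def build_name_map(names):
    """Map each distinct original element name to element1, element2, ... in order of first appearance."""
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = f"element{len(mapping) + 1}"
    return mapping


def rewrite_table(text, mapping, name_column=0):
    """Split each line on any whitespace run, rename the element column, rejoin with tabs.

    Assumes (hypothetically) that no field contains spaces; blank lines are dropped.
    """
    out_lines = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if name_column < len(fields) and fields[name_column] in mapping:
            fields[name_column] = mapping[fields[name_column]]
        out_lines.append("\t".join(fields))
    return "\n".join(out_lines) + "\n"


def mapping_tsv(mapping):
    """Tab-separated 'new_id original_name' lines for translating results back."""
    return "".join(f"{new}\t{old}\n" for old, new in mapping.items())


if __name__ == "__main__":
    table = (
        "Chr01.100005377-100005568 spA 0.91\n"
        "Chr01.100005377-100005568  spB\t0.88\n"  # mixed spaces and tabs
        "Contig927.5721-5778 spA 0.75\n"
    )
    names = [line.split()[0] for line in table.splitlines() if line.strip()]
    mapping = build_name_map(names)
    print(rewrite_table(table, mapping), end="")
    print(mapping_tsv(mapping), end="")
```

Renaming every element, rather than only names containing dashes, avoids guessing which other characters R might reject.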

Also, please use the verbose parameter to get more output.

Thanks,
Michael

binlu1981 commented 4 years ago

Hi @MichaelHiller,

Yes, the dashes in the element names were the problem; it works now. One more thing: setting --minLosses to 1 does not seem to work when there is only a single loss lineage. Is the R script not suitable for this case?

Thanks,
Bin

MichaelHiller commented 4 years ago

Yes, a correlation with a single independent data point is not very meaningful. You can still use it to get a ranking of the elements, but I would use the methods cited above to extract a proper set of elements diverged in this lineage.