gphocs-dev / G-PhoCS

G-PhoCS is a software package for inferring ancestral population sizes, population divergence times, and migration rates from individual genome sequences.

Errors with generating tree with ControlFileGenerator #62

Open FatihSarigol opened 5 years ago

FatihSarigol commented 5 years ago

Hello again, I am trying to use my Newick tree with the ControlFileGenerator, and I don't understand the rules here. When I type in the example tree it works, but my tree, which comes from a FigTree export and also works in the treeview of etoolkit without an issue, gives an error with your program:

((((Norway,NPresident),NPtransient),SouthAfrican),Antarctic,MarionIsland)

It says "Errors in the tree input" "Last character can't be a braket or a comma"

When I add ";" or "root" to the end, it then says "Error: Check your commas"

Can you see what may be causing the error? Thanks!

gphocs-dev commented 5 years ago

Sorry for the delayed response. You're supposed to name the ancestral nodes in the tree (these are the ancestral populations in your phylogeny), and the tree also has to have a root. Your Newick tree had a 3-way split at the root. This version works:

(((((Norway,NPresident)ANC1,NPtransient)ANC2,SouthAfrican)ANC3,Antarctic)ANC4,MarionIsland)ROOT

This version assumes that Antarctic and MarionIsland are not sister populations.
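To make the two rules above concrete (every ancestral node named, and a binary split everywhere including the root), here is a minimal Python sketch that checks a tree before pasting it into the generator. It is not the ControlFileGenerator's own validation code; `parse_newick` and `check_tree` are hypothetical helper names, and the sketch assumes a topology-only Newick string without branch lengths or quoted labels.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested (name, children) tuples.

    Assumption: no branch lengths, comments, or quoted labels.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # The label (possibly empty) runs up to the next structural character.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].strip(), children

    root = node()
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return root


def check_tree(tree):
    """Return a list of problems: unnamed nodes and non-binary splits."""
    problems = []

    def visit(current):
        name, children = current
        if children:
            if len(children) != 2:
                label = name or "<unnamed>"
                problems.append(f"node {label} has {len(children)} children, expected 2")
            if not name:
                problems.append("ancestral node without a name")
            for child in children:
                visit(child)
        elif not name:
            problems.append("leaf without a name")

    visit(tree)
    return problems


original = "((((Norway,NPresident),NPtransient),SouthAfrican),Antarctic,MarionIsland)"
fixed = "(((((Norway,NPresident)ANC1,NPtransient)ANC2,SouthAfrican)ANC3,Antarctic)ANC4,MarionIsland)ROOT"

for label, tree_text in (("original", original), ("fixed", fixed)):
    print(label, check_tree(parse_newick(tree_text)))
```

On the original tree this reports the 3-way split at the root plus the unnamed ancestral nodes; on the corrected tree it reports nothing.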

FatihSarigol commented 5 years ago

Thank you for your reply! We actually ended up trying that ourselves, and it did work!

I don't want to open a new thread for the following, and I believe I have fixed them myself, but I ran into 2 more issues when using a control file generated by ControlFileGenerator. When I used the control file, it gave this error:

.rror: value of const-rate should be CONST, FIXED, or VAR, got CONST '.ror: value of find-finetunes should be TRUE or FALSE, got 'TRUE Found 2 errors when parsing GENERAL-INFO in control file testX.

The 2 lines in the control file that it doesn't like are as follows:

            find-finetunes          TRUE
            locus-mut-rate          CONST

So it is probably an issue with an extra whitespace character, a missing newline character, or the order of the lines; I couldn't get to the bottom of it. I used the ControlFileGenerator on Windows 7 and copied the control file it generated directly to my Unix environment without opening it in Windows (that is, the file was not edited anywhere afterwards). As a workaround, I took the GENERAL-INFO part from the example sample-control-file.ctl and edited it for my case.

And the next error was this:

Error: uneven terms in sample line for pop 1.
Error: expecting to see 2 samples in pop 1, but was able to read only 3.
Error: uneven terms in sample line for pop 2.
Error: expecting to see 2 samples in pop 2, but was able to read only 3.
Error: uneven terms in sample line for pop 3.
Error: expecting to see 2 samples in pop 3, but was able to read only 3.
Error: uneven terms in sample line for pop 4.
Error: expecting to see 2 samples in pop 4, but was able to read only 3.
Error: uneven terms in sample line for pop 5.
Error: expecting to see 2 samples in pop 5, but was able to read only 3.
Error: uneven terms in sample line for pop 6.
Segmentation fault

I compared the control file generated by ControlFileGenerator with the example control file and noticed that the generated one had either a space or a tab after the "d" on the samples line of each population. Here is an example:

            POP-START
                            name            Norway
                            samples         Norway d 
            POP-END

It is hard to see here, but after "Norway d" there is a trailing space, which G-PhoCS reads as a third term on the samples line, so it gives an error. I deleted the trailing whitespace on each samples line and it worked.

Best..

gphocs-dev commented 5 years ago

All of these errors appear to stem from the fact that you're generating the control file on Windows (in the Control File Generator) and then using it on a Linux machine. When you transfer text files between Windows and Linux, you have to use a converter (e.g. dos2unix) to switch the newline characters: Windows ends each line with a carriage return followed by a line feed, while Linux uses a line feed only. Because the newline characters are different, G-PhoCS cannot successfully read the last token of every line.
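If dos2unix is not installed, the same conversion can be sketched in a few lines of Python. This is only an illustration, not part of G-PhoCS: `normalize_control_text` and `convert_file` are hypothetical helper names. Unlike dos2unix, the sketch also strips trailing spaces and tabs from each line, on the assumption that trailing whitespace is never meaningful in a control file.

```python
import sys


def normalize_control_text(raw: bytes) -> bytes:
    """Convert Windows (CRLF) or bare CR line endings to Unix LF.

    Also strips trailing spaces and tabs from every line (an extra step
    beyond what dos2unix does; assumed safe for control files).
    """
    unix = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return b"\n".join(line.rstrip(b" \t") for line in unix.split(b"\n"))


def convert_file(path: str) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    with open(path, "rb") as handle:
        raw = handle.read()
    cleaned = normalize_control_text(raw)
    if cleaned != raw:
        with open(path, "wb") as handle:
            handle.write(cleaned)
    return cleaned != raw


if __name__ == "__main__":
    # Usage: python this_script.py <control-file> [<control-file> ...]
    for control_path in sys.argv[1:]:
        status = "converted" if convert_file(control_path) else "already clean"
        print(f"{control_path}: {status}")
```

Reading and writing in binary mode matters here: in text mode Python would translate line endings itself and hide the carriage returns.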