millanek / Dsuite

Fast calculation of Patterson's D (ABBA-BABA) and the f4-ratio statistics across many populations/species

Dsuite showing error with tree file #40

BiodeB opened this issue 3 years ago (status: open)

BiodeB commented 3 years ago

Dear Experts,

I'm trying to run Dsuite for the first time and am getting the error below. Without the -t option and the input tree, the program runs fine.

Dsuite Dtrios GB_snpS.vcf SETS.txt -t astal_species_TRoot.nwk -o treeGB

There are 112 sets (excluding the Outgroup)
Going to calculate D and f4-ratio values for 227920 trios
Out of Range error: map::at
species[i]: Species1
It seems that this species is in the SETS.txt file but can't be found in the tree. Please check the spelling and completeness of your tree file.

I checked the tree for spelling mistakes but didn't find any. I also checked the tree for completeness, and the check reported "The Binary Tree is not complete". I would be very grateful if someone could point out what I'm doing wrong and suggest how to fix this issue.

Thanks, Debajyoti

XiaXiaTianTian commented 3 years ago

I also ran into this error. I checked my files as you did, but I still have no idea how to solve it. Did you manage to solve it? I'd appreciate any information you could give me.

RishiDeKayne commented 3 years ago

Hi, I recently had this issue too and realised that the population/species names in my tree did not exactly match those I had specified in SETS.txt. Hope that helps!

XiaXiaTianTian commented 3 years ago

Thanks a lot. I will check my tree file again; maybe something is wrong with it.

SilvaFE commented 2 years ago

Hi everyone, I am having the same problem. I have double-checked the names and files... Did you manage to sort out this problem?

RishiDeKayne commented 2 years ago

In my case I was able to sort it out by making sure that the tree I provided was at the same population level that I specified in the SETS.txt file. For example, if I grouped three individuals, indiv1, indiv2, and indiv3, into Species1, then my tree file had Species1 listed, not indiv1-3 (this is also true of the outgroup, which is called 'Outgroup' in my tree). I did this simply by pruning the tree and renaming nodes using the ape package in R.

SilvaFE commented 2 years ago

Thanks a lot @RishiDeKayne! It works for me! Best regards

RezaFahi commented 2 years ago

How did you do it, @RishiDeKayne? If possible, please help me solve my problem. When I changed the individual names to their corresponding species names, I got this error: ERROR: Duplicate value in the tree "Species1"

Thanks

RishiDeKayne commented 2 years ago

It sounds like you might have multiple individuals that are now all called "Species1" in your tree. That won't work. Instead, collapse the monophyletic node in your tree so that it corresponds to the population you specified in SETS.txt. E.g. if individuals 1, 2, and 3 are called "Species1" in your SETS.txt, are each other's closest relatives in your individual-level phylogenetic tree, and are monophyletic, then use the ape package in R to collapse this node so that, instead of individuals 1, 2, and 3, the tree has a single tip called "Species1".
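For anyone not using R, here is a minimal Python sketch of that collapsing step (the thread itself used ape). It assumes an unquoted Newick tree and a SETS.txt-style mapping from individual to population. Branch lengths and internal node labels are discarded, and all function names are hypothetical. Any clade whose tips all map to the same population becomes a single tip. A population that is not monophyletic cannot collapse to one tip, so the sketch rejects it; that is the situation behind the "Duplicate value in the tree" error above.

```python
from collections import Counter


def parse_newick(text):
    """Parse an unquoted Newick string into nested lists of tip names.

    Branch lengths and internal node labels are skipped (assumption:
    only the topology is needed for the population-level tree).
    """
    text = text.strip().rstrip(";")
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos].strip()
        if pos < len(text) and text[pos] == ":":  # skip ":branch_length"
            pos += 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
        return label

    def read_node():
        nonlocal pos
        skip_ws()
        if text[pos] == "(":
            pos += 1
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            read_label()  # discard internal label / support value
            return children
        return read_label()

    return read_node()


def collapse(node, ind2pop):
    """Replace every clade whose tips all belong to one population
    by a single tip named after that population."""
    if isinstance(node, str):
        if node not in ind2pop:
            raise ValueError(f"tree tip {node!r} has no population in SETS.txt")
        return ind2pop[node]
    children = [collapse(child, ind2pop) for child in node]
    if all(isinstance(c, str) for c in children) and len(set(children)) == 1:
        return children[0]  # the whole clade is one population
    return children


def tips(node):
    if isinstance(node, str):
        return [node]
    return [t for child in node for t in tips(child)]


def to_newick(node):
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def population_tree(newick, ind2pop):
    tree = collapse(parse_newick(newick), ind2pop)
    dupes = sorted(name for name, n in Counter(tips(tree)).items() if n > 1)
    if dupes:
        raise ValueError(f"not monophyletic, cannot collapse: {dupes}")
    return to_newick(tree) + ";"


# Hypothetical example: left column of SETS.txt -> right column.
ind2pop = {"ind1": "Species1", "ind2": "Species1", "ind3": "Species1",
           "ind4": "Species2", "ind5": "Species2", "out1": "Outgroup"}
print(population_tree("(((ind1:0.1,ind2:0.2),ind3),((ind4,ind5),out1));", ind2pop))
# (Species1,(Species2,Outgroup));
```

Checking for duplicates after collapsing, rather than renaming tips one by one, makes a non-monophyletic grouping fail loudly before Dsuite ever sees the tree.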

SilvaFE commented 2 years ago

I did exactly the same as @RishiDeKayne. However, I didn't use the ape package. Instead, I entered the tree in a Newick file and opened it in iTOL to check that the topology was consistent with what we know about the study group. This was not a problem in my case because I had only a few species/populations, but it could be if you have many species/populations in your tree. Best wishes, Felipe


RishiDeKayne commented 2 years ago

Yes, and just to clarify: if you did not group samples into populations but just have individuals plus an outgroup, make sure every name in the tree is exactly the same as a name in the second column of your SETS.txt file. So if your tree has a topology like (((indiv1,indiv2),indiv3),Outgroup), then your SETS.txt file should look like:

indiv1	indiv1
indiv2	indiv2
indiv3	indiv3
indiv4	Outgroup

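If you start from an individual-level tree, you can generate both files from the same list of sample IDs. Below is a minimal Python sketch. It assumes the VCF sample IDs are also the unquoted tip labels of the tree. `individual_level_inputs` is a hypothetical helper, not part of Dsuite, and the tab separator is an assumption, so check the Dsuite manual for the exact SETS.txt format. Each sample becomes its own population, except the chosen outgroup sample, which gets the Outgroup keyword; the matching tree tip is renamed to 'Outgroup'.

```python
import re


def individual_level_inputs(samples, outgroup, newick):
    """Build an individual-level SETS.txt and rename the outgroup tip.

    Assumptions: `samples` are the VCF sample IDs, which are also the
    tip labels of the unquoted Newick tree; `outgroup` is one of them.
    """
    if outgroup not in samples:
        raise ValueError(f"outgroup {outgroup!r} is not among the samples")
    rows = []
    for sample in samples:
        population = "Outgroup" if sample == outgroup else sample
        rows.append(f"{sample}\t{population}")
    sets_text = "\n".join(rows) + "\n"

    # Rename the outgroup tip only where it is a whole label: preceded by
    # '(' or ',' and followed by ':', ',', ')' or ';' (so 'indiv40' is safe).
    pattern = r"([(,]\s*)" + re.escape(outgroup) + r"(?=\s*[:,);])"
    new_tree, n = re.subn(pattern, r"\1Outgroup", newick)
    if n != 1:
        raise ValueError(f"expected exactly one tip {outgroup!r}, found {n}")
    return sets_text, new_tree


sets_text, tree = individual_level_inputs(
    ["indiv1", "indiv2", "indiv3", "indiv4"], "indiv4",
    "(((indiv1,indiv2),indiv3),indiv4);")
print(sets_text, end="")
print(tree)  # (((indiv1,indiv2),indiv3),Outgroup);
```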
Don't forget that, as per the manual (https://github.com/millanek/Dsuite): "For Dtrios, at least one individual needs to be specified to be the outgroup by using the Outgroup keyword as shown above."

declanjhoulihan commented 2 years ago

@RishiDeKayne Could you please let me know what functions you used in ape to prune your tree? I'm trying to do the same, but I'm unfamiliar with the package.

Thanks

RishiDeKayne commented 2 years ago

Yes, I think you're looking for drop.tip: https://www.rdocumentation.org/packages/ape/versions/5.6-2/topics/drop.tip. There are lots of tutorials on using ape in R online, so hopefully this will help once you have read in your tree.
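For readers working outside R, here is a minimal Python sketch of what drop.tip is being used for in this thread: removing tips that should not be in the Dsuite tree and tidying up internal nodes left with a single child. It is only a rough, topology-only analogue. Unlike ape's drop.tip, it discards branch lengths and internal labels. It assumes an unquoted Newick string, and the function names are hypothetical.

```python
def parse_newick(text):
    """Parse an unquoted Newick string into nested lists of tip names
    (branch lengths and internal labels are discarded)."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos].strip()
        if pos < len(text) and text[pos] == ":":  # skip ":branch_length"
            pos += 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
        return label

    def read_node():
        nonlocal pos
        while text[pos].isspace():
            pos += 1
        if text[pos] != "(":
            return read_label()
        pos += 1
        children = [read_node()]
        while text[pos] == ",":
            pos += 1
            children.append(read_node())
        pos += 1  # consume ')'
        read_label()  # discard internal label / support value
        return children

    return read_node()


def prune(node, drop):
    """Remove tips in `drop`; collapse internal nodes left with one child."""
    if isinstance(node, str):
        return None if node in drop else node
    kept = [c for c in (prune(child, drop) for child in node) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else kept


def to_newick(node):
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def drop_tips(newick, to_drop):
    pruned = prune(parse_newick(newick), set(to_drop))
    if pruned is None:
        raise ValueError("all tips were dropped")
    return to_newick(pruned) + ";"


print(drop_tips("(((ind1,ind2),ind3),((ind4,ind5),out1));", ["ind2", "ind5"]))
# ((ind1,ind3),(ind4,out1));
```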

thomlmarshall commented 2 years ago

I'm having the same error as well. In my case, my dataset is very simple (1 individual per taxon) and I can't see a mistake anywhere. I've tried recreating the sets.txt and tree files several times and switching from the actual species names to "Species1", "Species2", etc., and I still get this error. The contents of my sets.txt and Newick file are below. Like the OP, I'm able to run the analysis just fine without the -t parameter, but I really want to get this to work with the tree file. Any advice is much appreciated.

sets.txt:
PanObs	Outgroup
H16057	Species1
H21189	Species2
TJH3395	Species3

newick file: (((Species3,Species2),Species1),Outgroup);

Thanks!

mirandasherlock commented 1 year ago

I am having the same issue. If I collapse nodes or change labels in the SETS file as suggested above, I then run into problems with samples not being found in the VCF file.

RishiDeKayne commented 1 year ago

@mirandasherlock just to clarify what worked for me: in the sets.txt file, the left column must contain sample IDs that match those in your VCF exactly, and the right column must contain population names that match the tip names in your tree file exactly. That is, if you are going to group multiple individuals into populations, the tree must also be at the population level, with tip labels that exactly match the right column of the sets.txt file rather than each individual's name. The tree file should only include tip names present in the right column. I had to double-check this a few times to make sure there were no typos and no individual names incorrectly left in the tree file when I did the pruning. E.g. if your outgroup sample has 'Outgroup' in the right column of the sets.txt file (as it should), it must also be called 'Outgroup' and be represented by a single tip in the tree file. Hope this helps!
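You can check these rules mechanically before running Dsuite. Below is a minimal Python sketch. It assumes a two-column SETS.txt, an unquoted Newick tree, and the standard VCF "#CHROM" header line, where sample IDs start at column 10. `check_inputs` and the other helpers are hypothetical, not part of Dsuite. The check reports names that appear on one side but not the other, which is the kind of mismatch the map::at error message in this thread points to. The example uses the sets.txt and tree posted above with a made-up VCF header; those two files pass these particular checks.

```python
import re
from collections import Counter


def parse_sets(text):
    """Return (sample_id, population) pairs from SETS.txt content."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"SETS line {lineno}: expected 2 columns: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def tree_tips(newick):
    """Tip labels of an unquoted Newick string: labels that follow '(' or ','."""
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


def vcf_samples(header_line):
    """Sample IDs from the VCF '#CHROM ...' header line (columns 10 onwards)."""
    return header_line.rstrip("\r\n").split("\t")[9:]


def check_inputs(sets_text, newick, samples):
    problems = []
    pairs = parse_sets(sets_text)
    populations = {pop for _, pop in pairs}
    tips = tree_tips(newick)

    for sample in sorted({s for s, _ in pairs} - set(samples)):
        problems.append(f"sample {sample!r} in SETS.txt is not in the VCF")
    if "Outgroup" not in populations:
        problems.append("no sample is assigned to 'Outgroup' in SETS.txt")
    for tip, count in sorted(Counter(tips).items()):
        if count > 1:
            problems.append(f"tip {tip!r} occurs {count} times in the tree")
    for pop in sorted(populations - set(tips)):
        problems.append(f"population {pop!r} in SETS.txt is not a tip in the tree")
    for tip in sorted(set(tips) - populations):
        problems.append(f"tip {tip!r} in the tree is not a population in SETS.txt")
    return problems


sets_txt = "PanObs\tOutgroup\nH16057\tSpecies1\nH21189\tSpecies2\nTJH3395\tSpecies3\n"
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tPanObs\tH16057\tH21189\tTJH3395"
print(check_inputs(sets_txt, "(((Species3,Species2),Species1),Outgroup);", vcf_samples(header)))
# []
```

If you work with bcftools, `bcftools query -l file.vcf` prints the sample IDs one per line, and that list can be passed as `samples` instead of parsing the header.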

mirandasherlock commented 1 year ago

Hi Rishi,

Thanks so much for replying. I forgot to update the thread, but I got it working using the same method as you.

Thanks again,

Miranda
