pblischak / HyDe

Hybridization detection using phylogenetic invariants
http://hybridization-detection.readthedocs.io
GNU General Public License v3.0

KeyError regarding outgroup #9

Closed ClaudiaPaetzold closed 5 years ago

ClaudiaPaetzold commented 5 years ago

Hi Paul, sorry to bother you again, but I am running into another error, this time a KeyError concerning my outgroup. I have checked the spelling and can rule that out as the source. The error is the following:

$ run_hyde_mp.py -i Myrsine_rearanged.phy -m map_new.txt -o G212_Arell -n 24 -t 16 -s 3394218 -j 20

Running run_hyde_mp.py

Reading input file........................Done.
Reading map file ........................Done.
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/hyde/bin/run_hyde_mp.py", line 141, in <module>
    data = hd.HydeData(infile, mapfile, outgroup, nind, ntaxa, nsites, quiet)
  File "phyde/core/data.pyx", line 129, in phyde.core.data.HydeData.__init__
KeyError: 'G212_Arell'

Could you tell me what I'm doing wrong? Thank you again.

pblischak commented 5 years ago

Hi Claudia,

Sorry for not answering on gitter! The line that the error references involves reading in the map of individuals to taxa, so my guess is that there is a small error in how this file is formatted. The file format specification can be found on this page, and I put a small sample below. The file should have two tab-delimited columns, with individual names in the first column and the taxon each individual belongs to in the second. If you want to send me your taxon mapping file, I'm happy to take a look at it too.

Ind1    Taxon1
Ind2    Taxon1
Ind3    Taxon2
.
.
.
IndN-1    TaxonM
IndN        TaxonM
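
If it helps, the two-column layout above can be checked before a run with a few lines of Python. This is only a rough sketch and not part of HyDe: the function name `find_malformed_lines` is made up for illustration, it assumes the columns are separated by a single tab as described above, and it skips blank lines on the assumption that they are harmless.

```python
def find_malformed_lines(map_text):
    """Return (line_number, line) pairs that are not exactly two
    non-empty, tab-separated fields (individual, taxon)."""
    problems = []
    for number, line in enumerate(map_text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 2 or not all(field.strip() for field in fields):
            problems.append((number, line))
    return problems


# Made-up sample: the third line uses a space instead of a tab.
sample = "Ind1\tTaxon1\nInd2\tTaxon1\nInd3 Taxon2\n"
print(find_malformed_lines(sample))  # [(3, 'Ind3 Taxon2')]
```

For a real file, the text could be read with `open("map_new.txt").read()` and passed to the same function; an empty list means every line has the expected two columns.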

Also, I really should document this more clearly, but using too many threads can actually make the analysis slower because I'm pretty sure the Python thread model copies the data set to each thread. This copying can be really slow, so it is usually better to use 2 to 4 threads rather than a whole bunch.

ClaudiaPaetzold commented 5 years ago

Dear Paul,

I'm sorry I wrote here as well as on gitter. I remembered you mentioning that you did not get a notification about the first message I posted on the latter.

Thank you for your offer of help. I would like to take you up on your suggestion to take a look at my map file, as I can see nothing in it that does not adhere to your guidelines. Unless it has to be in strict alphabetical order?

map_new.txt

Also, thank you for the advice regarding the number of cores to pick; I will keep it in mind.

Best, Claudia

pblischak commented 5 years ago

Hi Claudia, no need to apologize! I don't mind at all if you post in both places.

Looking at your map file, the issue is that your outgroup taxon is named "out", not "G212_Arell". So if you run the same command with -o out, then I think it should work:

run_hyde_mp.py -i Myrsine_rearanged.phy -m map_new.txt -o out -n 24 -t 16 -s 3394218 -j 2
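
For future runs, a mismatch like this can be caught up front by checking that the name passed to -o appears in the second (taxon) column of the map file. The snippet below is a minimal sketch, not HyDe code: the helper names `taxa_in_map` and `check_outgroup` are hypothetical, the map contents are invented for the example, and it splits on any whitespace so the check does not depend on tabs versus spaces.

```python
def taxa_in_map(map_text):
    """Collect the taxon names found in the second column of a map file."""
    taxa = set()
    for line in map_text.splitlines():
        fields = line.split()  # assumption: names contain no whitespace
        if len(fields) >= 2:
            taxa.add(fields[1])
    return taxa


def check_outgroup(map_text, outgroup):
    """Raise ValueError if the outgroup is not one of the taxon names."""
    taxa = taxa_in_map(map_text)
    if outgroup not in taxa:
        raise ValueError(
            f"outgroup {outgroup!r} is not a taxon in the map file; "
            f"taxa found: {sorted(taxa)}"
        )
    return True


# Invented map contents: two individuals in taxon "out", one in "taxon1".
example_map = "ind_a\tout\nind_b\tout\nind_c\ttaxon1\n"
print(check_outgroup(example_map, "out"))  # True

try:
    check_outgroup(example_map, "ind_a")  # an individual name, not a taxon
except ValueError as err:
    print(err)
```

Passing an individual's name rather than its taxon name fails here with a readable message instead of a KeyError partway through the run.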

ClaudiaPaetzold commented 5 years ago

Hi Paul, it's up and running. Thank you for your time and help. Claudia