willpearse / phyloGenerator

Automated Phylogeny Generation for Ecologists

renameSequences error #15

Closed vincentdavis closed 11 years ago

vincentdavis commented 11 years ago

Using my own FASTA file, I get this error. The names look like this:

Anacanthocoris_striicornis

Except one, which looks like this:

Diaphorina_citri diaci_nymph_66660000040632

Not sure which it is failing on.

ERROR

........ Other modes: 'reload', 'trim', 'replace', 'merge'. Hit enter to continue.

DNA Editing (delete): Traceback (most recent call last):
  File "phyloGenerator.py", line 3730, in <module>
    main()
  File "phyloGenerator.py", line 3680, in main
    currentState.renameSequences()
  File "phyloGenerator.py", line 3267, in renameSequences
    if self.sequences[i][k]:
IndexError: list index out of range

willpearse commented 11 years ago

Hello,

Thanks for this Vincent.

Could you please let me know what you typed into phyloGenerator, the input files you gave it, and the output it gave you beforehand? I sense I know what's going wrong, but I'd like a little more information to be sure.

We've been talking over email, so if you'd prefer you can just send me an email with those details.

Cheers,

Will

vincentdavis commented 11 years ago

The issue comes from typing two genes in. See below: I used the demo birds and entered a and b for genes; using real gene names does not solve the issue. In this example the FASTA file does not contain two genes, which is obviously a problem, but I am not sure it changes the result. I tried to chase it down with no luck, and I need to spend my time on other things at the moment. The failing expression is self.sequences[i][k], with i=0 and k=1; these values looked valid to me when I inspected self.sequences.
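The symptom described here can be reproduced in isolation. The following is a minimal sketch, not phyloGenerator's actual code: it assumes that `sequences` holds one inner list per species with one entry per gene actually loaded, and that `genes` holds the names typed at the prompt. The names `sequences`, `genes`, and `first_bad_index` are hypothetical.

```python
# Hypothetical data layout: one inner list per species, one entry per gene
# actually present in the FASTA file (here, a single gene).
sequences = [["ACGT"], ["ACGG"], ["ACGA"]]
# Two gene names were entered at the prompt, although only one was loaded.
genes = ["a", "b"]


def first_bad_index(sequences, genes):
    """Return the first (i, k) pair that raises IndexError, or None."""
    for i in range(len(sequences)):
        for k in range(len(genes)):
            try:
                sequences[i][k]
            except IndexError:
                return (i, k)
    return None


bad = first_bad_index(sequences, genes)
print(bad)  # (0, 1): species 0 has no entry for the second gene
```

Under these assumptions the lookup fails at i=0, k=1, which matches the values reported in the comment: the indices are plausible on their own, but the inner list is shorter than the list of gene names.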

... Please input a 'stem' name to act as a prefix to all output (e.g., 'stemName_phylogeny.tre')

Stem name: t1

Please input a working directory for all your output (hit enter to use current working directory)
Working directory (default - current WD): temp/
Please enter the gene(s) you want to use (e.g., 'COI' for cytochrome oxidase one')
If you wish to use the defaults for your taxa, please enter 'plant', 'invertebrate', or 'vertebrate' instead
Each gene on a separate line, and an empty line to continue

a
b

DNA INPUT

If you already have DNA sequences in a FASTA file, please enter its location
Otherwise, hit enter to continue

/Demos/British_Bird/sequencesShorter.fasta
File not found. Please try again!
/Users/vincentdavis/Dropbox/PyPack/phyloGenerator/Demos/British_Birds/sequencesShorter.fasta
DNA loaded

DNA CHECKING

Sequence summary:
'0' indicates no sequence could be found
'^^^' and '___' denote particularly long or short sequences

Sp. ID Input name a b
0 GQ482063.1 670
1 GU571912.1 648
2 DQ433853.1 672
3 JF499100.1 684
4 GQ482028.1 694
5 DQ433005.1 652
6 GU571260.1 751 ^^^
7 JF499134.1 694
8 GQ481588.1 694
9 GQ481533.1 694
10 JF498757.1 628
11 GQ482332.1 670
12 GQ482037.1 694
13 GU572100.1 648
14 GQ481755.1 678
15 HQ864490.1 706
16 GQ482825.1 694
17 GQ481597.1 681
18 EU525412.1 730
19 DQ434667.1 687
20 GQ481287.1 694
21 GU571961.1 648
22 GQ481365.1 694
23 JF499177.1 694
24 FJ027965.1 694
25 GU572067.1 648
26 GU571252.1 722
27 GU571333.1 724
28 GQ481426.1 694
29 GU571947.1 648
30 JN703205.1 695
31 AY666280.1 693
32 GU571724.1 648
33 JF499078.1 684
34 GQ482664.1 694
35 GU951807.1 670
36 GU571428.1 736
37 DQ432843.1 611
38 GQ481248.1 694
39 DQ433197.1 652
40 GQ922632.1 699
41 AY666522.1 687
42 JQ174769.1 652
43 GU571963.1 648
44 GU572104.1 648
45 GQ481711.1 694
46 GQ481592.1 694
47 GQ922609.1 699
48 DQ434647.1 477
49 GU571630.1 704
50 JQ176121.1 652
51 GU571832.1 648
52 GQ482718.1 694
53 GQ481607.1 694
54 DQ434271.1 664
55 EU525418.1 650
56 GQ482632.1 694
57 GU571268.1 732
58 GQ482187.1 694
59 DQ433710.1 697
60 GU571976.1 648
61 AY666251.1 694
62 GU571536.1 722
63 GQ482090.1 694
64 GU571424.1 723
65 GU572073.1 648
66 DQ434437.1 697
67 GU571434.1 743 ^^^
68 GU571484.1 751 ^^^
69 GU571851.1 648
70 GU571954.1 648
71 FJ028401.1 694
72 GU571593.1 737 ^^^
73 FJ808642.1 677
74 DQ434264.1 689
75 HQ997927.1 100
76 GQ482101.1 694
77 GQ482883.1 694
78 GU572158.1 648
79 GQ481657.1 694
80 GU571591.1 747 ^^^
81 GU571653.1 723
82 FJ808634.1 700
83 GU571924.1 648
84 GU571816.1 648
85 HM033525.1 671
86 GQ481351.1 694
87 DQ433219.1 617
88 DQ434296.1 689
89 GQ922626.1 699
90 GQ481446.1 694
91 JN703186.1 695
92 GQ482466.1 694
93 AY527236.1 658
94 DQ432984.1 624 ___
95 AY666303.1 694
96 GU572049.1 648
97 GQ482512.1 694
98 GQ482299.1 694
99 GU571853.1 648

You may now edit the sequences you are using.
Deleting species may change species' IDs
Huge variation in lengths of sequences (e.g., thousands of base pairs) crashes many alignment programs
All species without sequence data will be ignored when continuing to the next step

TIPS:
Check for long sequences, and TRIM them (use the '>' command). Make sure you've set the 'type' of gene you're using first
Try RELOADing short sequences (use the '>' command). Consider searching for the 'max' length sequences
REPLACE species for which you can't find sequence data (use the 'THOROUGH' command)
If you have alignment problems, you can return to this stage

DELETE MODE
SpID - irreversibly delete a species, e.g. '0'
gene - irreversibly delete an entire gene (brings up gene choice prompt)
output - write out downloaded sequences in FASTA format
Other modes: 'reload', 'trim', 'replace', 'merge'. Hit enter to continue.

DNA Editing (delete): Traceback (most recent call last):
  File "/Applications/PyCharm.app/helpers/pydev/pydevd.py", line 1473, in <module>
    debugger.run(setup['file'], None, None)
  File "/Applications/PyCharm.app/helpers/pydev/pydevd.py", line 1117, in run
    pydev_imports.execfile(file, globals, locals) #execute the script
  File "/Users/vincentdavis/Dropbox/PyPack/phyloGenerator/phyloGenerator.py", line 3774, in <module>
    main()
  File "/Users/vincentdavis/Dropbox/PyPack/phyloGenerator/phyloGenerator.py", line 3724, in main
    currentState.renameSequences()
  File "/Users/vincentdavis/Dropbox/PyPack/phyloGenerator/phyloGenerator.py", line 3305, in renameSequences
    if self.sequences[i][k]:
IndexError: list index out of range

willpearse commented 11 years ago

Hello,

Thanks for this, Vincent.

This looks like it is because you specified two genes when the FASTA file you gave contains only one. I've put a walkthrough of how to run this example in the file demo.txt, which should have shipped with the program.

I'm not actually at my computer so I can't test it, and hence I won't close the issue until tomorrow morning. I'll also put an assertion in to make sure issues like this aren't a problem.
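A check of the kind mentioned above might look like the following. This is a sketch under stated assumptions, not the assertion that was actually added to phyloGenerator: the function name `check_gene_count` and the per-species list layout are hypothetical.

```python
def check_gene_count(sequences, genes):
    """Raise ValueError if any species' sequence list does not match the gene list.

    Assumes (hypothetically) that `sequences` is a list with one inner list
    per species, and that each inner list should have one slot per gene name.
    """
    for i, species_seqs in enumerate(sequences):
        if len(species_seqs) != len(genes):
            raise ValueError(
                "Species %d has %d sequence slot(s) but %d gene(s) were named: %s"
                % (i, len(species_seqs), len(genes), ", ".join(genes))
            )


# One gene loaded, one gene named: passes silently.
check_gene_count([["ACGT"], ["ACGG"]], ["COI"])

# One gene loaded, two genes named: fails early with a readable message.
try:
    check_gene_count([["ACGT"], ["ACGG"]], ["a", "b"])
except ValueError as err:
    print(err)
```

Running a check like this right after the FASTA file is loaded would surface the mismatch at the input stage, rather than as an IndexError later in renameSequences.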

Thanks again,

Will

willpearse commented 11 years ago

Hello,

Hopefully, the latest commit fixes this issue, and v1.1a includes that commit.

I'm closing this issue now - if anyone has any more issues with gene number, please let me know.

Thanks again for letting me know about this Vincent.

Will