neherlab / pan-genome-analysis

Processing pipeline for pan-genome visualization and exploration
http://pangenome.de
GNU General Public License v3.0

ete2 missing? #13

Open shlomobl opened 5 years ago

shlomobl commented 5 years ago

Dear all,

I have run into the following issue while installing panX. I followed the steps on the site and used Miniconda as indicated.

Traceback (most recent call last):
  File "./panX.py", line 6, in <module>
    from pangenome_computation import pangenome
  File "/usr/local/bin/panX/pan-genome-analysis/scripts/pangenome_computation.py", line 3, in <module>
    from cluster_collective_processing import clusterCollector
  File "/usr/local/bin/panX/pan-genome-analysis/scripts/cluster_collective_processing.py", line 1, in <module>
    from sf_geneCluster_align_makeTree import cluster_align_makeTree
  File "/usr/local/bin/panX/pan-genome-analysis/scripts/sf_geneCluster_align_makeTree.py", line 6, in <module>
    from ete2 import Tree
ImportError: No module named ete2
(panX) shlomo@shlomo-HP-Z840-Workstation:/usr/local/bin/panX/pan-genome-analysis$ pip install ete2
Requirement already satisfied: ete2 in /home/shlomo/.conda/envs/panX/lib/python2.7/site-packages (2.3.10)
(panX) shlomo@shlomo-HP-Z840-Workstation:/usr/local/bin/panX/pan-genome-analysis$

This is my first time using Conda, but if I understand correctly, ete2 should already be installed in the environment. I'd appreciate some help here :-)
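In the output above, pip reports ete2 under /home/shlomo/.conda/envs/panX/lib/python2.7/site-packages while the import still fails. One possible explanation (not confirmed anywhere in this thread) is that the script runs under a different Python interpreter than the one pip installed into. The sketch below prints which interpreter is active and whether a module can be imported from it. The helper name `module_status` is made up for illustration and is not part of panX.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import importlib
import sys


def module_status(name):
    """Return (importable, detail) for the named module.

    detail is the module's file location on success, or the
    ImportError message on failure.
    """
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    # The interpreter that is actually executing this script.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    # ete2 is the module named in the traceback above.
    for name in ("ete2",):
        ok, detail = module_status(name)
        print(name, "OK" if ok else "MISSING", detail)
```

Running this with the same command used to start panX.py (same shell, same environment) shows whether the interpreter path points into the panX conda environment or somewhere else.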

shlomobl commented 5 years ago

Hi, I somehow managed to install ete2; I guess the problem was the pip version. But when I run the test, nothing happens. Looking at the log, I found that diamond was not installed. After installing it and running the test again, I now see that raxml is not installed either. I thought all the dependencies would be installed by the Conda steps. Am I missing something?

shlomobl commented 5 years ago

And OK, this is as far as I could get with the test data...

(panX) shlomo@shlomo-HP-Z840-Workstation:/usr/local/bin/panX/pan-genome-analysis$ sudo ./panX.py -fn ./data/TestSet/ -sl Mycoplasma
Running panX in main folder: /usr/local/bin/panX/pan-genome-analysis/data/TestSet/
======  step01: strain list successfully loaded
======  starting step03: extract sequences from GenBank file
======  time for step03:
 0.01 minutes (0.55 seconds) 

======  starting step04: extract metadata from GenBank file
======  time for step04:
 0.00 minutes (0.29 seconds) 

======  starting step05: cluster proteins
diamond inputfile: reference.faa
diamond build index (makedb):  0.00 minutes (0.20 seconds)
command line record: /usr/local/bin/diamond makedb -p 1 --in /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/reference.faa -d /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/nr_reference> /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/diamond_makedb_reference.log 2>&1
diamond alignment (blastp):  0.00 minutes (0.21 seconds)
diamond_max_target_seqs used: 600
command line record: /usr/local/bin/diamond blastp --sensitive -p 1 -e 0.001 --id 0 --query-cover 0 --subject-cover 0 -k 600 -d /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/nr_reference -f 6 qseqid sseqid bitscore -q /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/reference.faa -o /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/query_matches.m8 -t ./  > /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/diamond_blastp_reference.log  2>&1
rm: cannot remove '/usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/nr_reference.dmnd': No such file or directory
command line mcl: mcl /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/filtered_hits.abc --abc -o /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/allclusters.tsv -I 1.5 -te 1 > /usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/mcl.log 2>&1
mcl runtime:  0.00 minutes (0.01 seconds) 

Traceback (most recent call last):
  File "./panX.py", line 277, in <module>
    myPangenome.clustering_protein_sequences()
  File "/usr/local/bin/panX/pan-genome-analysis/scripts/pangenome_computation.py", line 139, in clustering_protein_sequences
    self.diamond_identity, self.diamond_query_cover, self.diamond_subject_cover, self.diamond_path, self.mcl_inflation)
  File "/usr/local/bin/panX/pan-genome-analysis/scripts/sf_cluster_protein.py", line 232, in clustering_protein
    return parse_geneCluster(cluster_fpath, cluster_dt_cpk_fpath)
  File "/usr/local/bin/panX/pan-genome-analysis/scripts/sf_cluster_protein.py", line 126, in parse_geneCluster
    with open(input_fpath, 'rb') as infile:
IOError: [Errno 2] No such file or directory: '/usr/local/bin/panX/pan-genome-analysis/data/TestSet/protein_faa/diamond_matches/allclusters.tsv'
(panX) shlomo@shlomo-HP-Z840-Workstation:/usr/local/bin/panX/pan-genome-analysis$ 

Would it help to run panX from outside the /usr/local/bin folder?
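The command line records above redirect the output of diamond and mcl into log files in the diamond_matches folder (`> ... 2>&1`), so any error message from those tools would end up there rather than on the terminal. A minimal sketch for dumping those logs follows. The helper name `read_logs` is hypothetical, and the relative path assumes the script is run from the pan-genome-analysis folder, as in the session above.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import glob
import os


def read_logs(log_dir):
    """Return {file name: contents} for every *.log file in log_dir."""
    logs = {}
    for path in sorted(glob.glob(os.path.join(log_dir, "*.log"))):
        with open(path) as handle:
            logs[os.path.basename(path)] = handle.read()
    return logs


if __name__ == "__main__":
    # Folder taken from the command line records in the output above.
    log_dir = "./data/TestSet/protein_faa/diamond_matches"
    for name, text in sorted(read_logs(log_dir).items()):
        print("=====", name)
        print(text if text.strip() else "(empty)")
```

A "command not found" line in mcl.log, for instance, would explain why allclusters.tsv was never written.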

rneher commented 5 years ago

There seems to be something wrong with your conda environment/installation. Please check that all dependencies are actually installed as they should be; mcl might be missing, for example.
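One way to run that check is to look for each external tool on the PATH of the shell that starts panX. The sketch below does this for the three tools named in this thread. The executable names are assumptions (RAxML in particular may be installed under a different binary name), and `find_on_path` and `missing_tools` are hypothetical helpers, not panX functions.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import os


def find_on_path(program, path=None):
    """Return the full path of an executable found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools, path=None):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if find_on_path(tool, path) is None]


if __name__ == "__main__":
    # Assumed executable names, taken from the tools mentioned in the thread.
    tools = ["diamond", "mcl", "raxml"]
    for tool in tools:
        print(tool, "->", find_on_path(tool) or "NOT FOUND")
```

Because the result depends on the PATH of the calling process, it is worth running this the same way panX is launched (for example, with or without sudo) to see the environment the pipeline actually gets.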

greenkidneybean commented 5 years ago

I had the same issue, submitted PR #26, hope this helps!