neherlab / pan-genome-analysis

Processing pipeline for pan-genome visualization and exploration
http://pangenome.de
GNU General Public License v3.0

Issue with step 8 #24

Open jpaganini opened 5 years ago

jpaganini commented 5 years ago

Hello guys, how are you? I'm having a problem with step08. I get the following error message:

======  starting step08: run fasttree and raxml for tree construction
 fasttree time-cost:  1.45 minutes (87.06 seconds)
RAxML tree optimization within the timelimit of 30 minutes
RAxML branch length optimization and rooting
Traceback (most recent call last):
  File "./panX.py", line 303, in <module>
    myPangenome.build_core_tree()
  File "/home/julian/pan-genome-analysis/scripts/pangenome_computation.py", line 200, in build_core_tree
    aln_to_Newick(self.path, self.folders_dict, self.raxml_max_time, self.raxml_path, self.threads)
  File "/home/julian/pan-genome-analysis/scripts/sf_core_tree_build.py", line 75, in aln_to_Newick
    shutil.copy('RAxML_result.branches', out_fname)
  File "/home/julian/miniconda2/envs/panX/lib/python2.7/shutil.py", line 119, in copy
    copyfile(src, dst)
  File "/home/julian/miniconda2/envs/panX/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: 'RAxML_result.branches'

I checked the raxml log and I found this:

Option -T does not have any effect with the sequential or parallel MPI version. It is used to specify the number of threads for the Pthreads-based parallelization

RAxML can't, parse the alignment file as phylip file it will now try to parse it as FASTA file

ERROR: Sequence AF-673 consists entirely of undetermined values which will be treated as missing data
ERROR: Sequence CMC-MDR-Ab59 consists entirely of undetermined values which will be treated as missing data
ERROR: Sequence HRAB-85 consists entirely of undetermined values which will be treated as missing data
ERROR: Sequence KAB07 consists entirely of undetermined values which will be treated as missing data
ERROR: Found 4 sequences that consist entirely of undetermined values, exiting...

So, I figured there might be a problem with the FASTA files that get generated in the previous steps. Any ideas on how to fix this?
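One way to check this by hand is to scan the alignment for sequences that contain nothing but gap or ambiguity characters, which is what the RAxML message complains about. The following is only a rough sketch: it assumes the alignment is in FASTA format, the set of "undetermined" characters is a guess for DNA data, and the helper names `parse_fasta` and `all_undetermined` are made up for illustration (they are not part of panX or RAxML).

```python
import io

# Characters treated as "undetermined" here. This set is an assumption for
# DNA alignments, not something taken from the RAxML source.
UNDETERMINED = set("-?NnXx")


def parse_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def all_undetermined(handle):
    """Return names of sequences made up only of undetermined characters.

    An empty sequence is also reported, since it carries no information.
    """
    return [name for name, seq in parse_fasta(handle)
            if all(ch in UNDETERMINED for ch in seq)]


# Small in-memory demo; the strain names are placeholders.
demo = io.StringIO(u">strainA\nACGT\n>strainB\n----\n>strainC\nNN-N\n")
print(all_undetermined(demo))
```

To run it on a real file, open the alignment with `open(path)` and pass the handle to `all_undetermined`.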

rneher commented 5 years ago

This probably means that your core genome is empty. Are your genomes incomplete or very diverse?

jpaganini commented 5 years ago

Hi Richard. Thanks for your prompt response. The genomes are complete. As for diversity, they are not clonal strains, but all genomes belong to the same bacterial species.

avilella commented 5 years ago

I have faced the same issue at step 8. In my case, I am analysing phage genomes, each of which is small in size. Some of them are close to each other, but others are further apart. I had success until step 7 running it with -cg 0.3. Is there anything I can do to make it complete the next steps? I can't generate a complete set of files for pan-genome-visualization if I am stuck at step 7.

Any recommendations would be very welcome. Both panX and pan-genome-visualization are great tools that are making my analysis a lot easier and very detailed.

rneher commented 5 years ago

Sorry I dropped the ball here. I was traveling when this one came in and it fell through the cracks. So you say step 7 (creating a SNP alignment from the core genome) completed with -cg 0.3 but step 8 (core genome tree) failed? Could you give some more info on what was written to the logs (the panX log and/or the RAxML/FastTree logs)? Did step 7 produce a file "geneCluster/SNP_whole_matrix.aln"?
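A quick way to answer the last question is to look at whether the file exists, how many sequences it holds, and how long they are. The snippet below is a hedged sketch, not part of panX: it assumes the alignment is written as FASTA, `describe_alignment` is a hypothetical helper, and `my_run` stands in for whatever run folder was passed to panX.

```python
import os


def describe_alignment(path):
    """Summarise a FASTA alignment: existence, record count, sequence lengths."""
    info = {"exists": os.path.isfile(path), "n_sequences": 0, "lengths": set()}
    if not info["exists"]:
        return info
    length = None  # length of the record currently being read
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    info["lengths"].add(length)
                info["n_sequences"] += 1
                length = 0
            elif line and length is not None:
                length += len(line)
    if length is not None:
        info["lengths"].add(length)
    return info


# "my_run" is a placeholder for the panX run folder.
aln_path = os.path.join("my_run", "geneCluster", "SNP_whole_matrix.aln")
print(describe_alignment(aln_path))
```

A missing file, zero sequences, or a length of 0 would all be consistent with an empty core genome; more than one distinct length would suggest the file is not a proper alignment.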