neherlab / pan-genome-analysis

Processing pipeline for pan-genome visualization and exploration
http://pangenome.de
GNU General Public License v3.0

DIAMOND fails to produce pairwise alignments in Step 5 #38

Open awalling opened 4 years ago

Hello,

I am attempting to run panX on a dataset of 82 genomes from a single family of Alphaproteobacteria. Using the divide and conquer strategy, my command reads as follows:

echo "source activate panX; \
  /nas3/awalling/software/pan-genome-analysis/panX.py \
  -fn /nas3/awalling/software/pan-genome-analysis/data/Erythrobacteraceae \
  -sl Erythrobacteraceae \
  -dmdc -dcs 41 -dmsi 90 -dmsqc 90 -dmssc 90 -cg 1.0 \
  -mi /nas3/awalling/software/pan-genome-analysis/metadata/erythrobacter_panx_metadata.tsv \
  -mtf /nas3/awalling/software/pan-genome-analysis/metadata/erythrobacter_meta_config.tsv \
  -t 32 \
  > /nas3/awalling/software/pan-genome-analysis/Erythrobacteraceae2.log \
  2> Erythrobacteraceae.err" \
  | qsub -V -N panX_erythrobacteraceae -q batch \
      -e nas3/awalling/software/pan-genome-analysis/panx.erythrobacteraceae.pbs.log \
      -o /nas3/awalling/software/pan-genome-analysis/panx.erythrobacteraceae.pbs.log \
      -l ncpus=64 -l mem=200gb -l walltime=96:00:00

However, I receive the following error:

Traceback (most recent call last):
  File "/nas3/awalling/software/pan-genome-analysis/panX.py", line 272, in <module>
    myPangenome.clustering_protein_divide_conquer()
  File "/nas3/awalling/software/pan-genome-analysis/scripts/pangenome_computation.py", line 153, in clustering_protein_divide_conquer
    self.diamond_subject_cover_subproblem, self.mcl_inflation, self.diamond_path, self.diamond_dc_subset_size)
  File "/nas3/awalling/software/pan-genome-analysis/scripts/sf_cluster_protein_divide_conquer.py", line 168, in clustering_divide_conquer
    integrate_clusters(clustering_path,cluster_fpath)
  File "/nas3/awalling/software/pan-genome-analysis/scripts/sf_cluster_protein_divide_conquer.py", line 103, in integrate_clusters
    with open('%s%s'%(clustering_path,'subproblem_finalRound_cluster.output'))\
IOError: [Errno 2] No such file or directory: '/nas3/awalling/software/pan-genome-analysis/data/Erythrobacteraceae/protein_faa/diamond_matches/subproblem_finalRound_cluster.output'

As far as I can tell, the hangup is that during the subproblem blastp stage, no pairwise alignments are generated. From the end of /protein_faa/diamond_matches/diamond_blastp_subproblem_1.log:

Loading query sequences... [0s]
Closing the input file... [0.005s]
Closing the output file... [0s]
Closing the database file... [0.005s]
Deallocating taxonomy... [0s]
Total time = 49.321s
Reported 0 pairwise alignments, 0 HSPs.
0 queries aligned.
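For anyone checking several subproblem logs at once, the summary lines above can be pulled out programmatically. The following is only a minimal sketch: `parse_diamond_summary` is a hypothetical helper name, not part of panX or DIAMOND, and it assumes the log ends with the same "Reported ... pairwise alignments" and "queries aligned" wording shown above.

```python
import re


def parse_diamond_summary(log_text):
    """Extract (alignments, hsps, queries_aligned) from DIAMOND log text.

    Hypothetical helper: assumes the summary wording matches the log
    excerpt quoted in this issue. Returns None if either line is missing.
    """
    reported = re.search(r"Reported (\d+) pairwise alignments, (\d+) HSPs\.", log_text)
    aligned = re.search(r"(\d+) queries aligned\.", log_text)
    if reported is None or aligned is None:
        return None
    return int(reported.group(1)), int(reported.group(2)), int(aligned.group(1))


if __name__ == "__main__":
    sample = (
        "Total time = 49.321s\n"
        "Reported 0 pairwise alignments, 0 HSPs.\n"
        "0 queries aligned.\n"
    )
    print(parse_diamond_summary(sample))
```

A result of all zeros for every subproblem log would confirm that the failure happens in the DIAMOND search itself rather than in the later clustering step.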

The files subproblem_1_cluster.output, subproblem_1.m8, subproblem_2_cluster.output, subproblem_2.m8, and subproblem_finalRound.faa are all empty (zero bytes).
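A quick way to reproduce this check on another run is to list the zero-byte intermediate files in the diamond_matches directory. This is a rough sketch only: `find_empty_files` is a hypothetical helper, and the suffix list is an assumption based on the file names mentioned above rather than on panX's full set of outputs.

```python
import os


def find_empty_files(directory, suffixes=(".m8", "_cluster.output", ".faa")):
    """Return sorted names of zero-byte files in `directory` with a given suffix.

    Hypothetical helper; the default suffixes are assumed from the file
    names reported in this issue.
    """
    empty = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.endswith(suffixes) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Path taken from the traceback above; adjust for your own run.
    matches_dir = (
        "/nas3/awalling/software/pan-genome-analysis/data/"
        "Erythrobacteraceae/protein_faa/diamond_matches"
    )
    if os.path.isdir(matches_dir):
        for name in find_empty_files(matches_dir):
            print("empty:", name)
```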

I tried relaxing the e-value threshold with the -dme flag, but even with an e-value cutoff of 10 and -cg relaxed to 0.8, the error persists.

Is there a way to fix this issue without running an all-against-all blast and providing that matrix separately?

Best,

Alexandra