marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be both draft assemblies and finished genomes, and output includes variant (SNP) calls, core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

Failing at raxml step #88

Open BioMinnie opened 3 years ago

BioMinnie commented 3 years ago

Hi, I'm currently using parsnp v1.5.3 on a slurm cluster (with raxml v8.2.12) and I keep getting an error when trying to run parsnp on ~13,000 bacterial genomes. Any idea on how to fix this?

CRITICAL - The following command failed:

$ raxmlHPC-PTHREADS -m GTRCAT -p 12345 -T 24 -s /data/projects/ABCDE/parsnp/parsnp.snps.mblocks -w /tmp/tmpwz1c2hjn -n OUTPUT

Please verify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.

  STDOUT:
  Warning, you specified a working directory via "-w"

Keep in mind that RAxML only accepts absolute path names, not relative ones!

RAxML can't, parse the alignment file as phylip file it will now try to parse it as FASTA file

TOO FEW SPECIES
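The "TOO FEW SPECIES" message suggests that RAxML found very few sequences, or none, in the file passed via -s. A quick sanity check is to confirm that the file is non-empty and to count its records. The sketch below assumes the file is FASTA-formatted (the log above says RAxML falls back to FASTA parsing). The function name count_fasta_records is made up for illustration and is not part of Parsnp or RAxML.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Parsnp or RAxML): count the FASTA records
# in an alignment file by counting header lines that start with '>'.
count_fasta_records() {
    local file="$1"
    if [ ! -s "$file" ]; then
        # Missing or zero-byte file: report zero records.
        echo 0
        return 0
    fi
    # grep -c prints the number of matching lines; it exits non-zero when
    # nothing matches, so '|| true' keeps the function from failing.
    grep -c '^>' "$file" || true
}
```

For the run above this would be called as `count_fasta_records /data/projects/ABCDE/parsnp/parsnp.snps.mblocks`. As far as I know RAxML needs at least four taxa to build a tree, so a count below that would be consistent with this error.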

bkille commented 3 years ago

Hi @BioMinnie! Thanks for using Parsnp and opening an issue. Could you provide the command used to run Parsnp?

BioMinnie commented 3 years ago

Hi, this is being run on a slurm queuing system, so the command is:

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --mem-per-cpu=24G

module load parsnp/1.5.3
module load numpy/1.17.3-python-3.7.4

parsnp -g reference.gbk -d QCfilt/ -p 24 -c -o /data/projects/ABCDE/parsnp

bkille commented 3 years ago

@BioMinnie sorry for the delayed response. The command looks fine, so unfortunately no easy fix there. I'll look into this error by running parsnp on a comparably large dataset, but in the meantime I have some questions that might help us towards a solution:

Hope this helps and happy holidays!

-Bryce

sczerski commented 3 years ago

Hello @BioMinnie, @bkille, did you ever find a solution to this problem? I am facing the same issue running the same command on a remote Ubuntu 16.04 LTS server. I have far fewer genomes (1 reference and 2 derived), though. I cleared my tmp folder, but this did not solve the problem. I checked the parsnp.snps.mblocks file and it is empty. I am receiving the same error at the RAxML step.

My command is: ./parsnp -g <ref/genome.gbk> -d <derived/genomes/folder/*.fasta> -c

I am struggling to find a solution to the problem and I am unsure how to proceed/troubleshoot further.

Thanks for any help! -Wolfgang

bkille commented 3 years ago

@wolfgangcz

Just to clarify, does the command you're running include the angle brackets (< and >)? If so, you should remove them. In command-line help messages, angle brackets are only there to indicate required arguments for a parameter.

./parsnp -g ref/genome.gbk -d derived/genomes/folder/*.fasta -c

If the issue persists, I'd be happy to try testing your input files to see if I could replicate the issue. Also, I would recommend using the conda version of parsnp if the error persists as that seems to have less room for installation issues.

sczerski commented 3 years ago

@bkille

Thanks so much for getting back to me. Sorry, to be clear, there are no angle brackets in the command. They were just there to indicate the arguments entered. The command you wrote is what I ran.

Unfortunately, I suspect the issue lies in the specific, yet generally ignored, formatting of gbk files. I tried using a fasta file as the reference for the same genomes and everything seemed to work fine. Additionally, I received a warning that the genomes were almost twice as large as the reference, but I don't believe this is true, which leads me to believe the problem is my gbk file.

Thank you for the recommendation! I tried using conda but was having issues, so I opted to install Ubuntu Desktop (I was facing a problem connecting to my display and visualizing with Gingr). It would be nice not to lose my annotations, so after I try to reproduce the results on this new system, if it fails and you would be willing to check out the files, that would be amazing. Again, I really appreciate this.

-w
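One way to check the size warning is to compare total sequence lengths directly. The sketch below assumes plain, uncompressed FASTA files and GenBank files whose LOCUS lines carry the sequence length as the third whitespace-separated field. The function names are illustrative only and are not Parsnp commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing genome sizes; not part of Parsnp.

# Total number of sequence characters in a FASTA file (header lines skipped).
fasta_total_bases() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
         END   { print total + 0 }' "$1"
}

# Sum of the lengths declared on the LOCUS lines of a GenBank file.
# Assumes the length is the third field, as in standard LOCUS lines.
genbank_declared_bases() {
    awk '$1 == "LOCUS" { total += $3 }
         END           { print total + 0 }' "$1"
}
```

Running `genbank_declared_bases ref/genome.gbk` next to `fasta_total_bases` on each derived genome shows whether the genomes really are about twice the size of the reference. If the numbers are close, the warning more likely reflects how the gbk file was read than the data itself.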


bkille commented 3 years ago

Yeah, the .gbk parsing has caused issues in the past. A future goal of mine is to offload the responsibility of manual file parsing to Biopython. I wouldn't be surprised if that was the issue.

What operating system are you using?

Also, I'm more than happy to help, thanks for working through this with me! If you want to send me the files, I'm at brycekille_at_gmail.com

-Bryce