marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be draft assemblies, finished genomes, or both, and output includes variant (SNP) calls, a core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

Remove +/-30% filter #80

Closed jellila closed 4 years ago

jellila commented 4 years ago

Hi,

With the flag -c parsnp does not include genomes that are very different in size. Does anyone know how to make parsnp recruit every genome in my folder despite the +/-30% difference in size? I tried what is suggested in question #10 but it didn't work.

Thank you.

Laura

bkille commented 4 years ago

Laura,

I have updated the --curated flag in Parsnp v1.5.2 to include all input genomes regardless of size. It will still warn you about genomes that would otherwise have been discarded by the size filter, though.

Let me know if this version works for you (it is available on conda as well).

Best,

Bryce

jellila commented 4 years ago

Thank you very much for your help, first of all!

I get this when I run the new parsnp:

`****************************
SETTINGS:
|-refgenome: /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta
|-genomes:
    /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/MT1415.fasta
    /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/KC-Na-NB1.fasta
    ...45 more file(s)...
    /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/89dp-OG16.fasta
    /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/QMA0506.fasta
|-aligner: muscle
|-outdir: /Users/laura/Desktop/BIOINFORMATICS/NEWPARSNP
|-OS: Darwin
|-threads: 1
****************************

17:15:46 - INFO - <<Parsnp started>>
17:15:46 - INFO - No genbank file provided for reference annotations, skipping..
17:15:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/MT1415.fasta is 1.67x shorter than reference genome!
17:15:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/KC-Na-NB1.fasta is 1.42x shorter than reference genome!
17:15:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/A-162.fasta is 1.49x shorter than reference genome!
17:15:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/JM-2017.fasta is 1.45x shorter than reference genome!
Traceback (most recent call last):
  File "/Users/laura/miniconda3/envs/htools/bin/parsnp", line 816, in <module>
    hdr = ff.readline()
  File "/Users/laura/miniconda3/envs/htools/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 334: invalid start byte`

I guess I am doing something wrong now.

Thank you and best regards,

Laura

bkille commented 4 years ago

Laura,

Happy to help! Thanks for downloading the new version. Can you show me the full terminal? i.e. command used and all output?

My suspicion is that there is a pesky .DS_Store file causing this headache. Passing in the input genomes via -d /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp will use all files in the directory, regardless of name or extension. This is why in Parsnp v1.5 we've added support for list/regex input, so that you can specify precisely which files to pass. In your case you can pass -d /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/*.fasta to only use the fasta files from that directory.
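One way to test the .DS_Store suspicion is to look at every entry in the directory, hidden ones included, and check whether each one reads as FASTA text. The sketch below is only an illustration: `find_non_fasta` is a hypothetical helper and not part of Parsnp, and it assumes that a FASTA file is UTF-8 text whose first non-empty line starts with `>`. It only inspects the start of each file, which is also where the traceback above failed (on the first `readline()`).

```python
import os


def find_non_fasta(directory):
    """Return (name, reason) pairs for files that do not look like FASTA text.

    Hypothetical helper for illustration; only the beginning of each file is checked.
    """
    problems = []
    # os.listdir also returns hidden entries such as .DS_Store.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        first = ""
        try:
            with open(path, encoding="utf-8") as handle:
                # Find the first non-empty line; decoding errors surface here.
                for line in handle:
                    if line.strip():
                        first = line
                        break
        except UnicodeDecodeError as err:
            problems.append((name, f"not UTF-8 text: {err.reason}"))
            continue
        if not first.startswith(">"):
            problems.append((name, "first non-empty line does not start with '>'"))
    return problems
```

Calling `find_non_fasta("/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp")` and printing the result would list any file that a directory-wide `-d` input would pick up but that cannot be read as FASTA.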

Best,

Bryce

jellila commented 4 years ago

Sure, here is the code I tried the first time with the output:

`(base) laura@d-i184-58-74 ~ % conda activate htools
(htools) laura@d-i184-58-74 ~ % /Users/laura/miniconda3/envs/htools/bin/parsnp -r /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta -d /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp -o /Users/laura/Desktop/BIOINFORMATICS/Newparsnp -c
|--Parsnp 1.5.1--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
08:37:41 - INFO -
****************************
SETTINGS:
|-refgenome: /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta
|-genomes:
    /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/MT1415.fasta
    /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/KC-Na-NB1.fasta
    ...45 more file(s)...
    /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/89dp-OG16.fasta
    /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/QMA0506.fasta
|-aligner: muscle
|-outdir: /Users/laura/Desktop/BIOINFORMATICS/Newparsnp
|-OS: Darwin
|-threads: 1
****************************

08:37:41 - INFO - <<Parsnp started>>
08:37:41 - INFO - No genbank file provided for reference annotations, skipping..
08:37:41 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/MT1415.fasta is 1.67x shorter than reference genome!
08:37:41 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/KC-Na-NB1.fasta is 1.42x shorter than reference genome!
08:37:41 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/A-162.fasta is 1.49x shorter than reference genome!
08:37:41 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/JM-2017.fasta is 1.45x shorter than reference genome!
Traceback (most recent call last):
  File "/Users/laura/miniconda3/envs/htools/bin/parsnp", line 816, in <module>
    hdr = ff.readline()
  File "/Users/laura/miniconda3/envs/htools/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 334: invalid start byte`

After your comment about the file extensions, I checked my directory and the files are all fasta. However, I tried what you suggested:

`(htools) laura@d-i184-58-74 ~ % /Users/laura/miniconda3/envs/htools/bin/parsnp -r /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta -d /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/*.fasta -o /Users/laura/Desktop/BIOINFORMATICS/Newparsnp2 -c
|--Parsnp 1.5.1--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
08:50:12 - INFO -
****************************
SETTINGS:
|-refgenome: /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta
|-genomes:
    /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/111bp-OG15A.fasta
    /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/144bp-OG3.fasta
    ...44 more file(s)...
    /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/RM-71.fasta
    /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/SNW-8.1.fasta
|-aligner: muscle
|-outdir: /Users/laura/Desktop/BIOINFORMATICS/Newparsnp2
|-OS: Darwin
|-threads: 1
****************************

08:50:12 - INFO - <<Parsnp started>>
08:50:12 - INFO - No genbank file provided for reference annotations, skipping..
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/2012V-1072.fasta is 1.45x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/206328-2.fasta is 1.43x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/64bp-OG9.fasta is 1.42x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/80077637.fasta is 1.44x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/89dp-OG16.fasta is 1.41x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/9046-81.fasta is 1.44x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/91-197.fasta is 1.51x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/940804-1-1.fasta is 1.46x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/940804-1-2.fasta is 1.42x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/A-162.fasta is 1.49x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/ATCC29688.fasta is 1.67x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/ATCC29689.fasta is 1.73x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/BT-6.fasta is 1.43x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/GCSL-P85-BT-6.fasta is 1.43x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/Hep-2a-11.fasta is 1.52x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/Hep-2a-14.fasta is 1.46x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/Hep-2a-16.fasta is 1.46x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/Hep-2b-22.fasta is 1.43x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/JM-2017.fasta is 1.45x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/KC-Na-1.fasta is 1.41x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/KC-Na-NB1.fasta is 1.42x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/L091106-03H.fasta is 1.48x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/LD-07.fasta is 1.48x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/MT1415.fasta is 1.67x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/OT-51443.fasta is 1.41x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/PP3.fasta is 1.53x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/RM-71.fasta is 1.42x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/SNW-8.1.fasta is 1.56x shorter than reference genome!
08:50:13 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
08:54:45 - CRITICAL - The following command failed:

$ /Users/laura/miniconda3/envs/htools/bin/bin/parsnp_core /Users/laura/Desktop/BIOINFORMATICS/Newparsnp2/parsnpAligner.ini
Please veryify input data and restart Parsnp.
If the problem persists please contact the Parsnp development team.

  STDOUT:
  0

P.profundum.fasta.ref,Len:6403280,GC:41.734
111bp-OG15A.fasta,Len:4633796,GC:40.5543
144bp-OG3.fasta,Len:4814661,GC:40.5027
164dp-OG2.fasta,Len:4626672,GC:40.5876
2012V-1072.fasta,Len:4408971,GC:40.8869
206317-1.fasta,Len:4673986,GC:40.4714
206328-2.fasta,Len:4515945,GC:40.5071
64bp-OG9.fasta,Len:4533007,GC:40.8729
70dps-OG12.fasta,Len:4827935,GC:40.4916
80077637.fasta,Len:4468636,GC:40.4223
89dp-OG16.fasta,Len:4583837,GC:40.9198
9046-81.fasta,Len:4452773,GC:40.9153
91-197.fasta,Len:4227017,GC:41.0188
940804-1-1.fasta,Len:4395972,GC:40.5314
940804-1-2.fasta,Len:4532473,GC:40.5118
A-162.fasta,Len:4335616,GC:40.7647
ATCC29688.fasta,Len:3932528,GC:40.8521
ATCC29689.fasta,Len:3796820,GC:40.9169
ATCC33539.fasta,Len:4953976,GC:40.501
BT-6.fasta,Len:4480819,GC:40.438
CDC-2227-81.fasta,Len:4719947,GC:40.459
CIP102761.fasta,Len:5048498,GC:40.6716
DI21.fasta,Len:4787052,GC:40.8234
GCSL-P85-BT-6.fasta,Len:4478193,GC:40.4111
Hep-2a-11.fasta,Len:4249732,GC:40.8167
Hep-2a-14.fasta,Len:4398187,GC:40.825
Hep-2a-16.fasta,Len:4392725,GC:40.8092
Hep-2b-22.fasta,Len:4518266,GC:40.9055
JM-2017.fasta,Len:4436399,GC:40.6322
KC-Na-1.fasta,Len:4546136,GC:40.9116
KC-Na-NB1.fasta,Len:4524406,GC:40.9226
L091106-03H.fasta,Len:4329488,GC:40.8722
LD-07.fasta,Len:4344639,GC:40.6153
MT1415.fasta,Len:3921871,GC:40.8083
NCTC11646.fasta,Len:4661852,GC:40.7215
NCTC11647.fasta,Len:5061854,GC:40.8
NCTC11648.fasta,Len:4611119,GC:40.6921
OT-51443.fasta,Len:4549111,GC:41.3788
PP3.fasta,Len:4281249,GC:40.7807
Phdp_Wu-1.fasta,Len:4586084,GC:40.807
QMA0365.fasta,Len:4755557,GC:40.6034
QMA0505.fasta,Len:4677363,GC:40.8864
QMA0506.fasta,Len:4651953,GC:40.904
QMA0509.fasta,Len:4632979,GC:40.6029
QMA0510.fasta,Len:4791745,GC:40.6392
QMA0511.fasta,Len:4664845,GC:40.5803
QMA0512.fasta,Len:4610033,GC:40.7846
RM-71.fasta,Len:4524115,GC:40.5864
SNW-8.1.fasta,Len:4240149,GC:40.827
Finished processing input sequences, elapsed time: 6 seconds

             compressed suffix graph construction elapsed time: 5 seconds

             MUM anchor search elapsed time: 208 seconds

  STDERR:

parsnpAligner:: rapid whole genome SNP typing


ParSNP: Preparing to construct global multiple alignment framework

Preparing to verify and process input sequences... Searching for initial MUM anchors...

    Constructing compressed suffix graph...
    Performing initial search for exact matches in the sequences...

Performing recursive MUM search between MUM anchors... `

In the output folder I can find these files: tmp (empty folder), psnn.ini, parsnpAligner.log, parsnpAligner.ini, P.profundum.fasta.ref, blocks (empty folder), all_mumi.ini

Thank you again!

Best,

Laura

bkille commented 4 years ago

Thanks for pasting the output! Would you mind showing me the contents of parsnpAligner.log?

jellila commented 4 years ago

It's empty, completely empty! I don't really understand 😩

bkille commented 4 years ago

Hmm, that is rather strange... the functionality of the core parsnp aligner hasn't changed for some time, so my guess is that it is failing due to the genomes all being too disparate from the reference.
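To put a rough number on "too disparate", the `Len:` values from the STDOUT above can be compared against the reference length. This is a minimal sketch and not Parsnp's own check: `shorter_than_reference` is a hypothetical helper, and the 1.4 cutoff is an assumption chosen only because every warning in this thread reports 1.40x or more. The ratios it computes will not exactly match the figures in the warnings, so Parsnp evidently measures size somewhat differently.

```python
def shorter_than_reference(ref_len, genome_lens, min_ratio=1.4):
    """Return {name: ratio} for genomes where ref_len / genome_len >= min_ratio.

    Hypothetical helper; min_ratio=1.4 is an assumed cutoff, not Parsnp's documented one.
    """
    flagged = {}
    for name, length in genome_lens.items():
        ratio = ref_len / length
        if ratio >= min_ratio:
            flagged[name] = round(ratio, 2)
    return flagged


# Lengths copied from the aligner STDOUT earlier in this thread.
REF_LEN = 6403280  # P.profundum.fasta.ref
LENS = {
    "MT1415.fasta": 3921871,
    "ATCC29689.fasta": 3796820,
    "CIP102761.fasta": 5048498,
    "NCTC11647.fasta": 5061854,  # longest genome in that list
}

flagged = shorter_than_reference(REF_LEN, LENS)
for name, ratio in flagged.items():
    print(f"{name}: reference is {ratio}x longer")
```

Even the longest genome in the list, NCTC11647, is more than 1.25x shorter than the reference, which fits the idea that a closer reference might behave differently.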

You're more than welcome to attach the input files here and I can go through and debug it.

jellila commented 4 years ago

Archive.zip

Thank you very much! These are only a few of the genomes (the file would be too big with all of them). I may try changing the reference genome (P. profundum), but I would still like to know where the problem is!

Laura

bkille commented 4 years ago

Hmm... so with only those sequences, I am able to run without any issues:

(base) blk6@sno:~/Projects/HarvestSuite/parsnp$ ./parsnp -r issue80/P.profundum.fasta -d issue80/genomes/*.fasta -o issue80_out -c
|--Parsnp 1.5.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
13:26:29 - INFO - 
****************************
SETTINGS:
|-refgenome:    issue80/P.profundum.fasta
|-genomes:  
    issue80/genomes/206328-2.fasta
    issue80/genomes/505.fasta
    ...2 more file(s)...
    issue80/genomes/Phdp_Wu-1.fasta
    issue80/genomes/SNW-8.1.fasta
|-aligner:  muscle
|-outdir:   issue80_out
|-OS:   Linux
|-threads:  1
****************************

13:26:29 - INFO - <<Parsnp started>>
13:26:29 - INFO - No genbank file provided for reference annotations, skipping..
13:26:29 - WARNING - File issue80/genomes/206328-2.fasta is 1.41x shorter than reference genome! 
13:26:29 - WARNING - File issue80/genomes/64bp-OG9.fasta is 1.40x shorter than reference genome! 
13:26:29 - WARNING - File issue80/genomes/91-197.fasta is 1.50x shorter than reference genome! 
13:26:29 - WARNING - File issue80/genomes/SNW-8.1.fasta is 1.54x shorter than reference genome! 
13:26:29 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
13:27:19 - INFO - Reconstructing core genome phylogeny...
13:27:19 - INFO - Aligned 7 genomes in 47.54 seconds
13:27:19 - INFO - Parsnp finished! All output available in issue80_out

My email is brycekille@gmail.com if you want to send me the full dataset.

Best,

Bryce

bkille commented 4 years ago

Hi Laura,

So I was able to run parsnp on my Ubuntu machine w/ your files.

parsnp -r issue80/P.profundum.fasta -d $HOME/Data/parsnpissue80/*.fasta -o issue80_out -c --threads 10
|--Parsnp 1.5.1--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
17:20:25 - INFO - 
****************************
SETTINGS:
|-refgenome:    issue80/P.profundum.fasta
|-genomes:  
    /home/Users/blk6/Data/parsnpissue80/111bp-OG15A.fasta
    /home/Users/blk6/Data/parsnpissue80/144bp-OG3.fasta
    ...45 more file(s)...
    /home/Users/blk6/Data/parsnpissue80/RM-71.fasta
    /home/Users/blk6/Data/parsnpissue80/SNW-8.1.fasta
|-aligner:  muscle
|-outdir:   issue80_out
|-OS:   Linux
|-threads:  10
****************************

17:20:25 - INFO - <<Parsnp started>>
17:20:25 - INFO - No genbank file provided for reference annotations, skipping..
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/2012V-1072.fasta is 1.43x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/206328-2.fasta is 1.41x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/64bp-OG9.fasta is 1.40x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/80077637.fasta is 1.42x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/9046-81.fasta is 1.42x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/91-197.fasta is 1.50x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/940804-1-1.fasta is 1.45x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/940804-1-2.fasta is 1.40x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/A-162.fasta is 1.47x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/ATCC29688.fasta is 1.65x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/ATCC29689.fasta is 1.71x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/BT-6.fasta is 1.41x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/GCSL-P85-BT-6.fasta is 1.42x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/Hep-2a-11.fasta is 1.50x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/Hep-2a-14.fasta is 1.44x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/Hep-2a-16.fasta is 1.45x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/Hep-2b-22.fasta is 1.41x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/JM-2017.fasta is 1.43x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/L091106-03H.fasta is 1.46x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/LD-07.fasta is 1.46x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/MT1415.fasta is 1.65x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/PP3.fasta is 1.51x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/RM-71.fasta is 1.41x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/SNW-8.1.fasta is 1.54x shorter than reference genome! 
17:20:26 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
17:24:36 - INFO - Reconstructing core genome phylogeny...
17:24:41 - INFO - Aligned 50 genomes in 4.16 minutes
17:24:41 - INFO - Parsnp finished! All output available in issue80_out

issue80_out.zip (https://github.com/marbl/parsnp/files/4922345/issue80_out.zip)

Not sure what the issue is exactly, but we can try to figure it out. How did you install parsnp, and what version of macOS are you running? I will also have a colleague try these files out on a Mac and get back to you.

Best,

Bryce

jellila commented 4 years ago

Hi Bryce,

Thank you. I attach a text file with the code I used in the terminal. I removed and re-installed parsnp and then tried to run it. Did not work. My machine is macOS Catalina 10.15.5.

Thank you!

Regards,

Laura

bkille commented 4 years ago

Hi Laura,

Unfortunately I am currently unable to replicate this behavior. My next suggestion would be to try building the software from source. Let me know if this works for you!

Best,

Bryce

jellila commented 4 years ago

Hi Bryce,

Thank you for your help. Did it work on the Mac as well? How can I build the software from source? Sorry, I am a biologist trying to do bioinformatics (apparently not very successfully!)

Thank you again for your help.

Best,

Laura

bkille commented 4 years ago

@jellila No worries 🙂 The instructions for building from source are in the README. You will need an OpenMP-compatible compiler installed.

I will be working on making a binary that works for you this week, though, in case you have trouble building from source. I will update you at the end of the week.

jellila commented 4 years ago

Hi Bryce,

Thank you very much! You are so kind!

Best,

Laura

bkille commented 4 years ago

Laura,

I've built a new release for parsnp, available here. It should also be available on bioconda sometime this week (whenever they accept the PR).

Let me know if the issue persists with that build.

Best,

Bryce

bkille commented 4 years ago

@jellila

The new binary is available on conda. Running conda update parsnp should give you the new version.
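After updating, the banner printed at the top of each run shows which version actually executed. Earlier runs in this thread printed `|--Parsnp 1.5.1--|` even though the size-filter change was described for v1.5.2. The sketch below assumes the banner keeps that `|--Parsnp X.Y.Z--|` shape; `banner_version` and `is_at_least` are hypothetical helper names, not part of Parsnp or conda.

```python
import re

# Matches banners such as "|--Parsnp 1.5.1--|" as they appear in the logs in this thread.
BANNER_RE = re.compile(r"\|--Parsnp (\d+(?:\.\d+)*)--\|")


def banner_version(output):
    """Return the banner version as a tuple of ints, or None if no banner is found."""
    match = BANNER_RE.search(output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(output, minimum):
    """True if the banner in `output` reports a version >= `minimum` (a tuple of ints)."""
    version = banner_version(output)
    return version is not None and version >= minimum
```

For example, `is_at_least(log_text, (1, 5, 2))` could be applied to saved terminal output, where `log_text` is whatever was captured from the run.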

jellila commented 4 years ago

Hi Bryce,

Thank you very much, I will try it and let you know what happens as soon as possible! I am honestly and deeply grateful for your help, you have been really kind. Thanks!

Best,

Laura

jellila commented 4 years ago

Hi Bryce,

I am so sorry to disturb you again. I tried the new parsnp version. First, I tried to update it using conda update parsnp: the package was updated, but the analysis with my sequences still gave me an error. I then removed the whole environment and reinstalled it, and it still gives me an error. The “fun” fact is that I can no longer run the same analyses that I managed to do some months ago :( Here is what I did:

(base) laura@d-i184-57-200 ~ % conda remove -n htools --all

Remove all packages in environment /Users/laura/miniconda3/envs/htools:

Package Plan

environment location: /Users/laura/miniconda3/envs/htools

The following packages will be REMOVED:

boost-1.70.0-py38hbf1eeb5_1
boost-cpp-1.70.0-hef959ae_3
bzip2-1.0.8-haf1e3a3_2
ca-certificates-2020.6.20-hecda079_0
capnproto-0.6.1-h0ceac7d_2
certifi-2020.6.20-py38h32f6830_0
fastani-1.31-he69ab0f_0
fasttree-2.1.10-h0b31af3_4
gsl-2.6-ha2d443c_0
harvesttools-1.2-h1341992_0
icu-67.1-h4a8c4bd_0
libblas-3.8.0-14_openblas
libcblas-3.8.0-14_openblas
libcxx-10.0.1-h5f48129_0
libffi-3.2.1-1
libgfortran-4.0.0-2
liblapack-3.8.0-14_openblas
libopenblas-0.3.7-h3d69b6c_4
llvm-openmp-8.0.1-h770b8ee_0
lz4-c-1.9.2-h4a8c4bd_1
mash-2.2.2-h194473e_2
ncurses-6.2-hb1e8313_1
numpy-1.19.1-py38h598c1e0_0
openmp-8.0.1-0
openssl-1.1.1g-haf1e3a3_1
parsnp-1.5.3-h7475705_0
phipack-1.1-h01d97ff_0
pip-20.2.2-py_0
python-3.8.5-h85f3143_2_cpython
python_abi-3.8-1_cp38
raxml-8.2.12-h0b31af3_2
readline-8.0-h0678c8f_2
setuptools-49.3.2-py38h32f6830_0
sqlite-3.32.3-h93121df_1
tk-8.6.10-hbbe82c9_0
wheel-0.34.2-py_1
xz-5.2.5-haf1e3a3_1
zlib-1.2.11-1007
zstd-1.4.5-h0384e3a_2

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) laura@d-i184-57-200 ~ % conda create -n htools
Collecting package metadata (current_repodata.json): done
Solving environment: done

Package Plan

environment location: /Users/laura/miniconda3/envs/htools

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate htools
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) laura@d-i184-57-200 ~ % conda install -n htools parsnp
Collecting package metadata (current_repodata.json): done
Solving environment: done

Package Plan

environment location: /Users/laura/miniconda3/envs/htools

added / updated specs:

The following NEW packages will be INSTALLED:

boost            conda-forge/osx-64::boost-1.70.0-py38hbf1eeb5_1
boost-cpp        conda-forge/osx-64::boost-cpp-1.70.0-hef959ae_3
bzip2            conda-forge/osx-64::bzip2-1.0.8-haf1e3a3_2
ca-certificates  conda-forge/osx-64::ca-certificates-2020.6.20-hecda079_0
capnproto        conda-forge/osx-64::capnproto-0.6.1-h0ceac7d_2
certifi          conda-forge/osx-64::certifi-2020.6.20-py38h32f6830_0
fastani          bioconda/osx-64::fastani-1.31-he69ab0f_0
fasttree         bioconda/osx-64::fasttree-2.1.10-h0b31af3_4
gsl              conda-forge/osx-64::gsl-2.6-ha2d443c_0
harvesttools     bioconda/osx-64::harvesttools-1.2-h1341992_0
icu              conda-forge/osx-64::icu-67.1-h4a8c4bd_0
libblas          conda-forge/osx-64::libblas-3.8.0-14_openblas
libcblas         conda-forge/osx-64::libcblas-3.8.0-14_openblas
libcxx           conda-forge/osx-64::libcxx-10.0.1-h5f48129_0
libffi           bioconda/osx-64::libffi-3.2.1-1
libgfortran      conda-forge/osx-64::libgfortran-4.0.0-2
liblapack        conda-forge/osx-64::liblapack-3.8.0-14_openblas
libopenblas      conda-forge/osx-64::libopenblas-0.3.7-h3d69b6c_4
llvm-openmp      conda-forge/osx-64::llvm-openmp-8.0.1-h770b8ee_0
lz4-c            conda-forge/osx-64::lz4-c-1.9.2-h4a8c4bd_1
mash             bioconda/osx-64::mash-2.2.2-h194473e_2
ncurses          conda-forge/osx-64::ncurses-6.2-hb1e8313_1
numpy            conda-forge/osx-64::numpy-1.19.1-py38h598c1e0_0
openmp           conda-forge/osx-64::openmp-8.0.1-0
openssl          conda-forge/osx-64::openssl-1.1.1g-haf1e3a3_1
parsnp           bioconda/osx-64::parsnp-1.5.3-h7475705_0
phipack          bioconda/osx-64::phipack-1.1-h01d97ff_0
pip              conda-forge/noarch::pip-20.2.2-py_0
python           conda-forge/osx-64::python-3.8.5-h85f3143_2_cpython
python_abi       conda-forge/osx-64::python_abi-3.8-1_cp38
raxml            bioconda/osx-64::raxml-8.2.12-h0b31af3_2
readline         conda-forge/osx-64::readline-8.0-h0678c8f_2
setuptools       conda-forge/osx-64::setuptools-49.3.2-py38h32f6830_0
sqlite           conda-forge/osx-64::sqlite-3.32.3-h93121df_1
tk               conda-forge/osx-64::tk-8.6.10-hbbe82c9_0
wheel            conda-forge/noarch::wheel-0.34.2-py_1
xz               conda-forge/osx-64::xz-5.2.5-haf1e3a3_1
zlib             conda-forge/osx-64::zlib-1.2.11-1007
zstd             conda-forge/osx-64::zstd-1.4.5-h0384e3a_2

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) laura@d-i184-57-200 ~ % conda activate htools
(htools) laura@d-i184-57-200 ~ % /Users/laura/miniconda3/envs/htools/bin/parsnp -r /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta -d /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp -o /Users/laura/Desktop/parsnp-new -c
|--Parsnp 1.5.3--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
14:27:46 - INFO -

SETTINGS:
|-refgenome: /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta
|-genomes:
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/MT1415.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/KC-Na-NB1.fasta
...45 more file(s)...
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/89dp-OG16.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/QMA0506.fasta
|-aligner: muscle
|-outdir: /Users/laura/Desktop/parsnp-new
|-OS: Darwin
|-threads: 1

14:27:46 - INFO - <>
14:27:46 - INFO - No genbank file provided for reference annotations, skipping..
14:27:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/MT1415.fasta is 1.65x shorter than reference genome!
14:27:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/A-162.fasta is 1.47x shorter than reference genome!
14:27:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/JM-2017.fasta is 1.43x shorter than reference genome!
Traceback (most recent call last):
  File "/Users/laura/miniconda3/envs/htools/bin/parsnp", line 819, in
    hdr = ff.readline()
  File "/Users/laura/miniconda3/envs/htools/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 23: invalid start byte
(htools) laura@d-i184-57-200 ~ % /Users/laura/miniconda3/envs/htools/bin/parsnp -r /Users/laura/Desktop/BIOINFORMATICS/QMA0509.fasta -d /Users/laura/Desktop/BIOINFORMATICS/Pdp -o /Users/laura/Desktop/parsnp-pdp -c
|--Parsnp 1.5.3--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
14:30:32 - INFO -

SETTINGS:
|-refgenome: /Users/laura/Desktop/BIOINFORMATICS/QMA0509.fasta
|-genomes:
/Users/laura/Desktop/BIOINFORMATICS/Pdp/MT1415.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdp/.DS_Store
...7 more file(s)...
/Users/laura/Desktop/BIOINFORMATICS/Pdp/ATCC29689.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdp/QMA0506.fasta
|-aligner: muscle
|-outdir: /Users/laura/Desktop/parsnp-pdp
|-OS: Darwin
|-threads: 1

14:30:32 - INFO - <>
14:30:32 - INFO - No genbank file provided for reference annotations, skipping..
Traceback (most recent call last):
  File "/Users/laura/miniconda3/envs/htools/bin/parsnp", line 819, in
    hdr = ff.readline()
  File "/Users/laura/miniconda3/envs/htools/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte

The first “pdp-pdd” is the analysis I am trying to do now, whereas for the second “pdp” I was using the same genomes I successfully used some months ago.

Could this be related to Unicode/ASCII encoding?

Sorry :(

Best regards,

Laura

On 10 Aug 2020, at 11:16 am, Laura Baseggio l.baseggio@uq.net.au wrote:

Hi Bryce,

Thank you very much, I will try it and let you know what happens as soon as possible! I am honestly and deeply grateful for your help, you have been really kind. Thanks!

Best,

Laura

On 8 Aug 2020, at 1:37 pm, Bryce Kille notifications@github.com wrote:

@jellila

The new binary is available on conda. Running conda update parsnp should give you the new version.


bkille commented 4 years ago

@jellila I believe the issue here is the .DS_Store file in your directory. You can avoid including this file by using

-d /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/*.fasta

instead of

-d /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/
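To see which files in a directory would trigger this kind of `UnicodeDecodeError` before running parsnp, a small pre-check along these lines may help. This is only a sketch: `split_readable_fasta` is a hypothetical helper, not part of parsnp. It assumes that a usable input is a non-hidden file whose first line decodes as UTF-8 and starts with `>`.

```python
from pathlib import Path


def split_readable_fasta(directory):
    """Split files in `directory` into (accepted, rejected).

    Hypothetical pre-check, not parsnp's own logic. A file is rejected if it
    is hidden (name starts with "."), if its first line cannot be decoded as
    UTF-8, or if that line does not start with ">".
    """
    accepted, rejected = [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        if path.name.startswith("."):
            # e.g. the .DS_Store file macOS Finder creates in folders
            rejected.append((path.name, "hidden file"))
            continue
        try:
            with open(path, encoding="utf-8") as handle:
                header = handle.readline()
        except UnicodeDecodeError as err:
            # Same failure mode as the traceback above: binary content
            rejected.append((path.name, f"not UTF-8 text: {err.reason}"))
            continue
        if header.startswith(">"):
            accepted.append(path.name)
        else:
            rejected.append((path.name, "first line is not a FASTA header"))
    return accepted, rejected
```

Calling `split_readable_fasta("/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp")` would list a `.DS_Store` entry under `rejected`, which is consistent with the `*.fasta` glob being enough to fix the run.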
jellila commented 4 years ago

Yeeeeyyyy it worked!!! Thank you so much Bryce!!! :)

Arthurdyu commented 5 months ago

I have a similar problem and I don't know how to solve it. SOS!

`(parsnp) [wanglamei@login02 ~]$ parsnp -r /share/home/wanglamei/Data/Bacillus/40species/GCA_000832605_ref.fasta -d /share/home/wanglamei/Data/Bacillus/40species/*.fasta -o /share/home/wanglamei/ParSNP/40species/condaV2 -F
16:44:04 - INFO - |--Parsnp 2.0.5--|

16:44:04 - WARNING - Output directory /share/home/wanglamei/ParSNP/40species/condaV2 exists, all results will be overwritten
16:44:04 - INFO -

SETTINGS:
|-refgenome: /share/home/wanglamei/Data/Bacillus/40species/GCA_000832605_ref.fasta
|-genomes:
/share/home/wanglamei/Data/Bacillus/40species/GCA_000262045.1_KCTC_13613_01_genomic.fasta
/share/home/wanglamei/Data/Bacillus/40species/GCA_000712595.1_ASM71259v1_genomic.fasta
...36 more file(s)...
/share/home/wanglamei/Data/Bacillus/40species/GCA_037907705.1_ASM3790770v1_genomic.fasta
/share/home/wanglamei/Data/Bacillus/40species/GCA_900177005.1_Bcereus.16-00174_genomic.fasta
|-aligner: muscle
|-outdir: /share/home/wanglamei/ParSNP/40species/condaV2
|-OS: Linux
|-threads: 1

16:44:04 - INFO - <>
16:44:04 - INFO - No genbank file provided for reference annotations, skipping..
16:44:04 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000262045.1_KCTC_13613_01_genomic.fasta is 1.49x shorter than reference genome! Skipping...
16:44:04 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000712595.1_ASM71259v1_genomic.fasta is 1.21x shorter than reference genome! Skipping...
16:44:04 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000769555.1_ASM76955v1_genomic.fasta is 1.40x shorter than reference genome! Skipping...
16:44:04 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001278705.1_ASM127870v1_genomic.fasta is 1.23x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001307105.1_ASM130710v1_genomic.fasta is 1.54x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001517105.1_ASM151710v1_genomic.fasta is 1.38x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001584325.1_ASM158432v1_genomic.fasta is 1.53x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001857925.1_ASM185792v1_genomic.fasta is 1.55x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_002250945.2_ASM225094v2_genomic.fasta is 1.35x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_002993925.1_ASM299392v1_genomic.fasta is 1.29x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_003148415.1_ASM314841v1_genomic.fasta is 1.30x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_004124315.2_ASM412431v2_genomic.fasta is 1.37x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_006094475.1_ASM609447v1_genomic.fasta is 1.39x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_008244765.1_ASM824476v1_genomic.fasta is 1.49x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_014042035.1_ASM1404203v1_genomic.fasta is 1.33x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_017832095.1_ASM1783209v1_genomic.fasta is 1.23x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_029772985.1_ASM2977298v1_genomic.fasta is 1.32x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_030908325.1_ASM3090832v1_genomic.fasta is 1.25x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031195075.1_ASM3119507v1_genomic.fasta is 1.54x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031316515.1_ASM3131651v1_genomic.fasta is 1.52x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031317525.1_ASM3131752v1_genomic.fasta is 1.40x shorter than reference genome! Skipping...
16:44:05 - INFO - Recruiting genomes...
/share/home/wanglamei/ParSNP/40species/condaV2
16:45:12 - INFO - Too few genomes to run partitions of size >50. Running all genomes at once.
16:45:12 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
^[[B^[[B^[[B^[[B ^C
16:46:36 - CRITICAL - Caught request to terminate by user (CTRL+C), exiting now, bye
(parsnp) [wanglamei@login02 ~]$ parsnp -r /project/TBAMR03/wanglm/Bacillus/40species/GCA_031316625.1_ASM3131662v1_genomic.fna^C
(parsnp) [wanglamei@login02 ~]$ parsnp -r /share/home/wanglamei/Data/Bacillus/40species/GCA_031316625.1_ASM3131662v1_genomic.fasta ^C
(parsnp) [wanglamei@login02 ~]$ ^C
(parsnp) [wanglamei@login02 ~]$ cd /share/home/wanglamei/ParSNP/40species/condaV2
(parsnp) [wanglamei@login02 condaV2]$ /share/home/wanglamei/ParSNP/40species/condaV2^C
(parsnp) [wanglamei@login02 condaV2]$ parsnp -r /share/home/wanglamei/Data/Bacillus/40species/GCA_000832605_ref.fasta -d /share/home/wanglamei/Data/Bacillus/40species/*.fasta -o /share/home/wanglamei/ParSNP/40species/condaV2 -F ^C
(parsnp) [wanglamei@login02 condaV2]$ parsnp -r /share/home/wanglamei/Data/Bacillus/40species/GCA_031316625.1_ASM3131662v1_genomic.fasta -d /share/home/wanglamei/Data/Bacillus/40species/*.fasta -o output
16:51:40 - INFO - |--Parsnp 2.0.5--|

16:51:40 - INFO -

SETTINGS:
|-refgenome: /share/home/wanglamei/Data/Bacillus/40species/GCA_031316625.1_ASM3131662v1_genomic.fasta
|-genomes:
/share/home/wanglamei/Data/Bacillus/40species/GCA_000262045.1_KCTC_13613_01_genomic.fasta
/share/home/wanglamei/Data/Bacillus/40species/GCA_000712595.1_ASM71259v1_genomic.fasta
...36 more file(s)...
/share/home/wanglamei/Data/Bacillus/40species/GCA_037907705.1_ASM3790770v1_genomic.fasta
/share/home/wanglamei/Data/Bacillus/40species/GCA_900177005.1_Bcereus.16-00174_genomic.fasta
|-aligner: muscle
|-outdir: output
|-OS: Linux
|-threads: 1

16:51:40 - INFO - <>
16:51:40 - INFO - No genbank file provided for reference annotations, skipping..
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000262045.1_KCTC_13613_01_genomic.fasta is 1.63x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000712595.1_ASM71259v1_genomic.fasta is 1.32x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000769555.1_ASM76955v1_genomic.fasta is 1.53x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001278705.1_ASM127870v1_genomic.fasta is 1.34x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001307105.1_ASM130710v1_genomic.fasta is 1.69x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001517105.1_ASM151710v1_genomic.fasta is 1.51x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001584325.1_ASM158432v1_genomic.fasta is 1.67x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001857925.1_ASM185792v1_genomic.fasta is 1.69x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_002250945.2_ASM225094v2_genomic.fasta is 1.48x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_002993925.1_ASM299392v1_genomic.fasta is 1.41x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_003148415.1_ASM314841v1_genomic.fasta is 1.42x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_004103615.1_ASM410361v1_genomic.fasta is 1.28x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_004124315.2_ASM412431v2_genomic.fasta is 1.49x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_006094475.1_ASM609447v1_genomic.fasta is 1.52x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_008244765.1_ASM824476v1_genomic.fasta is 1.63x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_014042035.1_ASM1404203v1_genomic.fasta is 1.45x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_017832095.1_ASM1783209v1_genomic.fasta is 1.35x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_029772985.1_ASM2977298v1_genomic.fasta is 1.44x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_030908325.1_ASM3090832v1_genomic.fasta is 1.36x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031195075.1_ASM3119507v1_genomic.fasta is 1.68x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031316515.1_ASM3131651v1_genomic.fasta is 1.66x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031317525.1_ASM3131752v1_genomic.fasta is 1.53x shorter than reference genome! Skipping...
16:51:40 - INFO - Recruiting genomes...
16:53:37 - INFO - Too few genomes to run partitions of size >50. Running all genomes at once.
16:53:37 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
16:55:27 - WARNING - Aligned regions cover less than 10% of reference genome! Please verify recruited genomes are all strain of interest
16:55:30 - INFO - Reconstructing core genome phylogeny...
16:55:58 - INFO - Aligned 18 genomes in 4.30 minutes
16:55:58 - INFO - Parsnp finished! All output available in output`

bkille commented 5 months ago

Hi @Arthurdyu!

If you run parsnp with the --curated flag, it will include all of the input sequences in the alignment, regardless of length or sequence similarity.

I'll be sure to add another option in the next release that lets you skip the length filter but keep the similarity filter. In the meantime, the --curated flag should do the trick.
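
For anyone who wants to preview which inputs are likely to trip the length filter before deciding on --curated, a rough check can be sketched as below. Everything here is an assumption rather than parsnp's actual code: `fasta_length` and `size_outliers` are hypothetical helpers, the ratio is taken as longer length divided by shorter length (matching the "1.49x shorter" wording in the log), and `max_ratio` is a cutoff you choose yourself, since the exact threshold parsnp applies is not stated in this thread.

```python
def fasta_length(path):
    """Total number of sequence characters in a FASTA file (headers excluded)."""
    total = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def size_outliers(reference, genomes, max_ratio):
    """Return (path, ratio, direction) for genomes whose total length differs
    from the reference by more than `max_ratio` (a user-chosen cutoff)."""
    ref_len = fasta_length(reference)
    if ref_len == 0:
        raise ValueError(f"reference {reference} contains no sequence")
    flagged = []
    for genome in genomes:
        length = fasta_length(genome)
        if length == 0:
            flagged.append((str(genome), float("inf"), "empty"))
            continue
        if length < ref_len:
            ratio, direction = ref_len / length, "shorter"
        else:
            ratio, direction = length / ref_len, "longer"
        if ratio > max_ratio:
            flagged.append((str(genome), round(ratio, 2), direction))
    return flagged
```

If most of the inputs end up flagged, as in the log above where 22 of the 40 files were skipped, it may also be worth trying a different reference genome, because the ratios are all computed relative to the reference.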