matsen / pplacer

Phylogenetic placement and downstream analysis
http://matsen.fredhutch.org/pplacer/
GNU General Public License v3.0
75 stars 18 forks source link

Invalid number when parsing JSON ouput #331

Closed gjospin closed 10 years ago

gjospin commented 10 years ago

Hello, We are using pplacer inside of PhyloSift and an error recently started showing up.

Our JSON parser gives us the following error. malformed number (no digits after initial minus), at character offset 709 (before "an, -inf, 1.99999275...") at /share/eisen-z2/gjospin/Phylosift_test/bin/../lib/Phylosift/pplacer.pm line 438.

The offending field is the -nan in the following line. "p": [[0.0162720507812, 0, -nan, -inf, 1.99999275208],

Is -nan a normal output within pplacer and I should be looking at my JSON parser?

Let me know if you need input/output files and the command used.

Thank you.

matsen commented 10 years ago

This indicates that there is something unusual with your sequences. I'm not saying it's not a pplacer bug, but it could indicate that there is something wrong on your side. Could you send us an example?

On Wed, Jan 15, 2014 at 8:33 AM, Guillaume Jospin notifications@github.comwrote:

Hello, We are using pplacer inside of PhyloSift and an error recently started showing up.

Our JSON parser gives us the following error. malformed number (no digits after initial minus), at character offset 709 (before "an, -inf, 1.99999275...") at /share/eisen-z2/gjospin/Phylosift_test/bin/../lib/Phylosift/pplacer.pmline 438.

The offending field is the -nan in the following line. "p": [[0.0162720507812, 0, -nan, -inf, 1.99999275208],

Is -nan a normal output within pplacer and I should be looking at my JSON parser?

Let me know if you need input/output files and the command used.

Thank you.

— Reply to this email directly or view it on GitHubhttps://github.com/matsen/pplacer/issues/331 .

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center http://matsen.fhcrc.org/

gjospin commented 10 years ago

This is the sequence that was used.

26335.1.156.278.2.57.179 ---------------HEQLLVVVRQRPLE-LQPTISHCDCRLLVRWDLQVHRN----------

The command run was the following: pplacer -o V278.updated.11.jplace --verbosity 0 -j 8 -c V278.updated V278.updated.11.fasta"

The V278.updated pplacer package can be found here: http://edhar.genomecenter.ucdavis.edu/~gjospin/V278.updated.tgz

gjospin commented 10 years ago

That was supposed to be in fasta format. The leading > was interpreted as formatting ...

matsen commented 10 years ago

First, I might suggest running pplacer with the --verbose 2 option when you run into issues. This corner case was visible from the command line.

This is happening because your reference package was made with empirical frequencies. Let's look at the resulting stationary distribution that is spit out when we ask for verbose output:

stat distn: {0.0112359550562; 0.0814606741573; 0.0524344569288; 0.0393258426966; 0.063670411985; 0.0533707865169; 0.0589887640449; 0.0196629213483; 0.00842696629213; 0.0767790262172; 0.153558052434; 0.0777153558052; 0.0215355805243; 0.0187265917603; 0.0346441947566; 0.065543071161; 0.0580524344569; 0.; 0.0177902621723; 0.0870786516854}

See the 0. up there? That's indicating that we didn't see any W's in the reference alignment. Indeed that's the case for you. And then you do have a W in your query sequence.

So you are telling pplacer to build a model assuming that what it sees in the reference alignment is representative of what it will see in the query sequences, and then you throw it a new amino acid.

Clearly, there could be a better error message here, and I do think that model amino acid frequencies are more appropriate for such a small alignment. Do you think that would be good? @koadman and @cmccoy... thoughts?

Thanks for finding this!

On Wed, Jan 15, 2014 at 9:35 AM, Guillaume Jospin notifications@github.comwrote:

That was supposed to be in fasta format. The leading > was interpreted as formatting ...

— Reply to this email directly or view it on GitHubhttps://github.com/matsen/pplacer/issues/331#issuecomment-32388786 .

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center http://matsen.fhcrc.org/

cmccoy commented 10 years ago

For FastTree-based reference packages, I believe pplacer always uses empirical frequencies (at least, taxtastic sets empirical_frequencies to true). I think FastTree uses model frequencies for amino acids, though. Model frequencies or (less preferable I think) incorporating pseudocounts makes sense to me.

On Wed, Jan 15, 2014 at 5:46 PM, Erick Matsen notifications@github.comwrote:

First, I might suggest running pplacer with the --verbose 2 option when you run into issues. This corner case was visible from the command line.

This is happening because your reference package was made with empirical frequencies. Let's look at the resulting stationary distribution that is spit out when we ask for verbose output:

stat distn: {0.0112359550562; 0.0814606741573; 0.0524344569288; 0.0393258426966; 0.063670411985; 0.0533707865169; 0.0589887640449; 0.0196629213483; 0.00842696629213; 0.0767790262172; 0.153558052434; 0.0777153558052; 0.0215355805243; 0.0187265917603; 0.0346441947566; 0.065543071161; 0.0580524344569; 0.; 0.0177902621723; 0.0870786516854}

See the 0. up there? That's indicating that we didn't see any W's in the reference alignment. Indeed that's the case for you. And then you do have a W in your query sequence.

So you are telling pplacer to build a model assuming that what it sees in the reference alignment is representative of what it will see in the query sequences, and then you throw it a new amino acid.

Clearly, there could be a better error message here, and I do think that model amino acid frequencies are more appropriate for such a small alignment. Do you think that would be good? @koadman and @cmccoy... thoughts?

Thanks for finding this!

On Wed, Jan 15, 2014 at 9:35 AM, Guillaume Jospin notifications@github.comwrote:

That was supposed to be in fasta format. The leading > was interpreted as formatting ...

— Reply to this email directly or view it on GitHub< https://github.com/matsen/pplacer/issues/331#issuecomment-32388786> .

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center http://matsen.fhcrc.org/

Reply to this email directly or view it on GitHubhttps://github.com/matsen/pplacer/issues/331#issuecomment-32435359 .

matsen commented 10 years ago

Thanks for your thoughts. pplacer can definitely handle model frequencies with FastTree-- I just hand-modified Guillaume's reference package and it ran through with no problems.

So I'll just go have a poke at taxtastic.

gjospin commented 10 years ago

Repackaging our pplacer reference package with the newest taxit/taxtastic stuff and using the newest pplacer build seems to have resolved the problem. Thank you!

matsen commented 10 years ago

Great.

Are you using dev? There are a collection of changes and a bugfix that are important.

https://github.com/matsen/pplacer/tree/dev

We will merge to master in not too long, but still testing.

On Thu, Feb 6, 2014 at 9:04 AM, Guillaume Jospin notifications@github.comwrote:

Repackaging our pplacer reference package with the newest taxit/taxtastic stuff and using the newest pplacer build seems to have resolved the problem. Thank you!

Reply to this email directly or view it on GitHubhttps://github.com/matsen/pplacer/issues/331#issuecomment-34345263 .

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center http://matsen.fhcrc.org/

gjospin commented 10 years ago

Dev for pplacer and master for taxtastic. Should I redo using dev for taxtastic too?

Sent from my iPhone

On Feb 6, 2014, at 10:39 AM, Erick Matsen notifications@github.com wrote:

Great.

Are you using dev? There are a collection of changes and a bugfix that are important.

https://github.com/matsen/pplacer/tree/dev

We will merge to master in not too long, but still testing.

On Thu, Feb 6, 2014 at 9:04 AM, Guillaume Jospin notifications@github.comwrote:

Repackaging our pplacer reference package with the newest taxit/taxtastic stuff and using the newest pplacer build seems to have resolved the problem. Thank you!

Reply to this email directly or view it on GitHubhttps://github.com/matsen/pplacer/issues/331#issuecomment-34345263 .

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center http://matsen.fhcrc.org/ — Reply to this email directly or view it on GitHub.

cmccoy commented 10 years ago

taxtastic master should be fine, as long as it's current - 0.5.2 was merged to master and released about a week ago.

On Thu, Feb 6, 2014 at 10:48 AM, Guillaume Jospin notifications@github.comwrote:

Dev for pplacer and master for taxtastic. Should I redo using dev for taxtastic too?

Sent from my iPhone

On Feb 6, 2014, at 10:39 AM, Erick Matsen notifications@github.com wrote:

Great.

Are you using dev? There are a collection of changes and a bugfix that are important.

https://github.com/matsen/pplacer/tree/dev

We will merge to master in not too long, but still testing.

On Thu, Feb 6, 2014 at 9:04 AM, Guillaume Jospin notifications@github.comwrote:

Repackaging our pplacer reference package with the newest taxit/taxtastic stuff and using the newest pplacer build seems to have resolved the problem. Thank you!

Reply to this email directly or view it on GitHub< https://github.com/matsen/pplacer/issues/331#issuecomment-34345263> .

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center

http://matsen.fhcrc.org/

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHubhttps://github.com/matsen/pplacer/issues/331#issuecomment-34356207 .

matsen commented 10 years ago

Thanks, Connor.

Take a quick look at the docs concerning this latest change to pplacer before using it: https://github.com/matsen/pplacer/commit/bcb8eed93b2141b98c3dceb073afcfb30fa88536

It's a substantial change.

On Thu, Feb 6, 2014 at 10:52 AM, Connor McCoy notifications@github.comwrote:

taxtastic master should be fine, as long as it's current - 0.5.2 was merged to master and released about a week ago.

On Thu, Feb 6, 2014 at 10:48 AM, Guillaume Jospin

notifications@github.comwrote:

Dev for pplacer and master for taxtastic. Should I redo using dev for taxtastic too?

Sent from my iPhone

On Feb 6, 2014, at 10:39 AM, Erick Matsen notifications@github.com wrote:

Great.

Are you using dev? There are a collection of changes and a bugfix that are important.

https://github.com/matsen/pplacer/tree/dev

We will merge to master in not too long, but still testing.

On Thu, Feb 6, 2014 at 9:04 AM, Guillaume Jospin notifications@github.comwrote:

Repackaging our pplacer reference package with the newest taxit/taxtastic stuff and using the newest pplacer build seems to have resolved the problem. Thank you!

Reply to this email directly or view it on GitHub< https://github.com/matsen/pplacer/issues/331#issuecomment-34345263> .

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center

http://matsen.fhcrc.org/

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub< https://github.com/matsen/pplacer/issues/331#issuecomment-34356207>

.

Reply to this email directly or view it on GitHubhttps://github.com/matsen/pplacer/issues/331#issuecomment-34356623 .

Frederick "Erick" Matsen, Assistant Member Fred Hutchinson Cancer Research Center http://matsen.fhcrc.org/