alaiacano closed 12 years ago
can you reproduce this outside of d3py?
I'm not getting serialize errors with
json.dumps({"a": long(2)})
so I'm guessing it must be something else... Can you print the dictionary that throws the error?
Yeah here it is. The problem seems to be that pandas/json aren't playing nice.
In [33]: df
Out[33]:
   cnt        time
0    0  1326825168
1    1  1325556432
2    2  1324741091
3    3  1321997712
4    4  1320448341
5    5  1317698032
6    6  1317693809
7    7  1317582762
8    8  1317581030
In [34]: type(df.values[0][0])
Out[34]: numpy.int64
In [35]: json.dumps({'cnt':np.int64(0), 'value' : np.int64(10)})
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (20, 0))
Pandas stores integers as 64-bit ints (numpy.int64), which json.dumps seems to have a problem with. From what I can find by googling, JavaScript doesn't do 64-bit integers anyway: all of its numbers are double-precision floats.
I think the best thing to do is throw a custom exception on this line notifying the user that int64s won't work, and also let Wes know about the issue.
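A rough sketch of what such a custom exception could look like, assuming Python 3. The names UnserializableDataError and dumps_checked are made up for illustration and are not part of d3py, and NotABuiltin is only a stand-in for a numpy.int64 value so the snippet runs without numpy installed.

```python
import json


class UnserializableDataError(TypeError):
    """Hypothetical exception: the data holds a value json.dumps cannot encode."""


def dumps_checked(data):
    """Serialize data, turning the bare TypeError into a more helpful message."""
    try:
        return json.dumps(data)
    except TypeError as err:
        raise UnserializableDataError(
            "could not serialize the data (%s); numpy.int64 values need "
            "converting to int or float first" % err
        ) from err


class NotABuiltin:
    """Stand-in for a value such as numpy.int64 that json.dumps rejects."""


# Trigger the error path and keep the message for inspection.
try:
    dumps_checked({"cnt": NotABuiltin()})
except UnserializableDataError as err:
    message = str(err)
```

Subclassing TypeError means existing callers that already catch TypeError would keep working.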
Do we have a custom exception?
Don't think we have a custom exception.
Someone is writing a dataframe -> JSON parser (https://github.com/wesm/pandas/issues/631) so we shouldn't sink too much effort into this. That said, can't we just cast the array contents to a list of floats on the way out? I'll have a crack at this in a minute.
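Casting to floats on the way out could look roughly like this. It is a sketch, not the code that went into d3py: rows_as_floats is a hypothetical helper, and plain tuples stand in for the DataFrame rows so it runs without pandas. float() also accepts numpy.int64 values, so the same conversion should apply to df.values.

```python
import json


def rows_as_floats(columns, rows):
    """Build one dict per row, converting every value to a built-in float."""
    return [
        {name: float(value) for name, value in zip(columns, row)}
        for row in rows
    ]


# Two rows taken from the DataFrame shown earlier in the thread.
columns = ["cnt", "time"]
rows = [(0, 1326825168), (1, 1325556432)]

payload = json.dumps(rows_as_floats(columns, rows))
```

A double holds every integer up to 2**53 exactly, so Unix timestamps like the ones above survive the cast unchanged. Larger 64-bit values would lose precision.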
I have 50 minutes before climbing, and just wasted an hour trying to get pep8 working in textmate.
M
so I think this is totally solved by ujson, which is faster, too.
Will commit the change!
M
So ujson seems not to be working now, and is chucking weird errors.
I'm just floating everything from now on. I hope that the pandas guys make a good json converter...
M
Exciting!
M
Begin forwarded message:
From: Brian Granger ellisonbg@gmail.com
Date: 15 February 2012 20:38:47 EST
To: Wes McKinney wesmckinn@gmail.com
Cc: Fernando Perez Fernando.Perez@berkeley.edu, mikedewar@gmail.com
Subject: Re: Playing with Pandas and the notebook
Reply-To: ellisonbg@gmail.com
OK here is a sketch of how d3 can work with the notebook:
https://gist.github.com/1840631
Mike, your library looks very nice, and it should be very simple to integrate it with IPython's notebook. This would allow everything to "just work" without running a separate server. Some points about this integration:
- We have a Javascript object in IPython that has some of this logic already. But it doesn't support loading js libs or css files. We should add this logic to the base Javascript class so it can be used by subclasses.
- I don't think the d3 integration with ipython should write anything to disk. We would like display logic code like this to be free of disk side effects if at all possible. The graph example I am using shows how to load json files in the cwd, but in real code the json string should just be put directly into the js code.
- All styling should be done using javascript so we don't have to load css from disk.
- I highly recommend using jinja2 for the templating. We will be moving that direction ourselves.
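As a rough illustration of the second point, the JSON string can be rendered straight into the JavaScript source instead of being written to a file. This sketch uses the standard library's string.Template as a stand-in for jinja2 so it runs without extra dependencies. render_inline_js is a hypothetical helper and draw is a hypothetical JavaScript function, neither taken from d3py or IPython.

```python
import json
from string import Template

# Template for the generated JavaScript; $data is replaced by a JSON literal.
JS_TEMPLATE = Template("var data = $data;\ndraw(data);")


def render_inline_js(records):
    """Return JavaScript source with the records embedded as a JSON literal."""
    return JS_TEMPLATE.substitute(data=json.dumps(records))


js_code = render_inline_js([{"cnt": 0.0, "time": 1326825168.0}])
```

Nothing touches the disk here: the resulting string could be handed directly to whatever displays JavaScript in the notebook.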
Cheers,
Brian
On Wed, Feb 15, 2012 at 12:01 PM, Wes McKinney wesmckinn@gmail.com wrote:
Regarding d3 integration, Mike Dewar (cc'd) would be very interested, as he and some of the bit.ly folks have been working on a Grammar of Graphics / ggplot2-inspired d3 interface for Python using pandas for its data handling: https://github.com/mikedewar/D3py. I haven't had a chance to play with it yet.
On Sun, Feb 12, 2012 at 9:48 PM, Brian Granger ellisonbg@gmail.com wrote:
Fernando,
On Fri, Feb 10, 2012 at 1:22 PM, Fernando Perez Fernando.Perez@berkeley.edu wrote:
Hey,
On Thu, Feb 9, 2012 at 21:18, Brian Granger ellisonbg@gmail.com wrote:
I guess I am not too worried about this...famous last words. I just go back to the point that the most dangerous thing about IPython is that we allow the execution of arbitrary python code :-) I truly don't think that allowing the execution of arbitrary JS code is any different. The fact that there could be external malicious code doesn't really matter. I could just as easily give you a python script that downloads malicious python code and runs it. Arbitrary code is arbitrary code and the bottom line is that no one should run code they don't completely trust. These things are especially true when folks are running their own notebook servers. There are other issues that come up if a notebook server were run in the cloud. Then you have to start worrying that bad guys would use notebooks to launch hostile attacks on other systems.
On the one hand, I totally agree with you, and that's been our message for a long time: "IPython runs arbitrary code, deal with it". What I think is new here, is how there may be execution of arbitrary code when the user doesn't even realize he's taken an action that executes code.
See below, for more on this, but there is only one way to execute code in the notebook: shift enter.
It's one thing to say: "if you do %run evil.py, good luck to you". It's another to have a user open an .ipynb that a colleague emailed them, which pulls some JS library from a CDN that last night happened to get hacked by a particularly nasty idiot who decided it would be fun to put in:
IPython.notebook.kernel.execute('os.system("rm -rf $HOME")')
If code like this gets loaded and run on file open, perfectly reasonable actions like opening a notebook from a trusted colleague could have devastating consequences.
About the hacked CDN network: we already trust remotely hosted code every single day: github. There is a huge chain of trust: other devs who have r/w permissions on the repos we use, the github infrastructure, the dns system.
The current implementation does not execute any code, python or js, upon file open. I took great care to make sure that this was the case. Give it a shot by creating a cell with the following:
from IPython.core.display import Javascript
Javascript("alert('hi')")
Save the notebook, close it, then reopen it. You won't see the alert box appear when you reopen the notebook. Instead you will see the plain text repr of the Javascript object.
I agree that it would be a bad thing to run any code on loading the notebook.
This is the scenario that I'm worrying about, which I think is new and different enough from what IPython had up until now to merit consideration. We obviously do want to move forward with a very dynamic and rich JS model, we'll just need to figure out how not to make IPython the best attack vector for local compromises on the entire internet :)
In my mind the only new thing about the notebook and the new Javascript support is that shift-enter can lead to the execution of arbitrary python AND javascript code. The new part is Javascript.
The other thing to keep in mind is the indirection issue. If an attacker wants to attack someone using the notebook, the simplest things for them to do would be to put some nasty logic in a python cell:
import funlibrary
funlibrary.compute()  # bad things happen here - this could load python code from the net and run it.
To attack using the new Javascript handling requires multiple layers of indirection:
- Python code/object that returns a javascript representation.
- Javascript code that does something bad by...
- Using Javascript to make calls back to the server/kernel.
If you really want to do bad things, why not just write the bad python code to begin with? The extra layers of indirection are completely un-needed. The only thing that a javascript attack involves is obfuscation through indirection. Maybe that is significant though.
Cheers,
Brian
Cheers,
f
Brian E. Granger Cal Poly State University, San Luis Obispo bgranger@calpoly.edu and ellisonbg@gmail.com
I tried plotting something with x-axis data of type long and it gave me an error on line 164 of d3py.py.
The line that threw the error is: https://github.com/mikedewar/D3py/blob/master/d3py/d3py.py#L164