Unable to read dataframe on colab #212

Closed jainayush007 closed 4 years ago

jainayush007 commented 4 years ago

Hi. I am unable to read a pandas DataFrame into D-Tale. Below is the error:

JSONDecodeError                           Traceback (most recent call last)
in ()
      7 dtale_app.USE_NGROK = True
      8
----> 9

4 frames
/usr/lib/python3.6/json/ in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Screenshot of error-


df was loaded from a file that can be fetched from the link below - ''

aschonfeld commented 4 years ago

So I'm not sure how you're loading your data but I was able to load it fine using the following:

import dtale
import dtale.app as dtale_app

dtale_app.USE_NGROK = True

url = ''

I'm also working on adding the ability to specify the parameter index_col so you won't get your first column as "Unnamed: 0".

So then your call would be: dtale.show_csv(path=url, index_col=0)
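
To illustrate what index_col changes, here is a minimal pandas-only sketch. It assumes the file was written with its index as the first, unnamed column, and that show_csv forwards extra keyword arguments to pd.read_csv (as the show_csv code quoted in this thread does). The sample CSV text is made up for illustration.

```python
import io

import pandas as pd

# Made-up sample data: the leading empty header field is what
# DataFrame.to_csv() produces when it writes the index.
csv_text = ",a,b\n0,1,2\n1,3,4\n"

# Without index_col the unnamed first column becomes a data column.
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column is used as the index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))    # includes "Unnamed: 0"
print(list(indexed.columns))  # only the real data columns
```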

FYI, under the hood show_csv is running this code:

import pandas as pd
import requests
from six import PY3, BytesIO, StringIO

def show_csv(**kwargs):
    path = kwargs.pop("path")
    if path.startswith("http://") or path.startswith(
        "https://"
    ):  # add support for URLs
        proxy = kwargs.pop("proxy", None)
        req_kwargs = {}
        if proxy is not None:
            req_kwargs["proxies"] = dict(http=proxy, https=proxy)
        resp = requests.get(path, **req_kwargs)
        assert resp.status_code == 200
        path = BytesIO(resp.content) if PY3 else StringIO(resp.content.decode("utf-8"))
    return pd.read_csv(path, **kwargs)
jainayush007 commented 4 years ago

Thanks for the response. I had loaded it into a Koalas dataframe and just performed to_pandas.


After this I ran pdf = kdf.toPandas(); the remaining script is already part of my issue description. So:

  1. The .toPandas() conversion from Koalas to pandas works, but the result isn't usable with D-Tale.
  2. If I had read the data directly with pandas (I prefer Koalas, which will be useful for large datasets), would I face the same issue?
jainayush007 commented 4 years ago

Also, I was unable to replicate your successful scenario of being able to load the data. Anything missing? -


aschonfeld commented 4 years ago

Interesting that show_csv didn't work for you. It worked fine for me locally, with and without the index_col parameter. Can you try using the show_csv function I included in my previous comment, just to see if that loads the data?

jainayush007 commented 4 years ago

Interestingly, that worked!


aschonfeld commented 4 years ago

Hmm, well you could always do Using dtale.show_csv worked for me with v1.9.0 🤷‍♂️

jainayush007 commented 4 years ago

I am on v1.9.0 too and it still didn't work for me. I am on Colab -


jainayush007 commented 4 years ago

I believe this issue should be re-opened.

aschonfeld commented 4 years ago

This worked fine for me in Google Colab.

Here are the versions of all my installed packages (just run !pip freeze to see what versions you have):

jainayush007 commented 4 years ago

So, these are the differences found:

plotly==4.4.1 plotly==4.8.2
pyarrow==0.14.1 pyarrow==0.15.1

I need to upgrade plotly so that I can use plotly as the pandas plotting backend, and to upgrade pyarrow for Koalas.

!pip install -U plotly
!pip install pyarrow==0.15.1
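
A comparison like this can also be scripted rather than done by eye. The following is a rough sketch, assuming each environment's `!pip freeze` output has been saved as text; `parse_freeze` and `diff_freeze` are hypothetical helper names, and the two sample strings are illustrative placeholders rather than anyone's full package list.

```python
def parse_freeze(text):
    """Map package name -> pinned version for 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip comments and entries that are not simple pins (e.g. VCS installs).
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            pins[name.lower()] = version
    return pins


def diff_freeze(text_a, text_b):
    """Return {name: (version_a, version_b)} for packages that differ."""
    a, b = parse_freeze(text_a), parse_freeze(text_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Illustrative inputs only; paste the real `!pip freeze` output of each environment.
env_a = "dtale==1.9.0\nplotly==4.4.1\npyarrow==0.14.1\n"
env_b = "dtale==1.9.0\nplotly==4.8.2\npyarrow==0.15.1\n"

for name, (va, vb) in diff_freeze(env_a, env_b).items():
    print(f"{name}: {va} -> {vb}")
```

A package that is missing from one side shows up with None as its version, so installs present in only one notebook are reported as well.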

aschonfeld commented 4 years ago

But did downgrading those fix it? The issue seems to be with some sort of dependency. I don't have anything pinned in D-Tale, so I'm kind of at the mercy of people's environments.

Honestly, it seems like some sort of character encoding problem, which seems odd since you're using Google Colab and I should see the same issue too. Which version of Python are you using? I'm using 3.6, I believe.

aschonfeld commented 4 years ago

Is there any chance there's more to the stack trace? I see it says “4 frames” on your screenshot. It might be a little easier to debug if I can see where the issue originates.

You can also try storing the instance in a variable d and then reading d._main_url

Then see if you can access that link. It might be a Jupyter problem.
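
For reference, the message in the traceback is what Python's json module reports whenever the text it is given does not start with a JSON value, for example an empty body or an HTML page. The sketch below reproduces it with the standard library only. It does not show where the bad text comes from in this case, which is what the full stack trace would reveal. The sample bodies and the `decode_error` helper are made up for illustration.

```python
import json


def decode_error(body):
    """Return the JSONDecodeError message for body, or None if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return str(err)
    return None


# An empty string and an HTML page both fail at the very first character.
for body in ["", "<html>Bad Gateway</html>", '{"ok": true}']:
    print(repr(body), "->", decode_error(body))
```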

jainayush007 commented 4 years ago

This is interesting, I just instantiated my colab notebook and it ran the script again. I got a new error this time -


I could successfully load the file through koalas and convert to a pyspark and pandas df. I was later able to load pandas and koalas dataframe to dtale as well -


No difference in any libraries between my last and current results of !pip freeze

aschonfeld commented 4 years ago

The first error, I'm assuming, is because you hadn't loaded the show_csv function into memory yet. Can you try executing the cell with the show_csv definition and then running the code again?

TanushGoel commented 4 years ago

You can always download it in one line via a "wget" shell command. Then use pandas to parse and head the file.

jainayush007 commented 4 years ago

@TanushGoel - Thanks for the tip! Are there any intrinsic benefits (memory? storage?) to loading in pandas? I believe it loads the file onto the Google Colab VM, which shouldn't count against my 15 GB storage limit?

@aschonfeld - You were right! I missed loading the function. Is there a way to avoid loading the function?


I am wondering what could have happened earlier that it didn't load. I checked !pip freeze and all libraries are exactly the same as yesterday.

aschonfeld commented 4 years ago

So the show_csv function I gave you was just so I could show you what code was being executed under the hood of D-Tale. If you use d = dtale.show_csv(path=url) it should do the same thing.

So the only thing I can think of is that something with google colab doesn't like when it tries to return the D-Tale instance directly which was why I told you to store it in a variable d and then pull the url for viewing using _main_url.

That being said, I was able to view it fine in my google colab notebook without storing my instance in a variable. So the only thing I can think of that is causing the issue is that you have spark installed in your notebook and I don't. I know spark does some special stuff to environments using java so there has to be something to that...

I don't think there are any intrinsic benefits to loading in pandas other than the fact that D-Tale is built for pandas data structures (think of my earlier post where I showed the exception from trying to pass a koalas dataframe directly). I don't have a ton of knowledge about the memory management of spark so maybe you'd get some benefit there 🤔
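
Since D-Tale expects pandas structures, one way to guard against passing something else is a small conversion step before calling it. The sketch below is only an illustration: `ensure_pandas` is a hypothetical helper (not part of D-Tale), and `FakeDistributedFrame` is a stand-in so the example runs without Spark or Koalas installed. It assumes the incoming object exposes a `to_pandas()` method, as Koalas dataframes do.

```python
import pandas as pd


def ensure_pandas(obj):
    """Return a pandas DataFrame, converting via to_pandas() when needed.

    Hypothetical helper, not part of D-Tale.
    """
    if isinstance(obj, pd.DataFrame):
        return obj
    to_pandas = getattr(obj, "to_pandas", None)
    if callable(to_pandas):
        return to_pandas()
    raise TypeError(f"cannot convert {type(obj).__name__} to a pandas DataFrame")


class FakeDistributedFrame:
    """Stand-in for a Koalas-style frame so the sketch runs without Spark."""

    def __init__(self, data):
        self._data = data

    def to_pandas(self):
        return pd.DataFrame(self._data)


pdf = ensure_pandas(FakeDistributedFrame({"a": [1, 2]}))
print(type(pdf).__name__, pdf.shape)
```

Keep in mind that a real to_pandas() call collects the whole dataset into the memory of a single machine, so it only makes sense when the data fits there.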

jainayush007 commented 4 years ago

Thanks for your help! Closing the issue.

jainayush007 commented 4 years ago

It seems like d._main_url isn't functional anymore. I probably won't need it since I can see the link is generated, but I wanted to bring it to your attention.

jainayush007 commented 4 years ago

And the D-Tale webpage doesn't open with the given link!

aschonfeld commented 4 years ago

Hmm, still seems to work for me

jainayush007 commented 4 years ago

I think if you run it twice, it changes to an incorrect URL.

Also, _main_url doesn't seem to be working.

