jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

jupyter notebook #5534

Open · swarnna opened this issue 4 years ago

swarnna commented 4 years ago

Is it okay for Jupyter Notebook to take about 1 minute 20 seconds to read a 78 MB file? I have to work simultaneously on 25 such files in the same project. Is there any way to speed up the kernel? Because at this rate, any operation will take ages to compute.

kevin-bates commented 4 years ago

Reading about 1 MB/sec is not okay. However, it's tough to determine what might be going on here.

Could you please provide answers to these questions? Perhaps that will trigger someone's experience and they can provide some insight. Thanks.

swarnna commented 4 years ago

The 78 MB file is a dataset with 756,600 rows and about 11 columns. It's stored on my drive. The code I used was `data = pd.read_excel('file.xlsx', skiprows=22).dropna()`.

I am using Windows, and no, there were no other operations going on.
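For reference, a self-contained sketch of the read described above, with a simple timer added. The file name, `skiprows` value, and row/column counts are taken from the comment; `openpyxl` is assumed to be installed as the Excel engine:

```python
import time

import pandas as pd

# Time the reported read: a ~78 MB .xlsx with ~756,600 rows x 11 columns.
start = time.perf_counter()
data = pd.read_excel("file.xlsx", skiprows=22).dropna()
elapsed = time.perf_counter() - start

print(f"read_excel took {elapsed:.1f}s, shape={data.shape}")
```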

kevin-bates commented 4 years ago

Thank you. And running this same code directly (as opposed to within a notebook) is much faster?
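One way to make that comparison, assuming the same file and pandas version in both runs: time the cell in the notebook with IPython's `%time` magic and compare the wall time against the timed plain script above.

```python
import pandas as pd

# Run in a notebook cell; %time prints CPU and wall time for the statement.
%time data = pd.read_excel("file.xlsx", skiprows=22).dropna()
```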

swarnna commented 4 years ago

Really sorry for the late reply; I have had no internet connection for the last 7 days. Yes, it is. I figured out the issue: the easiest way was to use the vaex library. I was able to read a 100 GB file in 5 seconds, and another way I found was to change the file from Excel to comma-delimited (CSV).

kevin-bates commented 4 years ago

> I figured out the issue: the easiest way was to use the vaex library. I was able to read a 100 GB file in 5 seconds, and another way I found was to change the file from Excel to comma-delimited (CSV).

So it sounds like changing the code to read the CSV rather than the Excel file is a solution, and the slow operation is within the read_excel() method itself? If that's the case, or if you no longer believe this is directly tied to notebook, we should close this issue.
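Both fixes mentioned in the thread can be sketched briefly. The file names are placeholders, and `vaex.from_csv` with `convert=True` is shown on the assumption that its one-time HDF5 conversion, after which the file is memory-mapped, is what makes subsequent opens so fast:

```python
import pandas as pd
import vaex

# One-time conversion: Excel -> CSV. read_csv() is typically much faster
# than read_excel(), which must parse the zipped XML inside the .xlsx.
pd.read_excel("file.xlsx", skiprows=22).to_csv("file.csv", index=False)

# Option 1: plain pandas on the CSV.
data = pd.read_csv("file.csv").dropna()

# Option 2: vaex. convert=True writes an HDF5 copy next to the CSV on the
# first call; later opens memory-map that copy instead of re-parsing text.
df = vaex.from_csv("file.csv", convert=True)
```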