Resetting Max Buffer Size for Jupyter Notebook

JakeLehle commented 3 years ago

So I've been playing around with the pipeline running it from the docker container so the packages are all the right versions and using the jupyter notebook. I get the 75% mark on the pipeline just after you start using the monocle and the jupyter notebook kernel kills itself. At first, I thought this was an issue with my computer so I switched over to using one with 32GB of RAM and it crashed at the same exact place regardless of if I'm using a computer with 16 or 32 GB of RAM.

Do you need to modify and increase the max buffer size of ~/.jupyter/jupyter_notebook_config.py file to have it run all the way through?

Can anyone who has run all the way through the pipeline in the jupyter lab notebook tell me what this line of their jupyter_notebook_config.py file says?

NotebookApp.max_buffer_size = your desired value

Thank you so much!

LuckyMD commented 3 years ago

Hi @JakeLehle! So I've run the pipeline mostly on a server with 377GB of RAM, but a long time ago also locally in a conda environment. I don't have a jupyter_notebook_config.py file on the server or locally (16GB RAM). I only have emacs keybindings stored in in the .jupyter/ folder.

Can you run the notebook without monocle? This is quite a heavy package both to run and to be loaded.

Are you running this on the example data or on your own?

JakeLehle commented 3 years ago

Hi @LuckyMD

Yeah, my university has a cluster but they won't let us have very much environmental control and no sudo privileges so I gave up trying to use their system and set up my own so I could have full root privileges and use the docker container. (Which thank you for suggesting I look into it in an earlier comment. I can't believe I wasn't using docker more in the past to avoid snowflake system issues.)

So to modify and expand the buffer in jupyter lab I used $ jupyter notebook --generate-config to make the config file I mentioned in my previous comment which has all the default values commented out and then modified it with vi to expand that c.NotebookApp.max_buffer_size = your desired value.

But sounds like this is the wrong way to go about this then. I'm using the sample data right now but I was hoping to get the whole pipeline working and then pop in some data from a single cell paper I was a part of in the past and compare the pipeline with our previous results.

I'll try to finish out the pipeline and skip the monocle section. If that all works then I'll just break up the pipeline and save just the objects I will need to run that monocle section and run it separately as a workaround.

Thanks for getting back to me so quickly! You can close this issue and I'll add an additional comment if skipping monocle works if anyone in the future is trying to run this on their own system and runs into the same issue.

LuckyMD commented 3 years ago

I have been using the environment.yaml file with some manual R installation to get the environment set up on our server (we don't get sudo privileges either). You may be able to run via singularity or charliecloud as well though, which doesn't require sudo as far as I'm aware.

On the monocle note, this was recommended for complex branching trajectories, which the sample data is not. So it's probably okay to skip. I typically use PAGA, DPT or slingshot which all perform fairly well for me anyway.

Good luck!

JakeLehle commented 3 years ago

@LuckyMD

I'm tempted to go back and work on our universities server but setting up my own personal computer to run this kind of stuff has been really fun and has taught me some very cool system administration tricks so I think for the foreseeable future I'm gonna try and run it on my own if I can manage.

Pulling the monocle section out of the code worked for me and I was able to run the remainder of the pipeline! Just a note for anyone reading this in the future I was able to run this on my 32GB system and near the end, it was using 92% of my available RAM so if you want to be stubborn and run this outside of a server set up you should consider running it on a setup with either 32 or 64 GB.

I'll save just the variables needed to run monocle and break the code up as two pipelines to show other people in my lab how to do complex branching if they ever need to do it in the future. Oh cool, I'm not familiar with PAGA or the DPT but I just pulled up a couple of papers on them and I'll start reading up now and see if it will come in handy for me as well. Thanks for the tip and for putting this pipeline together. It's been a really big help to me and a great learning tool.

LuckyMD commented 3 years ago

Glad things worked out for you in the end!

theislab / single-cell-tutorial

Resetting Max Buffer Size for Jupyter Notebook #72