rlabbe / Kalman-and-Bayesian-Filters-in-Python

Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters, extended Kalman filters, unscented Kalman filters, particle filters, and more. All exercises include solutions.

Hackernews followup. #17

Closed Carreau closed 9 years ago

Carreau commented 9 years ago

I wrote to you on hackernews (but I know I'll forget to check for an answer, so I'm copying and pasting it here)

Sorry about the notebook format change. Did you lose work? Conversion should be handled for you; if not, it is a bug, and we can fix it in 3.1. We would be happy to get more feedback on your writing process and your needs; feel free to contact the team directly (IPython-dev at scipy.org, or an issue on the main IPython repo is fine). As for the concept of a "book" or collection of notebooks, we are working on that (integration with sphinx).

If you lose things in migration, that's not normal and we can fix it; please open an issue. We would also love to get more detailed feedback on your pain points in writing. We are having our bi-annual IPython dev meeting this week, so this is when we will sketch the future of IPython for the next 6 to 12 months.

I see you are in the SF Bay Area; we are meeting at UC Berkeley, so I guess you could even come and say hi if you prefer to give feedback/complaints face to face :-)

rlabbe commented 9 years ago

Hi, Thank you for your concern and for checking. No, I did not lose any work. I lost time :) Which is not a big deal, I understand the importance of the work being done on Jupyter and the need for the format change.

We've actually chatted a little bit on the github issue lists these past few weeks. I'm starting to be a bit vocal about Jupyter, not to be a complainer, but because some people are using it for longer form work and there are improvements to be made there, and how would the dev team know if someone in the thick of it didn't speak up?

I work in Fremont, so Berkeley isn't so far away. I'd be happy to try to come by for the meetings if you wanted to chat; I probably couldn't afford a lot of time unless they were after work hours, but this also wouldn't require a ton of time.

But perhaps the big thing to keep in mind is that Jupyter has no concept of a 'book'. Meaning, you can't put things in chapters, you have to hack it to get things like prefaces (I don't want section numbers for a preface, I do for chapters, for example) and appendixes (they should be called 'Appendix A', not 'Chapter 17'). If you browse the directory in nbviewer or in your local browser there is no indication that this is a single work or that there is a preferred order (the book writers all do things like 01_name or Chapter01_name, but then what about prefaces and such, how does the reader know the access order). It would be great to have a bit of metadata describing the book that shows up in the directory listing, so it works more like a toc. I wrote a TOC with all that content as a notebook, but the user has to know to click and open that notebook, and it is alphabetically at the bottom, so I don't think it happens a lot. I can look at my github logs to get an idea of usage patterns.
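A minimal sketch of what such book metadata might look like, and how a directory listing could turn it into TOC labels. Everything here is hypothetical: the `BOOK` structure, the `kind` values, and the file names are made up for illustration, and nothing like this exists in Jupyter or nbviewer.

```python
from string import ascii_uppercase

# Hypothetical book description: an ordered list of (kind, filename, title).
# This is NOT a Jupyter feature; the names are placeholders for illustration.
BOOK = [
    ("preface", "preface.ipynb", "Preface"),
    ("chapter", "first_topic.ipynb", "First Topic"),
    ("chapter", "second_topic.ipynb", "Second Topic"),
    ("appendix", "installation.ipynb", "Installation"),
]


def toc_labels(book):
    """Return display labels in reading order.

    Prefaces get no number, chapters are numbered 1, 2, ...,
    and appendixes are lettered A, B, ...
    """
    labels = []
    chapter = 0
    appendix = 0
    for kind, _filename, title in book:
        if kind == "chapter":
            chapter += 1
            labels.append(f"Chapter {chapter}: {title}")
        elif kind == "appendix":
            labels.append(f"Appendix {ascii_uppercase[appendix]}: {title}")
            appendix += 1
        else:
            # Prefaces and similar front matter stay unnumbered.
            labels.append(title)
    return labels


for label in toc_labels(BOOK):
    print(label)
```

The point of keeping the order in the metadata rather than in the file names is that the listing no longer depends on alphabetical sorting or on prefixes like 01_name.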

And then there really isn't any good template to generate books. PDF is, in one way, the best output. No one seems to agree with that because you lose all the dynamic behavior, but here are two important things. First, when I open a PDF I get a tree on the left that allows me to navigate the book quickly. I can CTRL-F and search for a term. Just last night I was trying to remember something about an IPython widget that I did in a different notebook(chapter) in the book. I didn't recall which chapter, and I had no way to search other than to open the PDF. Found it and went on working. It would have been painful if everything was in separate files.

If you export to HTML you can either export to separate HTML documents with the same problem as the notebooks - not globally searchable, no page numbers, and section numbers (if any) not related to the chapter number (i.e. first heading in chapter 3 would be 1.0, not 3.0). And, no TOC. If you merge into one big document then I can use an extension to make a TOC, but overall the flow is just terrible; there is no visual cue that you are moving from one chapter to the next, and there is no tree or browsing structure to go between chapters. I could write some more HTML to tie all the chapters together, but I suggest that if Jupyter did want to support books, having a built-in, uniform way to do this would be better than having each author invent what would be a substandard solution.

I wrote a script that merges all the notebooks into one large one before conversion to HTML or PDF. For the PDF I also have a .tplx file to specify output format (actually, I think you wrote it for me in one of my issues, perhaps on reddit). Here I get pretty close to the format I would like, but not quite. First, to get a preface my python script has to go in, search for the first \chapter in the tex file, and convert it to \chapter* to avoid numbering. But then I had to add a raw NBConvert cell with \addcontentsline{toc}{chapter}{Preface} at the top of the preface so it gets added to the TOC. Okay, I guess I could make my script do that as well; the broader point is that you have to put raw latex in the notebooks in some places to get the pdf output that you want, latex that makes no sense to the online reader.
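The preface fix described above could be sketched roughly as follows. This is not the actual script from the book's repository; the function name `unnumber_first_chapter` is made up, and it assumes the \chapter command starts its own line in the generated tex file.

```python
def unnumber_first_chapter(tex, toc_title="Preface"):
    r"""Turn the first \chapter into \chapter* and add a TOC entry after it.

    Assumption: the \chapter{...} command begins a line of the tex source.
    If no chapter is found, the source is returned unchanged.
    """
    lines = tex.split("\n")
    for i, line in enumerate(lines):
        if line.lstrip().startswith("\\chapter{"):
            # The starred form suppresses the chapter number.
            lines[i] = line.replace("\\chapter{", "\\chapter*{", 1)
            # Starred chapters are left out of the TOC, so add the entry by hand.
            lines.insert(
                i + 1, "\\addcontentsline{toc}{chapter}{" + toc_title + "}"
            )
            break
    return "\n".join(lines)


example = "\\begin{document}\n\\chapter{Preface}\nSome text.\n\\chapter{One}\n"
print(unnumber_first_chapter(example))
```

Doing both steps in the conversion script keeps the raw latex out of the notebook itself, so the online reader never sees it.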

To get the styling you want (the default style is not so great for books, IMO), you have to add an input cell. I do

#format the book
import book_format
book_format.load_style()

which loads some css and does things like set the rcParams for matplotlib so everything renders as I'd want. Not exactly a big deal, but it is more stuff in the notebook that really doesn't pertain to the subject matter. My merge script uses the "#format the book" string to identify these cells and removes them from the notebooks so the pdf of the book does not include them.
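A minimal sketch of that cell-stripping step, not the actual merge script. It assumes the notebook has already been loaded as a dict in the version 4 layout (a top-level "cells" list, with each cell's text under "source"); older formats nest cells differently. The function name `strip_format_cells` is made up.

```python
FORMAT_MARKER = "#format the book"


def strip_format_cells(notebook):
    """Return a shallow copy of a notebook dict without styling-only cells.

    A code cell is dropped when its source starts with FORMAT_MARKER.
    The input dict is left unmodified.
    """
    def source_text(cell):
        # Cell source may be stored as one string or as a list of lines.
        src = cell.get("source", "")
        return "".join(src) if isinstance(src, list) else src

    kept = [
        cell
        for cell in notebook.get("cells", [])
        if not (
            cell.get("cell_type") == "code"
            and source_text(cell).lstrip().startswith(FORMAT_MARKER)
        )
    ]
    stripped = dict(notebook)
    stripped["cells"] = kept
    return stripped


example = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "code",
         "source": ["#format the book\n", "import book_format\n",
                    "book_format.load_style()"]},
        {"cell_type": "markdown", "source": "# A chapter"},
        {"cell_type": "code", "source": "x = 1"},
    ],
}
print(len(strip_format_cells(example)["cells"]))
```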

When viewed online, with nbviewer, there is no concept of a book. I want people to use my TOC (which is another notebook) because it contains not just the chapter names, but a description of the topics covered. (You don't necessarily have to read the book linearly - you can search for the topic of current interest.) The default navigation just takes you back to the directory listing, so I added a link to my TOC on nbviewer at the top of every chapter. But this link makes no sense if you have cloned the book to your local drive (the best way to get the interactive features). It'll take you from the local drive off to nbviewer. And it is another thing that needs to be stripped out before conversion to PDF.

Probably I am doing something wrong, but I haven't figured out how to get equation numbering working, especially across chapters. I'd like to refer to equation 3.23 (23rd equation in chapter 3).

References are a pain. Again, maybe not doing something right. But I have to manually insert [1]s in the text, and then as I write of course the order changes, so the numbers are changing. I've seen extensions for this in calico for example; haven't played with them yet, but it didn't look like it would work (I cannot elaborate on that; I don't recall why I thought that when I looked at them, and I may be wrong).
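One way the renumbering could be automated is to write citations as keys and assign numbers by order of first appearance at conversion time. The sketch below assumes a made-up `[@key]` marker syntax and placeholder keys; it is not the calico extension or any existing Jupyter feature.

```python
import re

# Hypothetical marker syntax: [@some_key]. Chosen only for illustration.
CITE = re.compile(r"\[@([A-Za-z0-9_:-]+)\]")


def number_citations(text):
    """Replace [@key] markers with [n], numbered by first appearance.

    Returns the rewritten text and the keys in citation order, which can
    be used to print the reference list in the matching order.
    """
    order = {}

    def replace(match):
        key = match.group(1)
        if key not in order:
            order[key] = len(order) + 1
        return f"[{order[key]}]"

    numbered = CITE.sub(replace, text)
    return numbered, list(order)


text, keys = number_citations("See [@key_a] and [@key_b]; again [@key_a].")
print(text)
print(keys)
```

Because the numbers are assigned on every conversion, moving a paragraph around changes the output numbering without any manual edits.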

So far as I know there is no way to get footnotes or endnotes. I'll admit I haven't looked closely.

I don't see a way around this at all, but layout in the pdf could be improved vis-a-vis plots. All plots immediately follow their code cells, and so you end up with weird page breaks with half a page of blank space. latex reflows the figures to minimize white space and orphans.

PDF output doesn't seem to recognize the > markdown to get a side bar.

PDF output doesn't seem to recognize indented text as quoted code.

I haven't looked into this, but the IPython widgets are a bit 'laggy' for me. I'm using them to alter parameters that influence how a filter works, so every widget movement means another filter run, which is slow. So requests stack up, and if I drag a scroll bar a fair distance quickly the filter will rerun several times, when really I just want it to rerun once, when the widget gets to the final location. Again, maybe something for this is in the widget's api. Recognize I am using your technology to write my book; I am not an IPython user in depth otherwise, so I am less likely to know all the ins and outs.
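The behavior wanted here, rerunning once with the final value instead of once per intermediate value, can be sketched in plain Python. This is not the widget API; `LatestOnlyRunner` is a made-up class that only illustrates the idea of discarding stale requests.

```python
class LatestOnlyRunner:
    """Queue requested values, but run the function only on the newest one."""

    def __init__(self, func):
        self.func = func
        self.pending = []
        self.runs = 0  # how many times the expensive function actually ran

    def request(self, value):
        # Called for every slider movement; cheap, does no real work.
        self.pending.append(value)

    def flush(self):
        # Called when the backend is free: drop stale values, run once.
        if not self.pending:
            return None
        value = self.pending[-1]
        self.pending.clear()
        self.runs += 1
        return self.func(value)


runner = LatestOnlyRunner(lambda x: x * 2)  # stand-in for a slow filter run
for slider_value in [1, 2, 3, 10]:
    runner.request(slider_value)
result = runner.flush()
print(result, runner.runs)
```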

Most of these are minor annoyances, and people are successfully writing books with Jupyter. But there is a lot of ad-hoc work going on to work around limitations; or people don't try (so far as I know the Bayesian Methods for Hackers book doesn't try to create a 'book' format, for example). We write custom css, and then figure out how to get it into the book. I need to write Python tools that read the native notebook format to remove stuff or add stuff to get a specific output format looking its best.

My larger concern is longevity. Jupyter has funding and full time developers, and that is great. But this conversion to IPython 3.0 underscores the issue. Do I want to be supporting this book for the rest of my life? No! I want to get it done, and move on to the next project (I'm tentatively targeting linear algebra). I'm answering emails about exceptions being thrown (because they are running 2.2, and the exception thrown surely does not give you any indication that the notebook is a newer version; 2.4 gives you a nice downgrade message). What happens when Python has a breaking change in version 3.7, or when Jupyter becomes its own project @ IPython 4.0? What happens in 10 years; will the book even run? Should I put all this work into a moving target? I am not sure. Should I write my own code to make an HTML style book, or is it in your pipeline of things to do? The static workflow of code+latex+png files, while cumbersome and not interactive, seems a lot longer lived, and far more controllable as far as the output layout goes.
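The kind of friendly version check readers were missing can be sketched as below. Notebook files are JSON with a top-level "nbformat" number, and the format written by IPython 3.0 is version 4; the function name and the message wording are made up, and this is not how IPython itself reports the problem.

```python
import json


def check_notebook_format(notebook_json, supported_major):
    """Return "ok", or a readable message when the notebook is too new.

    `supported_major` is the newest notebook format version that the
    reader's installation understands (an input the caller must supply).
    """
    nb = json.loads(notebook_json)
    major = nb.get("nbformat")
    if major is None:
        return "Not a recognizable notebook: no 'nbformat' field."
    if major > supported_major:
        return (
            f"This notebook uses format v{major}, but this reader only "
            f"supports up to v{supported_major}; upgrade the reader or "
            "convert the notebook to an older format."
        )
    return "ok"


print(check_notebook_format('{"nbformat": 4, "nbformat_minor": 0, "cells": []}', 3))
```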

Anyway, those are my major thoughts. Obviously I love Jupyter, and I want to keep using it for my projects large and small, and probably will. The interactivity is too important, especially for trying to learn things like math and algorithms. And the ability to accept pull requests, or make my own fix, and have the changes live to the world via github and nbviewer is fantastic. But there are costs and difficulties, which I've tried to document above. I hope some of it helps.

Carreau commented 9 years ago

On 10 March 2015 at 09:30, Roger Labbe notifications@github.com wrote:

Hi, Thank you for your concern and for checking, but no, I did not lose any work. I lost time :)

OK, sorry about that; it can still be hard.

Which is not a big deal, I understand the importance of the work being done on Jupyter,

Thanks, you also did an awesome job.

We've actually chatted a little bit on the github issue lists these past few weeks.

OK, we are active on so many fronts at the same time that we sometimes forget who we have met or talked with.

I'm starting to be a bit vocal about Jupyter, not to be a complainer, but because some people are using it for longer form work and there are improvements to be made there, and how would the dev team know if someone in the thick of it didn't speak up?

I work in Fremont, so Berkeley isn't so far away. I'd be happy to try to come by for the meetings if you wanted to chat; I probably couldn't afford a lot of time unless they were after work hours, but this also wouldn't require a ton of time.

So let's plan something for one of the next few weeks; it will be easier to have dinner or something.

But perhaps the big thing to keep in mind is that Jupyter has no concept of a 'book'. Meaning, you can't put things in chapters, you have to hack it to get things like prefaces (I don't want section numbers for a preface, I do for chapters, for example) and appendixes (they should be called 'Appendix A', not 'Chapter 17').

Yeah, I understand that. I tried to hack something together for Bayesian Methods for Hackers to make one PDF for everything, and it was indeed annoying. We have other people interested in using the Jupyter architecture with a slightly different editing tool, and over the next year(s) we will be working on better integration with publishing in the broad sense.

If you browse the directory in nbviewer or in your local browser there is no indication that this is a single work or that there is a preferred order (the book writers all do things like 01_name or Chapter01_name, but then what about prefaces and such, what order should be used). It would be great to have a bit of metadata describing the book that shows up in the directory listing, so it works more like a toc. I wrote a TOC with all that, but the user has to know to click and open that book, and it is alphabetically at the bottom, so I don't think it happens a lot. I can look at my github logs to get an idea of usage patterns.

I guess that this might be too much for nbviewer, and that it might be tackled by something like gh-pages, where nbconvert could use custom filters to actually get a cross-linked TOC, etc. I suppose we have to make a prototype and see what is needed.

Your experience will be useful for that.

And then there really isn't any good template to generate books. PDF is, in one way, the best output. No one seems to agree with that because you lose all the dynamic behavior, but here are two important things. First, when I open a PDF I get a tree on the left that allows me to navigate the book quickly. I can CTRL-F and search for a term. Just last night I was trying to remember something about an IPython widget that I did in a different notebook(chapter) in the book. I didn't recall which chapter, and I had no way to search other than to open the PDF. Found it and went on working. It would have been painful if everything was in separate files.

If you export to HTML you can either export to separate HTML documents with the same problem as the notebooks - not globally searchable, no page numbers, and section numbers (if any) not related to the chapter number (i.e. first heading in chapter 3 would be 1.0, not 3.0). And, no TOC. If you merge into one big document then I can use an extension to make a TOC, but overall the flow is just terrible; there is no visual cue that you are moving from one chapter to the next, and there is no tree or browsing structure to go between chapters. I could write some more HTML to tie all the chapters together, but I suggest that if Jupyter did want to support books, having a built-in, uniform way to do this would be better than having each author invent what would be a substandard solution.

Completely agree; we are thinking of writing an integration with sphinx. Personally I like sphinx because it can build an index for HTML pages purely client side, which allows you to search a static site.

The PDF output of sphinx is maybe not that nice, but it is pretty good. The PDF can have cross references, and you can output PDFs that have page numbers for a print version.

Though all of the sphinx integration still needs to be written.

I wrote a script that merges all the notebooks into one large one before conversion to HTML or PDF. For the PDF I also have a .tplx file to specify output format (actually, I think you wrote it for me in one of my issues, perhaps on reddit).

That may not have been me, or it was a long time ago. But sure, I understand the need for a custom script.

Here I get pretty close to the format I would like, but not quite. First, to get a preface my python script has to go in, search for the first \chapter in the tex file, and convert it to \chapter* to avoid numbering. But then I had to add a raw NBConvert cell with \addcontentsline{toc}{chapter}{Preface} at the top of the preface so it gets added to the TOC. Okay, I guess I could make my script do that as well; the broader point is that you have to put raw latex in the notebooks in some places to get the pdf output that you want, latex that makes no sense to the online reader.

To get the styling you want (the default style is not so great for books, IMO), you have to add an input cell. I do

#format the book
import book_format
book_format.load_style()

which loads some css and does things like set the rcParams for matplotlib so everything renders as I'd want. Not exactly a big deal, but it is more stuff in the notebook that really doesn't pertain to the subject matter. My merge script uses the "#format the book" string to identify these cells and removes them from the notebooks so the pdf of the book does not include them.

Yes, people are requesting themes. We dropped them because nobody was keeping the themes up to date, but there is no reason we couldn't have a key in the notebook metadata that would just link to a theme. We will actually be hiring a designer to do better themes once we have funding and have found the right designer, but we are not there yet.

When viewed online, with nbviewer, there is no concept of a book. I want people to use my TOC (which is another notebook) because it contains not just the chapter names, but a description of the topics covered. (You don't necessarily have to read the book linearly - you can search for the topic of current interest.) The default navigation just takes you back to the directory listing, so I added a link to my TOC on nbviewer at the top of every chapter. But this link makes no sense if you have cloned the book to your local drive (the best way to get the interactive features). It'll take you from the local drive off to nbviewer. And it is another thing that needs to be stripped out before conversion to PDF.

I think this could be solved at first with sphinx/gh-pages, and if there is a huge demand, we can move key pieces to nbviewer.

Probably I am doing something wrong, but I haven't figured out how to get equation numbering working, especially across chapters. I'd like to refer to equation 3.23 (23rd equation in chapter 3).

References are a pain. Again, maybe not doing something right. But I have to manually insert [1]s in the text, and then as I write of course the order changes, so the numbers are changing. I've seen extensions for this in calico for example; haven't played with them yet, but it didn't look like it would work (I cannot elaborate on that; I don't recall why I thought that when I looked at them, and I may be wrong).

Hmm, with mathjax in html, I have no clue. This is one of the points (better markdown with extensions for cross references and citations) for which we want to get a grant and some funding.

So you are not the only one to suffer from that.

So far as I know there is no way to get footnotes or endnotes. I'll admit I haven't looked closely.

I don't see a way around this at all, but layout in the pdf could be improved vis-a-vis plots. All plots immediately follow their code cells, and so you end up with weird page breaks with half a page of blank space. latex reflows the figures to minimize white space and orphans.

PDF output doesn't seem to recognize the > markdown to get a side bar.

Hmm, try changing your pandoc version; it might be a pandoc bug.

PDF output doesn't seem to recognize indented text as quoted code.

Same.

But feel free to open a bug report on the IPython repo.

I haven't looked into this, but the IPython widgets are a bit 'laggy' for me. I'm using them to alter parameters that influence how a filter works, so every widget movement means another filter run, which is slow. So requests stack up, and if I drag a scroll bar a fair distance quickly the filter will rerun several times, when really I just want it to rerun once, when the widget gets to the final location.

There is a way to tweak the throttling. There is also interact_manual in 3.x, which adds a button so that the redraw is triggered only on button click.

Again, maybe something for this is in the widget's api. Recognize I am using your technology to write my book; I am not an IPython user in depth otherwise, so I am less likely to know all the ins and outs.

No problem, we are nearby, and we are always available for questions, especially in person if you are nearby.

Most of these are minor annoyances, and people are successfully writing books with Jupyter. But there is a lot of ad-hoc work going on to work around limitations; or people don't try (so far as I know the Bayesian Methods for Hackers book doesn't try to create a 'book' format, for example). We write custom css, and then figure out how to get it into the book. I need to write Python tools that read the native notebook format to remove stuff or add stuff to get a specific output format looking its best.

And sure, we would be happy to get more manpower to work on that.

My larger concern is longevity. Jupyter has funding and full time developers,

And we have concerns too. Technically we are mostly 4 people working on it, and we are dangerously close to the point where we haven't published enough in the last few years to continue like that. Hopefully our funding will get renewed this year.

and that is great. But this conversion to IPython 3.0 underscores the issue. Do I want to be supporting this book for the rest of my life? No! I want to get it done, and move on to the next project (I'm tentatively targeting linear algebra).

And we don't want to support old versions of IPython for the same reason.

I'm answering emails about exceptions being thrown (because they are running 2.2, and the exception thrown surely does not give you any indication that the notebook is a newer version; 2.4 gives you a nice downgrade message). What happens when Python has a breaking change in version 3.7, or when Jupyter becomes its own project @ IPython 4.0? What happens in 10 years; will the book even run? Should I put all this work into a moving target? I am not sure. Should I write my own code to make an HTML style book, or is it in your pipeline of things to do? The static workflow of code+latex+png files, while cumbersome and not interactive, seems a lot longer lived, and far more controllable as far as the output layout goes.

I would suggest having a look at hashdist, which is made to try to "solve" some of these problems. But these are real concerns that we also have, and we want some of the Python community to help solve them.

Anyway, those are my major thoughts. Obviously I love Jupyter, and I want to keep using it for my projects large and small, and probably will.

Thanks for the detailed mail! And, BTW, you got 20k views in one month, mostly from the US, which roughly doubled nbviewer traffic.

The interactivity is too important, especially for trying to learn things like math and algorithms.

We are trying to bring interactive widgets to nbviewer. Hope that will help.

And the ability to accept pull requests, or make my own fix, and have the changes live to the world via github and nbviewer is fantastic. But there are costs and difficulties, which I've tried to document above. I hope some of it helps.

Yes, thanks very much!

Let's try to organize a discussion later.

M

rlabbe commented 9 years ago

I don't have time for a detailed follow up right now; but yes, let's schedule a dinner or lunch in the next few weeks. I know O'Reilly is doing work to incorporate Jupyter into their publishing model; maybe there is some synergy there that you could exploit.