Open datapythonista opened 5 years ago
we automatically inject code for imports
+1 for removing the {{ header }}
stuff in favor of explicit imports.
And also avoid aliasing our own module
I disagree here. Let's keep that as a separate issue.
Also +1 on explicitly doing and showing the imports.
We could also see if they can be simplified. Eg randn = np.random.randn
could be fixed for all occurences to use the full version. And maybe pd.options.display.max_rows = 15
can be removed with the shorter default nowadays (although if the examples use a length of eg 50, that would still be shown).
And also agree with keeping the alias discussion for a separate issue. That will be easier for this one ;)
Sorry for mixing things. I think there is agreement on this one, for the user guide. I was also considering the API, but I'll also open a separate issue for that, so we keep the discussions focused. Thanks for the feedback.
why the move to remove header macro? IIRC this was put in place to make sure that all of the imports are exactly the same everywhere on each doc section
this seems like churn for no gain
The code in the header is hidden, and we implemented the union of all the blocks with imports, seeds... to avoid duplication, and because having a huge header file in every file was transparent to the user.
The rendered header added a problem that is that Sphinx doesn't report the right line for the errors anymore (it reports the line on the rendered file, but we look for the error in the template). It was already under discussion whether we wanted to keep it or not.
But the reason for proposing this now, is that the header adds extra complexity to make the code runnable in Binder. In two different ways:
The header surely provides value in avoiding repetition, but IMO it's not worth. It's better to repeat the needed parts, and also be explicit about all what we're doing, since with Binder we'd give the users the change to play with the code, and any magic there will be confusing and unappreciated.
we’ll i still would have a fixed header of imports; having consistency in the docs overrides all other issues ; it easily becomes a mess without enforcement via CI
Now we validate in the CI that Sphinx doesn't generate warnings, so if an import is missing the CI should fail.
having consistency in the docs overrides all other issues
Note that this is only for the doc build, as it is about a hidden code block right now. And then @datapythonista is correct that this is now validated in CI that the doc build runs correctly.
Part of the proposal is also to make it no longer a hidden code block. So there we also need to ensure that what we show to the users is somewhat consistent throughout the user guide. But I think we can certainly achieve that without inserting a common header as well.
having consistency in the docs overrides all other issues
Note that this is only for the doc build, as it is about a hidden code block right now. And then @datapythonista is correct that this is now validated in CI that the doc build runs correctly.
Part of the proposal is also to make it no longer a hidden code block. So there we also need to ensure that what we show to the users is somewhat consistent throughout the user guide. But I think we can certainly achieve that without inserting a common header as well.
well this was just changed to make it consistent for each file the concern here is the same as the original
it will get inconsistent thru time across files
so if possible to have a single file with consistent imports that is then objected to the build would be preferable
Your last sentence is not very clear to me.
Your last sentence is not very clear to me.
right was a suggestion if we can have still a single place to import and set options for docs (and still have binder work); as this enforces consistency the best imho
Well, that's more or less the whole point of the issue that Marc opened: not having it in the actual rst file (but injecting it as we do now) is probably possible to have it work with binder, but it is added complexity (need to ensure the header is injected in the binder pipeline as well, as that will probably be independent from sphinx), and hence his preference to just get rid of the injected header (as we did until very recently)
This is what we render automatically with {{ header }}
:
.. currentmodule:: pandas
.. ipython:: python
:suppress:
import numpy as np
import pandas as pd
randn = np.random.randn
np.random.seed(123456)
np.set_printoptions(precision=4, suppress=True)
pd.options.display.max_rows = 15
import os
os.chdir(r'{}')
randn = np.random.randn
is a bad idea, and just used in the release notes of 0.10.0.
I guess the rest is used in almost every page of the user guide.
I think there are two different questions:
I can see the advantages and disadvantages in both options of 1. But for 2, I think it's vert wrong to hide code, create magic to the users, not let them reproduce exactly our examples, and Binder will make that even much worse.
agree that assigning to randn is just incorrect - that should just be fixed
ideally we would expose this as user code in the docs but how do you propose to keep it consistent? that was an extreme problem before
ideally code review can do this but it has over time dropped the ball it is VERY hard to keep consistently in this large a code base (just look at some of the duplication in tests for example);
anything style related needs to be automatic and in the CI
what if you inline the code and add a style check for this?
Personally, I think you overestimate the "extreme problem" a bit.
Before, the header was much longer because we imported lots of things with non-standard imports, this initial hidden ipython block was a mess. But that was cleaned up in the docs over time, and now (apart from the single randn thing), all example code uses standard imports. The initial code block with imports is thus much smaller.
I personally don't see a problem with repeating this in every file.
(but in any case, it seems there is agreement over Marc's point 2) -> let's show this code block with imports to the users)
For now I'm happy to just remove the :suppress:
and keep the header. Hiding code from the user is what I think it's a problem, not that much the header itself (sorry I mixed both in the issues)
For the header I'm more open, but I don't think it's that important to be consistent (e.g. having different seeds or different display.max_rows). May be we could have variables for them (e.g. np.random.seed({{ random_seed }})
), we would still repeat some code, but we would get correct line numbers for Sphinx errors.
For being inconsistent with the imports, if an import is missing the CI will fail because code blocks will fail and we are now failing the CI for that.
i am fine with changing this but i have seen over time that it is very easy to get inconsistency from well intentioned changes.
This PR is well done but I'm not sure I am on board with the concept. Associating each import with its first reference in the rst feels a little arbitrary and I think is just confusing. Seems like it could be somewhat of a nuisance when moving things around; nothing major but in any case a step backwards from just knowing standard imports are done in the header
Are we dead set on going this route?
Sorry last comment was in reference to PR #28089 but posted here in issue
Dear @WillAyd , since the matter was discussed on #28089 , can we close this issue ?
I think so but @datapythonista I think can close or clarify
Thanks @lukelima for the triagging. #28089 removed the {{ header }}
from one of the pages, but most of them still have it. So, this issue is still relevant, and can't be closed.
I've been discussing with the Binder team, and looks like it'll be possible soon to make the pandas examples in the docs runnable directly from the docs.
I think this will be amazing, but I think it's one more reason to start being explicit and self-contained with the docs. One option is of course that the trickiness we have now on importing things translate to Binder, and we automatically inject code for imports (and may be seeds...).
But my opinion is that the right thing is to start being explicit on what we run in the examples.