swcarpentry / DEPRECATED-bc

DEPRECATED: This repository is now frozen - please see individual lesson repositories.

[meta] Python3 vs Python2 #107

Closed rgaiacs closed 10 years ago

rgaiacs commented 10 years ago

DISCLAIMER: I don't want to start a flame war.

Related to: #71 and #105.

I know that many Python packages are incompatible with Python 3 and may never be ported, but many already are compatible (e.g. NumPy, SciPy, matplotlib, IPython, ...).

What about having the Python lessons use only the Python 3 print syntax?

Short version of why to use the Python 3 print syntax

$ python2
Python 2.7.5 (default, Sep  6 2013, 09:55:21) 
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print 'foo'
foo
>>> print('foo')
foo
>>> quit()
$ python3
Python 3.3.2 (default, Sep  6 2013, 09:30:10) 
[GCC 4.8.1 20130725 (prerelease)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print 'foo'
  File "<stdin>", line 1
    print 'foo'
              ^
SyntaxError: invalid syntax
>>> print('foo')
foo
>>> quit()

Long version of why to use the Python 3 print syntax

Since Software Carpentry is teaching scientists how to program better, I think that continuing to use features that will be deprecated in a "short" time, instead of alternatives that are backward compatible, isn't the way to show how to improve programming skills.

ahmadia commented 10 years ago

Thanks for raising this.

Python 2.7 has been around since 2010, so the print function has been available since then. Earlier versions (Python 2.6) require a from __future__ import print_function statement for that to work.

Since we generally advise students to install Python meta-distributions like Anaconda and Canopy, which use Python 2.7, I don't see any disadvantage with making this change across our material.

ethanwhite commented 10 years ago

A big +1 for working on making all of our material Python 3 compatible. The vast majority of the core scientific stack now works on Python 3 and Anaconda is already starting to make Python 3 available [1]. Making our stuff Python 3 compatible not only future-proofs us but may help push the community forward to 3 as well.

[1] http://www.walkingrandomly.com/?p=5089

wking commented 10 years ago

On Thu, Oct 24, 2013 at 06:39:40AM -0700, Ethan White wrote:

Making our stuff Python 3 compatible not only future proofs us but may help push the community forward to 3 as well.

+1. It also reduces the risk of confusion if our newly trained students are dropped into a Python 3 community. Folks who print() have one less thing to worry about.

gvwilson commented 10 years ago

+1 --- let's wait until the current round of instructor trainees' pull requests have been taken care of, then do this at the same time as inlining stuff currently in _includes and rationalizing image locations. Raniere, can you please open a ticket for it?

ahmadia commented 10 years ago

@gvwilson - aren't we currently in the ticket (issue?) for it? I'm okay with waiting to address this until the active PRs have landed (I don't think we should wait on PRs that are stagnant, I can always merge/rebase fixes in as we go).

Let's say we revisit this in a week?

ahmadia commented 10 years ago

@r-gaia-cs - I'm switching this from a [meta] to an active issue since everybody is on-board with a move to Python 3-compatible print syntax everywhere and assuming that Python 2.7 is available.

Since there are tons of lines to convert, you will probably need to create a script to do this (especially because we may need to rebase it as other commits land). Let me know if you have any questions about this.

rgaiacs commented 10 years ago

@ahmadia 2to3 does a good job with .py files but doesn't work on IPython notebooks:

$ grep print python-01* | head -n 5
      "print 'function produced:', result"
      "which is then printed.\n",
      "print 'function produced:', zero()"
      "print 'water freezes at', fahr_to_kelvin(32)\n",
      "print 'water boils at', fahr_to_kelvin(212)"
$ 2to3 python-01-functions.ipynb 
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
RefactoringTool: No changes to python-01-functions.ipynb
RefactoringTool: Files that need to be modified:
RefactoringTool: python-01-functions.ipynb

gvwilson commented 10 years ago

Please ask on the ipython developers' list how they've handled their 2->3 transition --- I really hope they have something better than 'sed' :-)

ahmadia commented 10 years ago

@r-gaia-cs - Yes, you could try asking @takluyver on either the dev list or one of their other help forums. Let me know if you'd like me to make the introduction.

takluyver commented 10 years ago

Hi there - mentioning me sends me an e-mail notification ;-). We don't have an automated conversion tool for notebooks ourselves - we don't curate enough notebooks to really need one. However, it should be feasible to write something that loads a notebook, runs 2to3 on each of the code cells, and saves it again. I'm happy to help build that. It's not entirely trivial, though:
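For illustration only, here is a minimal sketch of the load/convert/save loop just described, assuming notebooks are plain JSON in either the nbformat 3 layout (code under each cell's "input", inside "worksheets") or the nbformat 4 layout (code under "source" in a top-level "cells" list). The names convert_notebook and convert_file are hypothetical, and convert_source is a placeholder for whatever does the real rewriting (for example, a wrapper you write around 2to3). It makes no attempt to handle the harder cases.

```python
import json


def _code_cells(nb):
    """Yield (cell, key) for each code cell; key names the field holding its code."""
    if "worksheets" in nb:
        # nbformat 3: cells live inside worksheets and code is stored under "input"
        for worksheet in nb["worksheets"]:
            for cell in worksheet.get("cells", []):
                if cell.get("cell_type") == "code":
                    yield cell, "input"
    else:
        # nbformat 4: a top-level "cells" list, code stored under "source"
        for cell in nb.get("cells", []):
            if cell.get("cell_type") == "code":
                yield cell, "source"


def convert_notebook(nb, convert_source):
    """Apply convert_source (str -> str) to every code cell of a loaded notebook, in place."""
    for cell, key in _code_cells(nb):
        src = cell.get(key, "")
        text = "".join(src) if isinstance(src, list) else src
        new_text = convert_source(text)
        # Notebooks usually store code as a list of lines; keep whichever shape was there.
        cell[key] = new_text.splitlines(True) if isinstance(src, list) else new_text
    return nb


def convert_file(path_in, path_out, convert_source):
    """Load a notebook file, convert its code cells, and write the result."""
    with open(path_in) as f:
        nb = json.load(f)
    convert_notebook(nb, convert_source)
    with open(path_out, "w") as f:
        json.dump(nb, f, indent=1)
```

Markdown cells are deliberately left untouched, so prose that merely mentions print is not rewritten.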

Also, I should just point out something I think was missed in the discussion above - print('foo') works in Python 2.7 because of a syntax quirk (any expression can be in brackets), not because there is a real print function available. So:

print('foo')  # Works
print('foo', 'bar')  # Runs, but prints a tuple
print('foo', file=sys.stderr)  # SyntaxError

If you use from __future__ import print_function, all of those cases behave as expected.
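A small sketch of what the __future__ import changes on Python 2.6/2.7 (on Python 3 it is a harmless no-op): print becomes a real function, so several arguments are joined with spaces rather than shown as a tuple, and the file, sep and end keywords work. Collector is a hypothetical stand-in for a file object, used only so the output can be inspected.

```python
from __future__ import print_function  # must be at the top of the module (only a docstring/comments may precede it)

import sys


class Collector(object):
    """Minimal file-like object: print() only needs a write() method."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)

    def getvalue(self):
        return ''.join(self.parts)


out = Collector()
print('foo', file=out)                              # writes "foo\n"
print('foo', 'bar', file=out)                       # writes "foo bar\n", not a tuple
print('foo', 'bar', sep=', ', end='!\n', file=out)  # writes "foo, bar!\n"
print('foo', file=sys.stderr)                       # a SyntaxError on 2.x without the import
```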

rgaiacs commented 10 years ago

@takluyver Thanks for the words and the advice about sys.stderr.

@ahmadia Do you think that sys.stderr will be a big issue for instructors? (I don't think so.)

takluyver commented 10 years ago

It's not specific to sys.stderr - you need the __future__ import to be able to pass any keyword arguments (file, end, sep) to print().

asmeurer commented 10 years ago

To further clear up any confusion, Python 2.6 and Python 2.7 are completely identical concerning print. The real differences there are things like set literals and dict/set comprehensions, which you have to avoid completely if your code will ever touch 2.6.
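A sketch of the 2.6-safe spellings of those constructs. The first group uses syntax that is a SyntaxError on Python 2.6 (so this block as a whole needs 2.7 or 3); the second group builds the same values with calls that also parse on 2.6. The word list is made up for illustration.

```python
words = ['numpy', 'scipy', 'numpy', 'ipython']

# Python 2.7+ / 3 only: set literal, dict comprehension, set comprehension
unique_literal = {'numpy', 'scipy', 'ipython'}
lengths_comp = {w: len(w) for w in words}
upper_comp = {w.upper() for w in words}

# Equivalents that also parse on Python 2.6
unique_26 = set(['numpy', 'scipy', 'ipython'])
lengths_26 = dict((w, len(w)) for w in words)
upper_26 = set(w.upper() for w in words)
```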

ahmadia commented 10 years ago

@takluyver and @asmeurer - Thanks for weighing in, gentlemen :)

@r-gaia-cs @ethanwhite @wking @gvwilson - If I understand @asmeurer correctly, we're getting lucky with the print statements working as expected. Given the comments from @takluyver and @asmeurer, I think we should prefer the Python 2 print syntax until we're actually teaching in Python 3 (which we should consider switching to next year, but not now). Any disagreement?

ethanwhite commented 10 years ago

It seems like it's only "getting lucky" in the sense that using this syntax in Python 2 doesn't support everything that it currently supports in Python 3. We still get full Python 2 functionality (i.e., the functionality without the future import), so what are we losing by making our material work for both?

ahmadia commented 10 years ago

@ethanwhite - we're risking a student later expanding to Python 3 print function syntax, then getting Python 2 behavior.

I was under the mistaken impression that print can be used as a function in Python 2.7, and I don't think we should propagate this in our teaching. I've run into a number of other corner cases where it is highly confusing that print isn't a function.

We should just say that it isn't, and either use the proper statement syntax or from __future__ import print_function, rather than exploit a muddled area of the language.

If you're suggesting we use from __future__ when teaching print, I'm happy to discuss that idea, which I'm neutral-positive on :)

ethanwhite commented 10 years ago

we're risking a student later expanding to Python 3 print function syntax, then getting Python 2 behavior.

I'm less concerned about this than you are and really like the idea of code that works in 2 & 3, but I don't feel strongly about it so I'm happy to stick with print statements if that's what others think is best.

I'm -1 on using future imports. I do this in my university courses to deal with integer division and it causes no end of confusion: people forget to do the import and get very confusing behavior, and students have a hard time understanding what that code is actually doing. Needing to explain modules and imports just to print something doesn't seem worth it to me.

gvwilson commented 10 years ago

On 2013-11-03 8:47 PM, Ethan White wrote:

I'm -1 on using future imports.

I'm also -1 on future imports, having had the same experiences as Ethan.

asmeurer commented 10 years ago

To further clarify, print(x) works because it is parsed as print (x), which is the same as print x, since redundant parentheses around expressions are ignored in Python. You could also write print((x)) or print(((x))). This is more a consequence of the way the Python syntax tokenizer works than anything.
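A quick way to see this on Python 3: parentheses on their own only group an expression; it is the comma that makes a tuple. Since the Python 2 statement print x writes str(x), the shown_* strings below are what print('foo') and print('foo', 'bar') display when print is a statement. Variable names are illustrative.

```python
single = ('foo')           # just the string 'foo': redundant grouping parentheses
nested = (((single)))      # still 'foo', however many parentheses you add
pair = ('foo', 'bar')      # the comma makes this a 2-tuple

# What the Python 2 print statement would write for each:
shown_single = str(single)
shown_pair = str(pair)
```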

asmeurer commented 10 years ago

Maybe you should mention that that syntax has changed in Python 3, though. You want to avoid information overload, but it can be surprising for someone if they somehow end up in a Python 3 environment and they start getting syntax errors.

ahmadia commented 10 years ago

@asmeurer - do you have a recommendation on how you would use print, given the discussion you've seen so far, and your understanding of the subtleties here? I'm +0.5 for using it as a statement, Ethan is +0.5 for using it as a pseudo-function, I'm happy for you to be the tie-breaker here :)

wking commented 10 years ago

On Thu, Oct 24, 2013 at 05:45:43AM -0700, r-gaia-cs wrote:

I know that lots of Python packages are incompatible with Python3 … but lots of them already are (e.g. Numpy, Scipy, matplotlib, IPython, ...).

I'm ambivalent about print(…) vs print … in Python 2, but I'd be in favor of just upgrading SWC lectures to use Python 3 ;). Maybe we should wait for Fedora 22 [1] and Ubuntu 14.04 LTS [2], but the time to make this transition completely is approaching. I am not aware of any SWC dependencies that are not Python 3 compatible, but I haven't spent much time with some peripheral packages that are occasionally taught (Pandas, sympy, mayavi.mlab, … [3]).

takluyver commented 10 years ago

(Pandas, sympy, mayavi.mlab, ...

Pandas and sympy are compatible, mayavi is not yet.

ahmadia commented 10 years ago

Discussions about moving our lesson material to Python 3 are also tabled until 2014 :)

wking commented 10 years ago

On Sun, Nov 03, 2013 at 06:23:17PM -0800, Aron Ahmadia wrote:

Discussions about moving our lesson material to Python 3 are also tabled until 2014 :)

I don't see the need to worry about Python 3 print-statement compatibility until we can talk about global Python 3 compatibility. There's no harm in letting the current state ride for another two months.

asmeurer commented 10 years ago

It's a tough call. If you are using Python 3 as your main Python (as I do), it's annoying when you have to go in and fix someone's code to use print as a function everywhere. There are occasionally other things as well, like string processing issues or reusing a map, but this is by far the most common. On the other hand, as noted, especially for introductory lessons, this could be confusing (it even confused you guys). I would agree with @wking's sentiment for the most part here. Going full Python 3 would be awesome, but there are still a few stragglers in the package space that are important to some people.
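The "reusing a map" problem, sketched: on Python 3, map() returns a one-shot iterator instead of a list, so a second pass over it sees nothing. Wrapping it in list() once gives the same behavior on 2 and 3.

```python
squares = map(lambda x: x * x, [1, 2, 3])
first_pass = list(squares)    # [1, 4, 9]
second_pass = list(squares)   # [] on Python 3: the iterator is exhausted ([1, 4, 9] on Python 2)

# Portable: materialize the result once, then reuse the list freely
squares_list = list(map(lambda x: x * x, [1, 2, 3]))
reused = (list(squares_list), list(squares_list))
```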

rgaiacs commented 10 years ago

I agree about the problem with from __future__ imports, but I'm +1 on writing print as print('some string') instead of print 'some string' because:

  1. If we teach using print 'some string', students may run into problems some months later.
  2. Advanced String Formatting (PEP 3101) has been backported to Python 2.6. (I'm +1 on teaching this.)
  3. The print statement syntax is "print" ([expression ("," expression)* [","]] | ">>" expression [("," expression)+ [","]]), and I think we can avoid teaching the use of ">>" to write to stderr and the trailing comma that suppresses the newline.
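A sketch of PEP 3101 formatting that should run unchanged on 2.6, 2.7 and 3, reusing the lesson's freezing/boiling examples. fahr_to_kelvin here is a hypothetical re-creation of the lesson function, not its actual source. Explicit field numbers ({0}) are used because the auto-numbered {} form needs Python 2.7+, and 5.0 avoids Python 2 integer division.

```python
def fahr_to_kelvin(temp):
    """Hypothetical re-creation of the lesson function (not the lesson's own code)."""
    return ((temp - 32) * 5.0 / 9.0) + 273.15


# {0:.2f}: first positional argument, fixed-point with two decimals
freezing = 'water freezes at {0:.2f} K'.format(fahr_to_kelvin(32))
boiling = 'water boils at {0:.2f} K'.format(fahr_to_kelvin(212))

print(freezing)  # a single argument, so this line behaves the same on 2 and 3
```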

gvwilson commented 10 years ago

I'm still -1:

>>> print(1, 2, 3)
(1, 2, 3)

isn't something we want to have to explain, or explain away.

ahmadia commented 10 years ago

@gvwilson - I'm in agreement with you, and I don't think it's worth bike-shedding this. Here's the summary:

@r-gaia-cs - This doesn't mean you can't teach print as a function in your own bootcamps, but the standard for our own material will be to use it as a statement. Thanks for bringing this up, it was an important discussion point to resolve.

wking commented 10 years ago

On Sun, Nov 03, 2013 at 07:09:54PM -0800, W. Trevor King wrote:

On Sun, Nov 03, 2013 at 06:23:17PM -0800, Aron Ahmadia wrote:

Discussions about moving our lesson material to Python 3 are also tabled until 2014 :)

I don't see the need to worry about Python 3 print-statement compatibility until we can talk about global Python 3 compatibility. There's no harm in letting the current state ride for another two months.

Kicked off by comments on swcarpentry/site#298 [1], is it now time to revisit Python 3 compatibility? Python 2.x is not getting any younger [2]… Excepting mayavi (enthought/mayavi#84), we should be good to go for packages that have been used in boot camps.

If we want to kick the can down the road again, I'm fine with that too. I'd just like a new deadline since we've already hit “2014” ;).

ahmadia commented 10 years ago

There are some valid chicken-and-egg questions about whether Software Carpentry should be leading by teaching Python 3 in our materials or waiting until the community has more momentum.

I don't think this is one of our high-priority tasks, but if an instructor is willing to take responsibility for the following:

Then I am happy to continue this conversation :)

If nobody puts their hands up, we can revisit this in June.

jkitzes commented 10 years ago

Judging by the above, I'm in the minority here, but I would say it's worth holding off on this effort for two reasons.

  1. Just on a priority basis, it seems we have far bigger fish to fry right now with completing lessons in the first place, among other tasks.
  2. I still have the sense that the vast majority of the "help" that our students are likely to encounter after bootcamps (textbooks, Google searches, Stack Overflow, helpful colleagues) is going to instruct them in Python 2. While it would be nice to lead/push on this one, as noted above, it seems to me that the extra barriers this would throw up for students continuing to grow beyond the bootcamps could outweigh the (seemingly quite minor) benefit of switching right now.

All that, of course, subject to revision in the future. We'll certainly have to do this at some point, I'm just not convinced that time is now.

gvwilson commented 10 years ago

+1 to Justin's points: we have bigger fish, and most help out there is Python 2.*. Let's revisit after PyCon in April, when we have a better idea of what Enthought, Continuum, and other big players are planning.

asmeurer commented 10 years ago

@gvwilson that hits on the chicken/egg thing again, though. Continuum and Enthought and so on are likely not going to be the ones to take the charge here, because their priority is to respond to the marketplace. Continuum's Anaconda does support Python 3 (http://continuum.io/blog/anaconda-python-3), but it will not make Python 3 the default until everyone is asking for it.

I really see things like swcarpentry as good places to start the cycle with Python 3.

jkitzes commented 10 years ago

@asmeurer, I would agree that we (the instructors) are the right ones to lead this push in our own work, but I think that (given the profile and needs of our bootcamp attendees) it would not be a benefit to our students to be early adopters. Those who are users/consumers and want this "just to work" should probably make the switch last - I would even suspect that many instructors (myself included) still haven't switched to Python 3 yet.

gvwilson commented 10 years ago

I agree with @jkitzes: we shouldn't try to roll all the rocks up all the hills at once, and getting scientists to adopt good development practices is a big enough challenge for a group our size...

ahmadia commented 10 years ago

Right, let's table this conversation for another look at developer adoption mid-year. If somebody wants to champion a conversion of our materials to Python3, please get in touch with me or Greg directly.

takluyver commented 10 years ago

By the way, @fperez and I have spoken a bit about having a Python 3 BoF at Scipy this year. I hope some of you will be there to discuss these things.

takluyver commented 10 years ago

For the record, Continuum do now provide Anaconda installers based on Python 3, so you can install the Scipy stack directly with Python 3, rather than having to set up a separate environment and install into that. Blog post. One step closer.

The Python 3 birds-of-a-feather session at SciPy next month is going ahead, and Nick Coghlan from core Python development is flying in to talk to us as well.

wking commented 10 years ago

And it looks like the Fedora bugs are still open [1,2]. Ubuntu made some progress, but didn't quite drop 2.x in 14.04. They expect to drop it from the base touch images by 14.10 [3]:

At the time of this writing (2014-05-08), Ubuntu 14.04 LTS has recently been released. We made great progress toward these goals, but we must acknowledge that it is a daunting, multi-cycle process. A top goal for 14.04 was to remove Python 2 from the touch images, and sadly we almost but didn't quite make it. There were still a few autopilot tests for which the Python 3 ports did not land in time, thus keeping Python 2 autopilot support on the base touch image. This work is being completed for Utopic and we expect to remove Python 2 from the touch images early in the 14.10 cycle (actually, any day now).

khinsen commented 10 years ago

The basic premise in this discussion is that scientific Python users will sooner or later switch to Python 3, the question only being when. Are we certain of that? From my own anecdotal experience, I am tempted to say no. People around me who actually use Python to do science stick to 2.x. Some haven't even looked at Python 3, others have but find no compelling advantages, and yet others considered switching but depend on some highly domain-specific library that hasn't been ported and perhaps never will be.

On the other hand, all the big players in the infrastructure (NumPy, SciPy, etc., but also distributions like Anaconda) are pushing for a move to Python 3 and are investing some effort as well. I suspect this is mostly because they believe this is the right thing to do and they want to help the community to make the transition. But will the user community follow?

Question: does anyone have more than personal anecdotal evidence on this question?

rgaiacs commented 10 years ago

The basic premise in this discussion is that scientific Python users will sooner or later switch to Python 3, the question only being when. Are we certain of that? From my own anecdotal experience, I am tempted to say no. People around me who actually use Python to do science stick to 2.x.

People around me still use Fortran 77 (and sometimes teach it without mentioning all the advantages of Python and f2py).

On the other hand, all the big players in the infrastructure (NumPy, SciPy, etc., but also distributions like Anaconda) are pushing for a move to Python 3 and are investing some effort as well. I suspect this is mostly because they believe this is the right thing to do and they want to help the community to make the transition.

From a developer's perspective, keeping your system secure is hard without updates, and maintaining compatibility with old programs/libraries usually just makes you write more code and eventually maintain the programs/libraries that your work depends on.

But will the user community follow?

IMHO yes. Right now the scientific community doesn't have incentives to make this transition quickly (your work is recognized by the papers you publish, whether in Python 2 or Python 3, and not by the "quality" of your code), but I hope that in the near future it will.

takluyver commented 10 years ago

Last year we did a survey of IPython users, and over 20% of respondents said that they use IPython with Python 3. I'd expect that number to be higher if we did the survey today, with Python 3.4 out. There's still a long way to go, but that's definitely not nobody.

You can read more about the survey results here: http://ipython.org/usersurvey2013.html

ethanwhite commented 10 years ago

others considered switching but depend on some highly domain-specific library that hasn't been ported and perhaps never will be.

It would be useful to get a list of libraries that actually don't work on Python 3. I regularly talk to folks who say they can't make the switch because a particular library isn't released for Python 3, when a quick search shows that it now is. We had this happen at the SWC Lab Meeting a couple of days ago with respect to NiPy. I worry a bit about whether this is a serious impediment or something that was an impediment a year ago but is now less of an issue. Some data would be useful here, or at least some clear examples of widely used libraries that are not available in Python 3.

asmeurer commented 10 years ago

http://python3wos.appspot.com/

The main ones that I know of are:

asmeurer commented 10 years ago

But I agree that that argument has less and less traction. Most packages that you would want to use are ported by now. These days even when I come across a random Python project on GitHub it either works with Python 3 or there is a pull request to make it work (although in some cases that pull request is authored by me).

ethanwhite commented 10 years ago

Thanks @asmeurer! That's a great resource and an encouragingly short list.

ethanwhite commented 10 years ago

As I mentioned on the SWC call this week, I am moving my university courses over to Python 3 this fall. The integer division in Python 2 has been a constant stumbling block and frustration point for my students every year and now that Anaconda supports Python 3 I'm making the switch.
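A sketch of the integer-division stumbling block: on Python 3, / is always true division and // is floor division; on Python 2 (without from __future__ import division), / between two ints floors. Values in the comments are for Python 3 unless noted.

```python
total = 3 + 4
mean = total / 2          # 3.5 on Python 3; 3 on Python 2 (both operands are ints)
floored = total // 2      # 3 on both: // is explicit floor division
negative = -total // 2    # -4 on both: floor rounds toward negative infinity
as_float = total / 2.0    # 3.5 on both: one float operand forces true division
```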

I'm also moving from a book to using open online materials only. I plan to primarily use SWC materials so that I can use my teaching time to contribute additions and improvements to the material rather than expanding on my existing material in isolation. This will be more difficult if we continue to only focus on Python 2. I bring this up because I think we will increasingly be in this kind of situation where if we want our materials to serve as a central teaching resource we will be split between people teaching/using Python 2 and 3. We can address this in one of two ways:

  1. We could take a leadership role in moving the community towards Python 3. This shift has already started as the IPython survey results demonstrate and unless there are a lot of critical modules that don't support Python 3 (which according to @asmeurer's link above doesn't seem to be a major issue for the core stack) then there is limited risk to new users. This is in line with our general philosophy that the way to change culture is to start with the younger generation.
  2. Alternatively, if we feel that the scientific community isn't going to move or that it isn't our place to help it do so, then I think it will be important to support jointly compatible materials as much as possible to allow folks like myself to easily use SWC's efforts (and contribute back to them) even when teaching in Python 3.

takluyver commented 10 years ago

Perception is definitely an issue. I've seen quite a few online discussions where someone complains that they can't switch until X is ported, only to be told that X was ported a year ago. I think people check once, and then assume nothing has changed. Maybe we need to make more noise when packages gain Python 3 support, but it doesn't seem practical to do that for every package out there.

khinsen commented 10 years ago

@takluyver Thanks for the pointer to the IPython survey, that's exactly the kind of "hard data" I was looking for.

As for the perception issue, it's definitely true, but it works both ways. People eager to move to Python 3 tend to underestimate the obstacles.

As for the lists of ported and unported libraries, I think this is largely irrelevant for statistics. What matters is not how many libraries have been ported, nor how widely the ported libraries are used. What matters is the percentage of scientists writing Python code whose "library needs" are fully covered by what has been ported. I have no idea of how I would estimate this.

My own anecdotal evidence comes from a personal and highly non-representative survey made for much the same reason that we are discussing here: should a planned Python training course use Python 2 or 3? We ended up sticking with Python 2 because none of the Pythonista scientists whom the instructors interviewed informally used Python 3 or were considering switching in the near future. That was about a year ago.

The obstacles that were cited were almost exclusively little-used domain-specific libraries that nevertheless were essential for someone's work. A controller for a specific piece of lab equipment, or a library to read some weird file format. The kind of libraries written by a thesis student and then never maintained, often not even published.