pypa / packaging.python.org

Python Packaging User Guide
http://packaging.python.org

Stance (or discussion) on src/ directory #320

Closed ctheune closed 5 years ago

ctheune commented 7 years ago

Is there an official stance on the "src/" directory thing? I'm all for it (and @hynek seems to agree) but I haven't found any discussion or explanation about it from the PyPA. The guide doesn't mention it and the sample project doesn't use it. Even if the stance is "don't use it" (which I would disagree with) then I'd love to see this discussed in the official guide so we can refer to it when people need to make up their mind or argue it.

nicoddemus commented 6 years ago

@pradyunsg thanks for the extensive summary and the table of the current state of things. 👍

FWIW in pytest we will eventually change to src layout as our own docs recommend, currently the package is on the repository root because of historical reasons.

pradyunsg commented 6 years ago

@nicoddemus I had not mentioned that the pytest docs say that while talking about out-of-package tests (I'd marked it a note-to-self for some reason). Edited the above summary to mention that.

takluyver commented 6 years ago

I too think I can see which way this is going. I don't have a strong preference, but for the sake of having a clear debate, I'll try to present the case for staying with non-src as the default recommendation.

pfmoore commented 6 years ago

@takluyver I agree with most of your points. The one really difficult issue for me is (essentially) the fact that the current directory is on sys.path, so that trying to use an installed copy of the application "accidentally" sees the local code. That's a Python issue, not a tool issue, so it's particularly tricky to address. Tools like pytest could possibly work around it, which would further weaken the arguments for the src layout, but until (unless) they do, it remains an issue.
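
The shadowing described here can be reproduced with a minimal sketch. It assumes a hypothetical package name (`mything`) and a throwaway non-src checkout created in a temporary directory; the helper names are illustrative only. It relies on the fact that `python -c` puts the current directory first on `sys.path` unless safe-path mode is enabled, which is why `PYTHONSAFEPATH` is stripped from the child environment.

```python
import os
import subprocess
import sys
import tempfile


def import_location(package, cwd):
    """Run `python -c "import <package>"` in cwd and return the file it loaded."""
    # Drop variables that would change how sys.path is built in the child process.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    code = f"import {package}; print({package}.__file__)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=cwd, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def make_flat_checkout(root, package="mything"):
    """Create a hypothetical non-src checkout: <root>/<package>/__init__.py."""
    os.makedirs(os.path.join(root, package))
    open(os.path.join(root, package, "__init__.py"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_flat_checkout(root)
        # The checkout is found because the current directory is searched first.
        print(import_location("mything", cwd=root))
```

Run from the checkout, the import resolves to the local copy whether or not another copy of the package is installed elsewhere.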

The big problem with the src layout is that it's a solution to problems that you simply don't have at the start of a project. How soon you start hitting the issues depends on your development process, and in some cases the answer may well be "never". So unless you take the hit of a more complex layout at the start, you will end up having to factor the cost of change into the equation (which may partly explain why so many projects still use a non-src layout).

I'd actually be fine with the guide covering both options, and explaining the trade-offs. But (a) that would likely be useless in practice, as readers are looking for advice, not options, and (b) it's no use for tools (like flit) that want to support a single layout.

dstufft commented 6 years ago

I don't buy the split of ecosystem issue, because having worked on projects with src/ and without... it really doesn't matter? There are so many other little differences between two projects that whether there's one "extra" directory or not just sort of becomes noise (and tools like GitHub aren't even going to show you src/ as its own directory if there's only one folder, it'll show you src/foo).

In many ways, it actually makes it easier to work on a project, because you have a pretty clear line about what actually is part of the project, and what is miscellaneous stuff that happens to live in the same directory. For example, here's pip's top level:

.
├── contrib
├── docs
│   ├── man
│   └── reference
├── news
├── src
│   ├── pip
│   └── pip.egg-info
├── tasks
│   └── vendoring
└── tests
    ├── __pycache__
    ├── data
    ├── functional
    ├── lib
    ├── scripts
    ├── unit
    └── yaml

Without the src/ directory, you just sort of have to know that pip gets installed, but contrib or tasks does not (and tasks is a Python package itself! totally importable etc).

While you might say big deal, pip is named pip, so it's obvious that it's import pip, you also have to remember that the import name and the project name don't have to match, and a single package can install multiple things. An example of where this happens is Twisted, where projects will have their own top level name, plus then they have to add a file into the twisted namespace in order to register a plugin.

Ultimately, no matter what tools like py.test can or can't do, there's very little you can do to fix python in the current directory, and that is going to confuse people when they try to pop into a REPL to reproduce some issue, and get the local copy instead of the installed copy.

The biggest benefit for the non src/ layout always seems to me to be aesthetics, and frankly I just don't find that argument particularly compelling. It's a single extra directory, it's not the end of the world.

takluyver commented 6 years ago

Paul:

I'd actually be fine with the guide covering both options, and explaining the trade-offs. But (a) that would likely be useless in practice, as readers are looking for advice, not options

My thinking on this is that a guide can include options if there's a clear way to say "use X if...". E.g. the packaging guide can say "X is for libraries, Y is for applications", because readers will mostly know which bit they're interested in. If you need to weigh up a list of advantages and disadvantages, the guide isn't really guiding you.

(Of course, extra options, pros and cons can be described in some appendix where it won't distract people)

Donald:

having worked on projects with src/ and without... it really doesn't matter?

To all of us steeped in the practices of Python packaging, it certainly doesn't matter. And even for a new developer, I agree that it's not that hard to understand. But programming involves thousands and thousands of little things you need to know, each of them not a big deal by itself. We can always say "it's not that hard" to excuse every little wart, every extra thing we ask people to know about. But all the little things add up.

So I'd always push back on any 'not that hard' argument (except when I'm the one making it ;-), because to make a system that's clear overall, we have to eliminate the 'not that hards' wherever possible. And I think splitting the ecosystem into src and non-src layouts would be adding one - even if it's worth it for other reasons.

pfmoore commented 6 years ago

So I'd always push back on any 'not that hard' argument (except when I'm the one making it ;-), because to make a system that's clear overall, we have to eliminate the 'not that hards' wherever possible.

TBH, I suspect that in the long run, flit init will make whichever layout we choose "not that hard" for a significant number of people :-)

hynek commented 6 years ago

JFTR the discussion has completely moved from “what is better” to “what has always been done”. And citing opinion pieces that don’t qualify their recommendation seems like mere noise to me. Also the ability to run code without installing it is a clear non-goal to me.

So far the only argument I’ve heard is that you have to press four more keys and that’s apparently not worth the upsides? Well.

Since tooling has been brought up: if you have to type out your full paths, have a look at tools like fzf.

I find it unfortunate that we bind so many consequences to a mere recommendation. You can define best practices and people can disregard them in the knowledge of the consequences. That’s nothing new or special.

ofek commented 6 years ago

@hynek We all know it's not just 4 more keys though 🙂 The recommended and easy discovery method packages=find_packages() would no longer work.

dstufft commented 6 years ago

packages=find_packages(where="src")
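
To illustrate why the `where="src"` argument matters, here is a stdlib-only sketch that approximates package discovery: it collects directories containing an `__init__.py` and does not descend into directories that are not packages themselves. It is an approximation for illustration, not setuptools' implementation, and the project tree (`src/foo`, `src/foo/sub`, `tasks`) is hypothetical. In a real `setup.py`, `find_packages(where="src")` is typically paired with `package_dir={"": "src"}`.

```python
import os
import tempfile


def discover_packages(where="."):
    """Return dotted names of packages (directories with __init__.py) under `where`.

    Directories that are not packages are skipped and not searched further.
    """
    found = []

    def walk(directory, prefix):
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isfile(os.path.join(path, "__init__.py")):
                dotted = prefix + name
                found.append(dotted)
                walk(path, dotted + ".")

    walk(where, "")
    return found


def make_demo_tree(root):
    """Create a hypothetical src-layout project with a stray top-level package."""
    for package in ("src/foo", "src/foo/sub", "tasks"):
        path = os.path.join(root, *package.split("/"))
        os.makedirs(path)
        open(os.path.join(path, "__init__.py"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        # Searching from the project root misses everything under src/ ...
        print(discover_packages(root))
        # ... while searching from src/ finds only the code meant to be installed.
        print(discover_packages(os.path.join(root, "src")))
```

In this sketch, discovery from the root picks up only the stray `tasks` package, while discovery from `src` picks up `foo` and `foo.sub`.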

ofek commented 6 years ago

In my mind, a standard/recommended approach is at odds with requiring an option. Unless you want to make src the new default lookup too.

takluyver commented 6 years ago

JFTR the discussion has completely moved from “what is better” to “what has always been done”.

I don't think that's true. The discussion is about what the packaging guide should recommend, and we are aware that that recommendation may influence a large number of people.

As a separate but related matter, I intend to make flit enforce 'one way to do it', where that one way is (probably) the recommendation we agree on.

So far the only argument I’ve heard is that you have to press four more keys...

That feels like an unfair mischaracterisation. I listed several arguments for continuing to recommend the non-src layout, none of which were to do with how many keystrokes you need. It looks likely that your preferred option is going to carry the day, but dismissing the counterargument like this poisons the discussion.

hynek commented 6 years ago

That feels like an unfair mischaracterisation. I listed several arguments for continuing to recommend the non-src layout, none of which were to do with how many keystrokes you need.

Maybe the problem is that we’re discussing two different things. As far as I can see (if I missed something I’m sorry, I’m on a phone today):

  1. Donald and I are discussing the upsides of having an src directory and the downsides of the lack thereof.
  2. You are mostly arguing about the implied costs of recommending that.

As expressed in my last message I don’t quite follow that (I think if something superior is found it’s worth recommending it even if everyone doesn’t switch right away; otherwise we’d still be using GOTO, HTTP and hashing our passwords with MD5) but I understand that it must be just as frustrating for you as it is for me because neither side really engages each other’s arguments and treats them as a side battle at best.

Unlike you who bound the future direction of your project to the outcome, I honestly don’t feel like I have a dog in this fight. For me literally nothing changes; no matter which approach wins – I’ll run my projects the way I like it and I expect other opinionated people to do the same. shrug

takluyver commented 6 years ago

I think if something superior is found it’s worth recommending it even if everyone doesn’t switch right away

That makes sense, but the transition itself has a cost, so for me the question is whether it's superior enough to justify switching the recommendation, or whether we should keep the existing recommendation and improve tools to deal with it better. It sounds like it probably is worth switching, but I wanted to give the counterargument a fair airing as well.

Unlike you who bound the future direction of your project to the outcome, I honestly don’t feel like I have a dog in this fight.

And conversely, while I think the recommendation is important, I don't have a strong personal preference for either style. I use non-src layout for all my projects at the moment, and I don't seem to have particularly been bitten by issues with it yet.

ctheune commented 6 years ago

So (as @hynek put it) my dog being in the 'src/' side: I've grown up with src/ (since we started having packaging tools around 2004 or so) in such a natural way that I mostly stumbled over weird things (mainly manifest management and accidents of things being in the package/importable) that I wasn't used to when I previously never thought about not using a src/ directory.

I'm not quite sure why not having src/ makes things in any way easier except that it's really easy to have things potentially clobber up the python global namespace. Saying that: is there even any magic around that stops 'setup.py' from becoming importable? This happens less frequently if importable stuff moves over to src/ where you do not share that one big '/' namespace. And then again, the lore of Tim tells us:

Pro src/: (IMHO)

Contra src/: (very IMHO)

(The others I couldn't decide which column they should go into. And man did I feel old considering the time setuptools appeared. distutils is too old even for me to clearly remember its inception.)

pfmoore commented 6 years ago

So here's a question. We're mostly going round the same arguments here. Who's going to decide when it's all been said, and what the "official" recommendation will be? It's not clear to me that we're ever going to get complete consensus.

In the absence of a recommendation here:

  1. New projects will follow the current status quo, which is the non-src layout.
  2. Flit will continue supporting the non-src layout, which will drive yet more projects (those that want to use flit for its advantages over setuptools) to use that layout - and furthermore, it will lock them into that layout as switching will involve changing tools.

So no recommendation probably equates to choosing the non-src layout.

If we do recommend the src layout:

  1. Flit will change, bringing its userbase with it (existing flit projects will be forced to change as support for the non-src layout will be dropped, if I understand @takluyver's intent) as well as any new projects using flit.
  2. The status quo will remain non-src in the short term, but the effect of (1) will be to change that gradually.
  3. Projects looking for a recommended approach (as opposed to just copying another project) will use the src layout, further changing the status quo.

Like it or not, the above picture gives flit a lot of power over this choice. I'm not sure whether that means that we should look to @takluyver to act as BDFL for this decision (because persuading him is key anyway), or conversely whether it means that we need someone else to take that role (because @takluyver has stated that flit will follow whatever decision is made here)...

Disclaimer: I prefer the src layout, but I also want to use flit for my projects. So I have a strong personal interest in seeing this decision go in favour of the src layout.

theacodes commented 6 years ago

If @takluyver doesn't feel comfortable, I'm happy to make the decision.

It would come down to three specific action items:

  1. The new packaging tutorial (#498) would use the preferred layout.
  2. We would write a new discussion topic on the pros and cons of the respective package options, linking to the abundant amount of resources.
  3. Flit would adopt the PyPA preferred layout over whichever time frame they deem appropriate, and would have backing of PyPA for being opinionated about that layout. (it seems @takluyver is up for this either way).

pradyunsg commented 6 years ago

I can help update the tutorial and am happy to write the discussion document for this, if there's no one else to do either.

One thing I'd point out is that these would become recommendations for PEP 517 build backends - flit is among the first implementors and I expect there to be more.

So, if anything, this would be a good time to have a recommendation like this.

takluyver commented 6 years ago

I'd rather let someone else make the call on this one; @theacodes I'm happy for that to be you, if that's OK with other people too.

I'm still a bit concerned that this discussion seems to be dominated by fans of the src layout. I've tried to present the case against, but I don't really have much of a preference, and my way of looking at it doesn't seem to resonate much with people. Should we give someone who really prefers non-src a chance to make the case? Perhaps @kennethreitz, who has written guides recommending that.

One other question that occurred to me: I like running a module's tests locally without installing it, and knowing that I've picked up that version rather than a copy that might be installed elsewhere. How easy do tools like py.test make this with a src layout?

pradyunsg commented 6 years ago

I'm still a bit concerned that this discussion seems to be dominated by fans of the src layout.

I agree.

How easy do tools like py.test make this with a src layout?

PYTHONPATH=src pytest

pradyunsg commented 6 years ago

@theacodes I'm happy for that to be you, if that's OK with other people too.

Fine by me. :)

hynek commented 6 years ago

I like running a module's tests locally without installing it, and knowing that I've picked up that version rather than a copy that might be installed elsewhere. How easy do tools like py.test make this with a src layout?

Why do you like that? The whole point of src is that it doesn’t happen so if there are valid use cases they should be put on record.

ncoghlan commented 6 years ago

Note that the ENVVAR=value command approach doesn't work on Windows (at least, as far as I know). pipenv install --dev -e src should work reliably cross platform, though.
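
One cross-platform alternative to the `ENVVAR=value command` shell syntax is to set the variable from Python itself. The sketch below is only an illustration under stated assumptions: the helper names `env_with_src` and `run_with_src` are hypothetical, and it assumes the code lives in a `src` directory relative to the current working directory.

```python
import os
import subprocess
import sys


def env_with_src(src_dir="src", base_env=None):
    """Return a copy of the environment with src_dir prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    src_path = os.path.abspath(src_dir)
    existing = env.get("PYTHONPATH")
    # os.pathsep is ":" on POSIX and ";" on Windows.
    env["PYTHONPATH"] = (src_path + os.pathsep + existing) if existing else src_path
    return env


def run_with_src(args, src_dir="src"):
    """Run the current interpreter with `args`, with src_dir importable."""
    return subprocess.run([sys.executable, *args], env=env_with_src(src_dir))
```

Calling `run_with_src(["-m", "pytest"])` from the repository root would then correspond to `PYTHONPATH=src pytest`, assuming pytest is installed for that interpreter.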

As far as the recommendation itself goes, I personally tend to work on complex projects where we end up with a lot of administrivia cluttering up the repo, so cleanly separating out a src directory for "this is what gets installed as a Python package" can be handy.

Even if you look at a simpler project like walkdir, which is deliberately just a single source file with a single test file next to it, I think using the src layout would more clearly separate the packaging and testing config files from the actual code.

However, one argument against recommending the src layout to beginners that I personally consider to be fairly strong is that they'll discover that "import src.mypackage" works, and then potentially get themselves into trouble that way. By contrast, making the "non-src isn't working for me, so I'll switch to the src layout" connection seems like a much easier learning path to guide people down than it does attempting to explain why even though the src.* import style works, they should never ever use it. It feels pretty similar to linters and CI in that regard to me - yes, those things will save you a lot of time in the long run, and when you're ready for them, you'll hopefully embrace them with glee, but if you try to introduce them too soon, you'll just confuse people (it strikes me as being similar to the reason why Software Carpentry had to drop unit testing entirely from their 2-day bootcamp curriculum: for the vast majority of their attendees, trying to get into even the bare essentials of automated testing turned out to be a case of "too much, too soon")

takluyver commented 6 years ago

Why do you like that? The whole point of src is that it doesn’t happen so if there are valid use cases they should be put on record.

Well, if this is the copy I'm changing, this is often the copy I want to test. For a lot of packages I have a development install, so it's the same copy anyway, but when I don't, I'm usually more interested in testing the development copy than the same package that I have installed somewhere else.

I thought the argument was that you should be able to choose whether you're testing installed or in-place, and it should be harder to get the wrong one accidentally. Not that it should be hard to test in place.

Nick: That's a good point about src.mypackage. An unfortunate side effect of implicit namespace packages.

hynek commented 6 years ago

Hm that sounds like you don’t use virtualenvs which makes your point make more sense. What is the recommended stance on that? If you use virtualenvs it seems like a non-point to me, because it’s unlikely to have something installed and be editing it at the same time. Is this maybe a science stack thing where some things IIRC just can’t be installed into virtualenvs properly (at least on macOS)?

takluyver commented 6 years ago

Yup, I usually install stuff with --user, and I only fire up an environment when I want to test a specific different version. It may be a habit I picked up from the days when packages like numpy and matplotlib couldn't easily be pip-installed. But now I like it because it means I'm regularly dogfooding bits of code as I work on other packages. For instance, whenever I publish a package using flit, I'm probably using the latest commit, so it's an extra chance for me to notice any issues in flit before release.

I'm sure there are cleverer ways of doing all this, especially nowadays with tools like pipenv and pipsi. But I'm used to doing things this way, and learning new tools takes effort in itself.

ncoghlan commented 6 years ago

For me, I think this is getting into a similar use case gradient to the one we encountered when considering whether or not the pipenv tutorial should replace the pip tutorial or supplement it (see https://github.com/pypa/python-packaging-user-guide/issues/394 for details on that).

The non-src approach fits nicely into a learning curve that starts with:

  1. I am using Python scripts and helper modules in a personal directory (maybe version controlled, maybe not), and installing Python packages I need into my personal Python environment (maybe a personal Python install, maybe a conda environment, maybe my user site-packages) (aka our package installation tutorial)
  2. I am using Python scripts and helper modules in a version-controlled application directory, and want to install Python packages into a location specific to that application (aka our application dependency management tutorial)
  3. I am now wanting to convert some of my Python script(s) and helper modules into a shareable Python component that other Pythonistas can depend on (aka our package distribution tutorial)

Attempting to introduce src directory adoption as a pre-requisite for reaching stage 3 thus feels like introducing people to a solution to problems that they may not have yet, and relying on "trust us, it's worth it in the long run" to get them over the additional bump in the learning curve, even when it isn't clear yet whether or not there'll even be a long run for the layout change to pay for itself.

If we adopt this framing, then what's likely to make sense is to stick with the simpler non-src layout for the tutorial, and have an opinionated "Structuring complex projects" guide that talks about things like:

For my own current use cases for example, I don't actually have a single src directory to worry about - the directories are named after their target environments, and there's a separate dev directory to hold the support scripts.

The structure for beaker-project.org was similar - the central management server, the individual lab controllers, and the CLI each had their own directory, and we added a Misc directory for the dev process support code.

dstufft commented 6 years ago

I don't think the benefits of src/ have anything to do with how complex your project is, and I think framing it in that way does a disservice to the idea, because realistically I think every project should be structured in a way to prevent implicit importing from ., for much the same reason Python removed the implicit relative imports.

I don't think there is any benefit to a new user for not adopting the src/ layout other than "I didn't have to type mkdir src/ && mv foo src/". The benefits to not adopting it wholesale are more about the existing corpus of projects than anything else.

ncoghlan commented 6 years ago

Thanks to implicit namespace packages, using a src directory doesn't prevent imports from . though - it just makes you spell those imports as import src.local_package rather than spelling them as import local_package.
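
A small sketch of this behaviour, assuming a hypothetical src-layout checkout created in a temporary directory and a package named `local_package` as in the comment: from the repository root, `import local_package` fails, while `import src.local_package` succeeds because `src` is picked up as an implicit namespace package. The helper names are illustrative only, and the sketch assumes no other package named `src` is installed.

```python
import os
import subprocess
import sys
import tempfile


def can_import(module, cwd):
    """Return True if `python -c "import <module>"` succeeds when run from cwd."""
    # Drop variables that would change how sys.path is built in the child process.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        cwd=cwd, env=env, capture_output=True,
    )
    return result.returncode == 0


def make_src_checkout(root, package="local_package"):
    """Create a hypothetical src-layout checkout: <root>/src/<package>/__init__.py."""
    os.makedirs(os.path.join(root, "src", package))
    open(os.path.join(root, "src", package, "__init__.py"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_src_checkout(root)
        print("import local_package:", can_import("local_package", root))
        print("import src.local_package:", can_import("src.local_package", root))
```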

The gist of my comment above is that if you don't have a test suite yet, and aren't using linters and/or typecheckers yet, and aren't using tox for cross-version testing yet, and aren't using a pull request based workflow with pre-merge CI yet, then there are more important questions to be asking yourself as a project maintainer than "Should I be using a src directory or not?".

If you don't like the "complex project" framing, then I think an equally valid framing would be to call the more advanced guide "Handling project maintenance", and then explain that:

  1. Automating your workflows (especially those that you want to run through for every PR or every release) is an excellent idea that can make a project a lot more enjoyable to work on for both contributors and maintainers, as well as making version upgrades more reliable for end users
  2. Using a "src" directory can make it a lot easier to get your releases and workflow automation to be reliable, since it means that src relative imports won't work from the root of the repository (which is where most automation scripts, including test suites, are likely to be run from), and hence avoids a lot of cases where things appear to work in a development clone, but don't work when installed as a package

Essentially: the packaging tutorial would get you to the point of making your first project release to PyPI. The Project Maintenance Guide would then provide advice on coping with the consequences of that questionable life choice :)

Adopting src directories is definitely an example of a good coping strategy, but we also have plenty of evidence to indicate that they're far from being a necessary strategy, and so it makes sense for their adoption to continue to be needs driven on a project-by-project basis (similar to any other risk management technique, like dependency pinning, automated testing, structural linting, type hinting, etc).

dstufft commented 6 years ago

Thanks to implicit namespace packages, using a src directory doesn't prevent imports from . though - it just makes you spell those imports as import src.local_package rather than spelling them as import local_package.

I suspect exactly zero people are typing import src.local_package accidentally, but a lot of people are typing import local_package and expecting to get the installed copy. This feels like a post hoc rationalization for the status quo.

There are more important questions to be asking yourself as a project maintainer than "Should I be using a src directory or not?".

Sure, but using a src directory is a very minimal amount of effort for little downside and possibly immense gain. The beginning of the project is also when the cost is lowest for moving into a src/ directory (not that the cost is ever super high), and it's also one where the beginners are more likely to benefit from it than experts who are less likely to make mistakes like forgetting to add their thing to py_modules and then assuming pip install . && python -c "import mything" means it worked correctly.

It also seems counterproductive to me that we'd document one way, then suddenly in a second guide go "but you know, that totally arbitrary choice we made before? Now you should switch to a different arbitrary choice because it's more maintainable!". It would make sense if we were talking about the amount of effort it takes to learn/use tox, linters, test suites, etc... but this is literally just creating a directory and moving content inside it.

ncoghlan commented 6 years ago

You just stated the biggest downside yourself: using a separate source directory breaks running python -c "import my_package" from the root directory of your project, which means the habits that developers have learned working on personal automation scripts and git-published applications stop working, and they now have to acquire a new set of habits that are more appropriate for publishing software components for others to use as dependencies. (And no, folks new to package publication don't write "import my_package" and expect to get the installed copy, because their package has never previously been released in an installable form)

That said, it's likely possible for a src-based tutorial to use pipenv to manage that potential problem, by introducing pipenv install -e src and pipenv shell as the tools to get python -c "import my_package" working again after moving the "software to be published as a Python package" into a subdirectory.

That's the way I set up https://pagure.io/modularity/fedmod/tree/master for example, and I think it works really well (essentially, you treat "local development environments for this project" as a git-published application in the pipenv sense, and make your src directory one of that application's runtime dependencies).

That approach also scales nicely to managing multiple subcomponents (since you just pip install -e <dirname> for each subcomponent), and would set us up nicely for a future where Python runtimes natively support this execution model (the draft PEP for that is still fairly immature, so I won't link to it directly, but I'll note that both @dstufft and I are directly involved in reviewing the pre-release drafts).

So yeah, I think if the revised package distribution tutorial makes the assumption that the local development environments are being managed with pipenv (or a comparable tool), then I think introducing the src directory concept at that point will make sense, with the rationale of providing a clear separation between the code that end users are relying on directly and the additional code and configuration settings that exist in the project to support the ongoing maintenance and development of that published code.

We'd then only revert to the "start with non-src, introduce src in a follow-up maintainability improvement guide" approach if we find the reliance on pipenv to be causing problems for folks trying to work through the revised publishing tutorial.

dstufft commented 6 years ago

I don't think it matters if the tool is pipenv install -e ., pip install -e ., setup.py develop, flit install, or something else.

I think the right place to start talking about it is when someone is learning to package their software, learning that with packaged software you're going to be installing it, and starting to import from that installed location by default, etc is part of learning to package software IMO. I think it trends nicely to a midpoint success in packaging, "oh look, you've done all of the work setting up the package, try to pip install it now to see it working" lets them feel success before they've started to figure out how to upload it to PyPI or somewhere else.

Especially if that can go from python -c "import foo" fails, now install and try it again. Now you're thinking with portals! sort of thing to make the new user feel like they've accomplished something.

hynek commented 6 years ago

I feel like I’ve said everything so I’m gonna peace out but on my way out I gotta agree with Donald one more time. What we’re talking about here is packaging libs in order to put them on PyPI – not learning to program a computer. Making pip install -e . and pip install -c … work is a great intermediate step.

ncoghlan commented 6 years ago

Right, I think we're agreeing now, but if we're going to be recommending the src directory approach, the tutorial can't leave the answer as "use some tool to make the import work again" - it has to be very prescriptive as to which tool, and how to set it up to make imports work again once the module to be imported is no longer in the current directory.

Since the application dependency management tutorial already introduces pipenv, that has the potential to segue nicely into introducing the "development environment dependency management" use case as part of the package publishing tutorial.

dstufft commented 6 years ago

Right. I’m saying I don’t care which tool, not that the doc shouldn’t tell them a tool to use.

TiemenSch commented 6 years ago

So, I was looking into project setups because I'm going to start on a big one. Regarding the "tests pick up the wrong code" case against non-src: wouldn't it be just as easy to run the tests from another working directory? E.g. switch directly into /tests and run them from there? That solves your 'seeing the package' problem pretty quickly too I guess.

Or, since many projects will use the non-src setup for a long time, I guess it's up to testing tools to provide a friendly warning that states it's using current workdir code.
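
The "friendly warning" idea can be sketched in a few lines. This is only an illustration, not something pytest or any other tool is known to ship: the helper name `warn_if_local` and the `site-packages` check are assumptions made up for the example.

```python
import warnings
from pathlib import Path


def warn_if_local(module_file: str, workdir: str) -> bool:
    """Return True (and emit a warning) if module_file lives inside workdir.

    Hypothetical helper: a test runner could call it with
    `some_package.__file__` and `os.getcwd()`.
    """
    module_path = Path(module_file).resolve()
    root = Path(workdir).resolve()
    # Assumption: anything under site-packages counts as installed,
    # even when the virtual environment sits inside the project directory.
    if "site-packages" in module_path.parts:
        return False
    is_local = root in module_path.parents
    if is_local:
        warnings.warn(
            f"{module_path} is imported from the working directory {root}, "
            "not from an installed copy",
            stacklevel=2,
        )
    return is_local
```

An editable install (pip install -e .) would also be reported as local by this check, since the imported files then really are the ones in the checkout.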

I'm still in the "src seems unnecessary" camp, especially for smaller projects. Also, moving it to src so you can see what gets packaged is a bit of a false promise, as you could have all sorts of files in there (not saying you should) that are or should be excluded.

pradyunsg commented 6 years ago

moving it to src so you can see what gets packaged is a bit of a false promise

Could you elaborate on why you say that; maybe with an example?

TiemenSch commented 6 years ago

It still depends on what's in your MANIFEST.in or setup.py right? Of course if you just globstar whatever is in src, then it will be included. Otherwise, it's still just the .py stuff.
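
As a rough illustration of the distinction: which names a directory exposes to import is a separate question from what a build ships, and the latter still depends on setup.py and MANIFEST.in. The sketch below only looks at the first question, using the standard library's `pkgutil.iter_modules`. The file and package names (`mypkg`, `conftest.py`, and so on) are hypothetical.

```python
import pkgutil
import tempfile
from pathlib import Path


def top_level_names(directory: Path) -> list:
    """List the modules and packages importable when `directory` is on sys.path."""
    return sorted(info.name for info in pkgutil.iter_modules([str(directory)]))


def build_demo_tree(root: Path, use_src: bool) -> Path:
    """Create a tiny throwaway project; return the directory holding the package."""
    (root / "setup.py").write_text("")
    (root / "conftest.py").write_text("")
    (root / "README.md").write_text("")
    (root / "tests").mkdir()
    (root / "tests" / "__init__.py").write_text("")
    pkg_parent = root / "src" if use_src else root
    (pkg_parent / "mypkg").mkdir(parents=True)
    (pkg_parent / "mypkg" / "__init__.py").write_text("")
    return pkg_parent


with tempfile.TemporaryDirectory() as flat_dir, tempfile.TemporaryDirectory() as src_dir:
    flat_names = top_level_names(build_demo_tree(Path(flat_dir), use_src=False))
    src_names = top_level_names(build_demo_tree(Path(src_dir), use_src=True))

print(flat_names)  # the package plus setup, conftest and tests
print(src_names)   # only the package
```

Non-Python files placed under src/ would not show up here either way, which is the point made above: the directory alone does not decide what ends up in the distribution.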

Moving it to src seems to revolve around detecting it when it shouldn't always be detected for some tool or command.

Especially since it seems that the landscape isn't at all src like, I would say any tool that is sensitive to the use of the current workdir should state src is a prerequisite or offer a warning (preferably with a solution).

hynek commented 6 years ago

wouldn't it be just as easy to run the tests from another working directory?

Yes but then you have to make sure that it happens every single time. It’s another moving part you have to keep in mind. The point of src is to be safe by default, not safe by bending backwards.
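
The "safe by default" difference can be checked directly. A minimal sketch, assuming a throwaway package with the made-up name `demo_pkg_xyz` that is not installed anywhere: it builds each layout in a temporary directory and asks a fresh interpreter, started in the project root, to import the package.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def importable_from_root(layout: str) -> bool:
    """Return True if `import demo_pkg_xyz` succeeds from the project root.

    layout is "flat" (package next to setup.py) or "src" (package under src/).
    Nothing is installed; only the working directory differs from a clean run.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg_parent = root / "src" if layout == "src" else root
        pkg = pkg_parent / "demo_pkg_xyz"
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        # Drop PYTHONSAFEPATH so the default behaviour (current directory
        # on sys.path for `python -c`) is what gets measured.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
        result = subprocess.run(
            [sys.executable, "-c", "import demo_pkg_xyz"],
            cwd=root,
            env=env,
            capture_output=True,
        )
        return result.returncode == 0


if __name__ == "__main__":
    print("flat:", importable_from_root("flat"))
    print("src: ", importable_from_root("src"))
```

With the flat layout the uninstalled checkout is picked up silently; with src/ the import fails until the package is actually installed. Running from another directory avoids the pickup in the flat case too, but only as long as every invocation remembers to do so.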

anthrotype commented 6 years ago

Not sure if anybody mentioned this already. I only recently discovered that tox also offers a changedir option to change the working directory when executing the test commands. I still prefer to use the src/ layout if I can.

TiemenSch commented 6 years ago

Yes but then you have to make sure that it happens every single time. It’s another moving part you have to keep in mind. The point of src is to be safe by default, not safe by bending backwards.

There's two ways to look at it.

Anyway, in all cases, you still have to figure out a very exact setup, because setups should be precise.

dostuff ./src/ and cd stuff && dostuff are pretty much equally good/bad.

And I may be reiterating, but I can't help feeling that production environments and CI tools should warn you when your current setup is dangerous.

There's two solutions to the same problem and I think it boils down to preference. I'd say no tool should have either option "hard baked" in. Having to find files at fixed paths will eventually break stuff.

ionelmc commented 6 years ago

dostuff ./src/ and cd stuff && dostuff are pretty much equally good/bad.

From a correctness point of view, not really:

Anyway, I think this can be debated till the end of time. The only thing worth crying over is the sad consequence of one of Python's many half-baked features (being able to import your code as src.something).

IMO the packaging guide should have bias towards producing quality distributions, not towards being able to run a bunch of scripts quickly.

ncoghlan commented 6 years ago

@ionelmc Please keep in mind that you may be collaborating with the designers of features that you're off-handedly describing as half-baked (and those features may have been the end result of months or years of extensive design debates).

ionelmc commented 6 years ago

Just because someone put effort into it doesn't mean that I must like it, or that it's perfect. I hope you realize you can't please everyone, nor can you expect everyone to pretend they are pleased by whatever you are doing.

It's the same with this packaging guide: you cannot please everyone by giving a really easy way of running stuff from a package while also giving a reliable way to package and run stuff.

Anyway, now I realize I don't actually have anything else to add to this debate besides colorful criticism so I'll unsubscribe. Don't mention me anymore and I'll leave you be :)

pradyunsg commented 6 years ago

This discussion seems to have settled down.

So, finally, how do we want to go here? Do we recommend one of src/ or non-src layouts or just stay neutral?

As @pfmoore noted above, not recommending is equivalent to choosing the non-src layout.

astrojuanlu commented 6 years ago

cd tests && python -m pytest [...] is quite readable too.

Readable, but gives false hope. pytest does magic to retrieve the rootdir, and therefore the option of moving to a directory might not work. I just discovered this, so my position for src/ layouts is even stronger now.

certik commented 6 years ago

Here is the strongest argument against src, that hasn't really been discussed so far:

However, perhaps there is a technique that will get me the easy development with src, and I just don't know about it (some of the proponents of src, please teach me, and then let's document this).

Let's discuss how the development actually gets done:

a) without src. You modify the Python files, and then you test your modification either by a simple script that you put up for this purpose, or by modifying tests and rerunning tests. You do not need to install the package. You use a Conda environment, but never install your package into it, you test your package locally. This works no matter if you have the tests inside the package, or external, because you want the local version to be used. However, on your CI, you should test the installed version (to ensure that it works for users), and you have to ensure that the installed version is tested. If the tests are part of the package, then pytest does the right thing. If they are external, then the only robust solution that I found is to literally remove the package directory (pip install .; rm -rf my_package; py.test tests/) on your CI. Then the installed version will get tested, because there is no local version anymore. For this reason, I prefer the tests to be part of the package, so that I don't have to worry about the problem.

b) with src. You have to install the package. Typically, your package has many dependencies, some of them take long to install. So you have a Conda environment with the dependencies that is activated. If you install into this environment, it becomes "dirty", and the next day I can't remember what state it is in. So I reinstall the environment, but that is not immediate, Conda takes easily 20s or even more. And I have to keep doing this every day in the morning when I start development. When I do pip install -e ., does it correctly remove the old installation? Always? How about python setup.py develop? I know for sure that python setup.py install didn't use to remove the old files, which then broke the new installation in corner cases. Very messy. The cleanest way is to always start with a fresh environment, but it's slow for lots of dependencies using Conda.
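
One way to find out what state an environment is in, without rebuilding it, is to ask Python where a name would be imported from and which version the installer metadata records. A minimal sketch using only the standard library (Python 3.8+ for `importlib.metadata`). The helper name `describe_install` is made up, and it assumes the import name and the distribution name are the same, which is not true for every project.

```python
from importlib import metadata, util


def describe_install(name: str) -> dict:
    """Report where `name` would be imported from and its recorded version.

    Assumption: the top-level import name equals the distribution name.
    """
    spec = util.find_spec(name)
    origin = spec.origin if spec is not None else None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # No installer metadata: the package is not installed in this environment.
        version = None
    return {"import_origin": origin, "installed_version": version}


if __name__ == "__main__":
    # Standard library package, used here only because it always exists.
    print(describe_install("json"))
```

After pip install -e . in a src-layout checkout, one would expect the import origin to point into the checkout's src/ directory; a path under site-packages would instead suggest a leftover regular install.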

If the official recommendation becomes to always use src, then the above needs to be addressed, and step by step instructions must be provided for b) that are as easy as in a).

Honestly, if the package has to be installed in b), then there is almost no way the instructions can be as easy as in a), because suddenly you have to fiddle with environments in much more complicated ways than just installing all your dependencies into one and using it in a).

(I read over 10 blog posts about this exact problem and finally discovered this github issue. I read all the comments here so far.)


Here is a high level philosophical way to look at this, that motivates the approach a).

Conclusion: I've been using the non-src way, and I will keep using it, until somebody can show me a development workflow with src that is as easy as a). And if the answer is to use some kind of environments, then that's just more complicated than doing things locally without installing.

scolby33 commented 6 years ago

@certik thanks for the deep contribution to this discussion! I've been following it for a while but decided to chime in as a src proponent to place my experiences in contrast to yours.

Like you seem to, I always develop in a virtual environment as soon as my program has an external dependency. I have a boilerplate setup.py and setup.cfg that I add early on in the development process (usually copied-and-pasted from my previous project--I really should get cookiecutter set up someday...). I will admit that sometimes adding setup.py is a pain as an extra step, but at some point I'll have to write it if I want to distribute my work, so it's time that will be spent anyway.

I have never once had a problem with pip install -e . that wasn't my own fault and I haven't had any problems in that vein recently (I think I've made most of the mistakes?). I don't tend to need to uninstall my package or recreate my environment. If I add a new external dependency, it goes in setup.py and I run pip install whatever.

I will admit that none of my projects have involved non-Python dependencies like C or Fortran libraries, so I cannot speak to that portion of your argument.

From a philosophical perspective, I don't like running my work-in-progress differently from how it will be run by users in production. I do use CI that runs with the installed version of the package, but I like the ease of having my project installed locally as well. A particular benefit for data-munging programs is activating the development virtual environment in a different directory full of input files and being able to run or import the program without messing with an absolute path or changing sys.path in the REPL.

Additionally, I really hate the idea of mypackage/tests living alongside mypackage/__init__.py, which is my interpretation of what you mean by "tests as part of the package." I see a clear distinction between tests and the actual program and this just feels a bit wrong to me.

I use tox to run all my tests, which keeps things even more isolated--the package is installed in a unique virtual environment for each type of test, which also implicitly tests the installability of the package. This also means that tests as run locally and as run on CI are the same--the CI just calls tox.

In conclusion, I use a src-based layout for all my projects and have been quite happy. I think that either using src is not as hard as you fear or that I've grown to accept the difficulties inherent in it and no longer feel them as you seem to. In my selfishness, I would prefer that the src layout is standardized and recommended so all the fancy new tools work with my way of thinking, but from an objective perspective, I also like the clear conceptual separation between code and tests and the fact that it forces you to run your in-development code the "right" way, where "right" means "the way the users will."

TL;DR

I would love to hear from you or others what difficulties you've had that I seem to have avoided, especially the increased complication that you find in pip install -e . I don't tend to think of that line as being any more difficult than installing all the rest of my dependencies in an environment.

certik commented 6 years ago

@scolby33 thanks for the write up. So I wrote down how one actually develops a non-src and a src package, step by step, so that we can compare the two approaches, with real projects. Let me know if I got it right in each case.

SymPy (non-src)

Prepare (once)

git clone https://github.com/sympy/sympy
conda create -y -n sympy python=3.7 mpmath pytest

Develop (every day)

Start:

cd sympy
conda activate sympy

Workflow:

  1. Modify some files, say, in the sympy/polys directory
  2. Test these particular changes:

    pytest sympy/polys/tests/test_solvers.py

Repeat 1. and 2. The sympy environment only has the dependencies, it doesn't get modified and doesn't have the sympy package.

Flake8 (src)

Prepare (once)

git clone https://gitlab.com/pycqa/flake8
conda create -y -n flake8 python=3.7 pytest pyflakes pycodestyle mccabe
cd flake8
conda activate flake8
pip install -e .

Develop (every day)

Start:

cd flake8
conda activate flake8

Workflow:

  1. Modify some files, say, the src/flake8/statistics.py file
  2. Test these particular changes:

    pytest tests/unit/test_statistics.py

Repeat 1. and 2. The flake8 environment has both the dependencies and the flake8 package in the development mode.


Note: these two packages (SymPy and Flake8) are both pure Python. A typical package will also have compiled code, and there things get more complicated in each case. I actually prefer to use cmake to build the C++/Fortran stuff and Python wrappers, and I used to hook cmake into setup.py manually, but I plan to switch to https://github.com/scikit-build/scikit-build, which does the same, but is a maintained and reusable package. There are now all the cmake options to build (in-tree, out-of-tree, install, without install, etc.), and I am sure things can be amended to closely follow either of the pure Python setups above. So I would first like to come to some generally agreed upon "best practices" for pure Python package development, and then do the same for compiled modules.

Update: based on comments below, the pip install -e . command has been moved from "Develop" to "Prepare".

nicoddemus commented 6 years ago

@certik you might be interested in ESSS/conda-devenv, which was a tool we at ESSS developed to solve this exact problem: work with a package in develop mode in conda environments. With it, you don't need to use pip at all and your environment will always be up-to-date, without risk of leaving it dirty.

It has worked very well for us for a couple of years now.