pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

# DEV: Gitpod integration #47790

Open noatamir opened 2 years ago

noatamir commented 2 years ago

We would like to introduce Gitpod integration as a quick start for the development environment.

Gitpod can provide new contributors with quick, automated, ready-to-code development environments. Instead of sending them to read your setup documentation, how about telling them to click/tap a button and pick an issue, so they can start working on their first PR right away?

It may also be useful for experienced contributors, who work on many projects. They might notice something they can quickly make a PR on, but not have the time to open your contributor guide just now.

Gitpod saves you setup time and gets you contributing your changes faster.

pandas already has a working Docker image, so making the custom Gitpod Docker image was relatively easy. I have prepared a Docker image and a yml file to get things going. There are still a few more steps to complete the integration setup.

Next steps

Attachments

The Docker image was tested locally as follows: after replacing $gh_username in the Dockerfile with your GitHub username, you should be able to build DockerfileGitpod with the command `docker build . -f DockerfileGitpod` (run from the directory the file is located in). It can be fiddly on M1 Macs 🚨.
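The local test above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `fill_username` and `build_command` are hypothetical, and the sample Dockerfile line is made up for the demo, since the real contents of DockerfileGitpod are not shown here.

```python
import shlex


def fill_username(dockerfile_text: str, gh_username: str) -> str:
    """Replace the $gh_username placeholder with a GitHub username."""
    return dockerfile_text.replace("$gh_username", gh_username)


def build_command(dockerfile: str = "DockerfileGitpod") -> list:
    """Return the build command from the comment above as an argument list."""
    return ["docker", "build", ".", "-f", dockerfile]


# Hypothetical line, only to demonstrate the substitution.
sample = "RUN git clone https://github.com/$gh_username/pandas.git\n"
print(fill_username(sample, "octocat"), end="")

# Prints the command as it would be typed in a shell.
print(shlex.join(build_command()))
```

To actually build the image, the argument list could be passed to `subprocess.run(build_command(), check=True)` from the directory that contains the file.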

mroeschke commented 2 years ago

> Open a pandas DockerHub/Quay.io organization account, or use the GitHub container registry, to add the gitpod docker image there

Do you know how Gitpod manages image pulls and whether there are quotas?

noatamir commented 2 years ago

I asked on their Discord. Will get back to you ☺️

noatamir commented 2 years ago

They haven't replied yet ⏳, so I have also sent an e-mail now. But based on their pricing page, I suspect that there is no quota, since all of their plans include the following: "prebuilds: Enable prebuilds to continuously build your Git branches, so you and your team can always start coding right away."

noatamir commented 2 years ago

And we got a reply!

> Thank you for contacting Gitpod. As Gitpod does not host any publicly available Docker images ourselves there wouldn't be any limits you'd be subjected to there. You would need to check with whatever registry you're using to see if they have any limits.

mroeschke commented 2 years ago

Okay cool!

A free Docker Hub account has some limits (100/200 pulls per 6 hours, which should be okay): https://docs.docker.com/docker-hub/download-rate-limit/

For the GitHub Container Registry, it isn't as clear to me what quotas exist, but it appears we would have to pay for storing images? https://docs.github.com/en/billing/managing-billing-for-github-packages/about-billing-for-github-packages
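For reference, the linked Docker Hub page describes response headers that report the limit as a value such as 100;w=21600, where w is the window in seconds. A minimal sketch of reading such a value, assuming that format holds (the function name `parse_rate_limit` is hypothetical, and no request is made here):

```python
def parse_rate_limit(header_value: str) -> tuple:
    """Split a value like '100;w=21600' into (pulls, window_seconds)."""
    count_part, _, window_part = header_value.partition(";")
    window_part = window_part.strip()
    if not window_part.startswith("w="):
        raise ValueError(f"unexpected rate limit value: {header_value!r}")
    return int(count_part.strip()), int(window_part[2:])


# Assumed example value; 21600 seconds is the 6 hour window mentioned above.
pulls, window_seconds = parse_rate_limit("100;w=21600")
print(pulls, "pulls per", window_seconds // 3600, "hours")
```

Parsing the header rather than hard-coding 100 or 200 keeps a check like this valid if the registry changes its limits.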

datapythonista commented 2 years ago

Sorry, a bit late to the party, didn't see this earlier.

I'd personally have this in a third party project. I think a similar discussion happened for VS code stuff, and that was the conclusion. The pandas project is already huge, and its CI is huge and very complex. I think it's great that things like this exist if contributors find them useful, but I don't think the pandas core team should be the one maintaining them, or that the pandas CI and codebase should become bigger, slower, and more complex, with new things that break.

I don't think there is any drawback in using another repo, and we can use one in the pandas-dev org. That said, my preference would be to start in a personal repo first, and move it to the pandas-dev org once the project starts to mature.

jorisvandenbossche commented 2 years ago

> I think a similar discussion happened for VS code stuff, and that was the conclusion.

I am not directly aware of such a discussion. We actually do have some VSCode-specific configuration already with .devcontainer.json, so this was added at some point. There is https://github.com/pandas-dev/pandas/pull/41721, where you did indeed object to further customizing the existing .devcontainer.json setup. But the one discussion that I found on the gitpod topic is a previous PR (https://github.com/pandas-dev/pandas/pull/34829), where people were actually OK with adding this; the PR just never got merged because the contributor did not continue working on it.

As a small anecdote: I helped in two conference sprints in the last two weeks. In the first, I had someone contributing using GitHub Codespaces, and she repeatedly said how amazing it was to be able to work on something directly without first having to set up the whole development environment. In the second, there was someone who struggled with the typical "needs Visual Studio Build Tools on Windows -> cannot install this on company laptop without devops involvement", and a setup like gitpod could have helped a lot. To be clear, I know that this only supports that it would be nice to have such a gitpod integration set up, not that it necessarily has to live in the main repo. I do think that it will be more accessible (since that is the standard approach) and better integrated if it lives in the main repo, though.

jorisvandenbossche commented 1 year ago

Some issues we have been running into related to not having write permissions outside of the pandas repo / mamba env:

noatamir commented 1 year ago

The install issue in the last comment is addressed by https://github.com/pandas-dev/pandas/pull/52700 and is already fixed in the Gitpod image we deployed to Docker Hub today.