pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.32k stars 17.8k forks

Run a User Survey #27477

Closed TomAugspurger closed 5 years ago

TomAugspurger commented 5 years ago

It'd be nice to run a survey to better understand what our users want from the library.

We can ask things like

I plan to put together a form sometime in the next couple weeks. If there are questions you'd like to see included, please post them here.

See https://github.com/dask/dask/issues/4748 and https://t.co/OGrIjTLC2G for inspiration.

mroeschke commented 5 years ago

It'd be great to have a question to gauge whether users rely on Stack Overflow vs. our documentation for API references, general usage, feature discovery, etc.

TomAugspurger commented 5 years ago

@pandas-dev/pandas-core I've started this https://docs.google.com/forms/d/e/1FAIpQLSeo-SPbammOHYaB5WxQsKez14xOEzL5V8GK6GaUQR9EykxPzw/viewform?usp=sf_link (using the Dask survey as a template).

I'd like to collect feedback for a week or so before launching the survey. I'd appreciate comments on the questions present there, and on any you think are missing.

datapythonista commented 5 years ago

Great job!

Just two small additions I'd have:

For the workflow, besides notebooks, standalone .py, and app on top of pandas, I'd have pandas as part of a project. Maybe "building on top of pandas" covers that case, but at least to me it sounds more like building something like geopandas than using pandas in my machine learning application.

Then, for the documentation page, besides the ones listed (API, user guide, ...), I'd add the ecosystem, getting started, ...

Other than that, looks perfect. Looking forward to seeing the results.

gfyoung commented 5 years ago

Two comments:

1) For the question regarding which readers and writers you use, I can only select one option (e.g. CSV, HTML, etc.), when the question should allow me to select multiple.

2) When asking about comparing the two documentation hierarchies, I would rephrase it to: "how does the new documentation compare to the old?" with three options for better, unchanged, or worse. This allows for a little more color on people's opinions.

Otherwise, looks really good!

TomAugspurger commented 5 years ago

I'd have pandas as part of a project. Maybe "building on top of pandas" covers that case, but at least to me it sounds more like building something like geopandas than using pandas in my machine learning application.

I intended to capture the "using pandas as part of a larger project" users with that. I don't think there will be enough "building a library like geopandas on top of pandas" responses to be meaningful. I've rephrased it as "Using pandas within a larger project", but this question is a bit jumbled. The main question I want to answer is "how common is interactive vs. non-interactive use". So maybe we really home in on that and just ask "Do you use pandas interactively (IPython, Jupyter Notebook, Python REPL, etc.)?"

toobaz commented 5 years ago

Nice job!

Maybe in "Which extension types do you use?" we can add Int64?

"What Pandas resources have you used for support in the last six months?": maybe add gitter and mailing list?

I like the "Is Pandas stable enough for you?" question: maybe we could add more nuanced replies? Could be 1) no 2) yes for personal use 3) yes for analysis/R&D 4) yes for production

"What common feature requests do you care about most? ": I'm confused, don't we already have "Integer missing values" with Int64?

"What are some other libraries that you often use with pandas?": I think we could make this slightly more focused. Maybe we are not interested in a user using pandas together with Django; conversely, we are interested in a user using pandas in some projects and dask, or xarray, in others.

TomAugspurger commented 5 years ago

I've taken a couple of questions from https://www.jetbrains.com/research/python-developers-survey-2018/ to let us benchmark against the general Python user population ("Is Python your main language?", "Do you use any of the following tools to isolate Python environments, if any?", and "What operating systems do you use?").

Let me know if there are others I should copy (though we need to keep survey length in mind).

I may reorganize things to split the "who are you" questions (how long have you used Python / pandas, etc.) into their own section.


For the question regarding which readers and writers you use, I can only select one option (e.g. CSV, HTML, etc.), when the question should allow me to select multiple.

Fixed.

When asking about comparing the two documentation hierarchies

Fixed

Maybe in "Which extension types do you use?" we can add Int64?

Fixed

"What Pandas resources have you used for support in the last six months?": maybe add gitter and mailing list?

Done, though I hope that doesn't encourage the use of Gitter for usage questions :)

I like the "Is Pandas stable enough for you?" question: maybe we could add more nuanced replies?

I've copied this question from the Dask survey. I'd like to keep it the same to allow us to benchmark vs. their results.

We could collect information on their typical usage of pandas ("Do you use pandas for personal use / R&D / production systems") and then facet the "is this stable enough" by that response.
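A minimal sketch of what faceting the stability answer by usage could look like once responses are in. The column names (`usage`, `stable_enough`) and the sample answers are invented for illustration and are not taken from the actual survey export.

```python
import pandas as pd

# Hypothetical responses; column names and values are assumptions for illustration.
responses = pd.DataFrame(
    {
        "usage": ["personal", "production", "R&D", "production", "personal", "R&D"],
        "stable_enough": ["Yes", "No", "Yes", "Yes", "Yes", "No"],
    }
)

# Share of each stability answer within each usage group (each row sums to 1).
faceted = pd.crosstab(
    responses["usage"], responses["stable_enough"], normalize="index"
)
print(faceted)
```

Normalizing by row makes groups of different sizes comparable, which matters if, say, far more respondents report personal use than production use.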

"What common feature requests do you care about most? ": I'm confused, don't we already have "Integer missing values" with Int64?

This question may have been too clever. I'm wondering what percent of our users actually know about Int64 :)
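For readers who have not come across it, a short sketch of the nullable `Int64` extension dtype being discussed. This only illustrates the existing feature; the sample values are made up.

```python
import pandas as pd

# With the default inference, a missing value forces integers to float64.
floats = pd.Series([1, 2, None])

# With the nullable extension dtype, the values stay integers and the gap is <NA>.
ints = pd.Series([1, 2, None], dtype="Int64")

print(floats.dtype)
print(ints.dtype)
print(ints.isna().sum())  # number of missing entries
```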

"What are some other libraries that you often use with pandas?"

This was also taken from Dask. I'd like to keep it the same unless there's an especially compelling reason to differ. Also, personally I would be interested in seeing what percent of our users also use Django vs. Flask :)

TomAugspurger commented 5 years ago

Hi all, I think this is at the point where a few people taking the survey would be helpful (https://docs.google.com/forms/d/e/1FAIpQLSeo-SPbammOHYaB5WxQsKez14xOEzL5V8GK6GaUQR9EykxPzw/viewform?usp=sf_link). In particular, I'm interested in:

  1. Spotting typos
  2. How long it takes (I'm shooting for 5 minutes, 10 if necessary)
  3. Extra options for "select many" questions (e.g. "What common feature requests do you care about most?", "Which pandas APIs do you use?")
  4. Question areas we're completely missing

I plan on making it live next week and publicizing it in a few places (Twitter, mailing list, docs homepage).

shoyer commented 5 years ago

I went through the survey. It took me about 5 minutes, so the timing is good.

Feedback on specific questions:

I would also be curious what people think about the removal of Panel, e.g., has this stopped them from upgrading pandas? e.g.,

jbrockmendel commented 5 years ago

TomAugspurger commented 5 years ago

Thanks. I'll push updates for those.

@jbrockmendel's comment about source code clarity got me thinking about type annotations. IIRC there are Sphinx packages for including the type annotations in the HTML docs. I'm not sure if that would be useful to users, though.
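As a rough illustration of the idea, here is one possible `conf.py` fragment. It assumes a recent Sphinx release, where the built-in `sphinx.ext.autodoc` extension has an `autodoc_typehints` option. The thread does not say which package was meant, and this is not pandas' actual documentation configuration.

```python
# conf.py (sketch, assuming a recent Sphinx; not pandas' real configuration)

# autodoc pulls signatures and docstrings from the source code.
extensions = ["sphinx.ext.autodoc"]

# Render type annotations in the parameter descriptions rather than
# in the signature line, which keeps long signatures readable.
autodoc_typehints = "description"
```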

TomAugspurger commented 5 years ago

An optional field to explain why could be a good idea. I said "Worse" but only because I don't like how the API docs are now split across many pages, which makes them harder to switch in a browser.

I don't see an easy way to do this with Google Forms. It's possible, but it requires breaking up the sections, which I would rather avoid. From what I can tell, the best option is just asking "If you think that the new documentation layout is worse, what do you dislike about it?".

Adding the question about Panel.

TomAugspurger commented 5 years ago

This is live now: https://t.co/vTM73TFuzy?amp=1

TomAugspurger commented 5 years ago

May be worth collecting notes for next year's survey.