Closed: datapythonista closed this issue 5 years ago
I would like to try to work on this issue.
@datapythonista Do you want the master branch of pandas to be added to the Conda environment, or as a submodule in this repo?
No, I'm happy to have just the notebook. If you want you can add a comment at the beginning saying that to run the notebook an environment with a recent version of pandas is needed. But I don't think even that is necessary, just the notebook is enough for me.
Oh, I assumed you wanted the notebook to update with the changes in the master repository using the CI.
We can do that in the future, sounds like a good idea. But I'd start simple: just add the JSON file (maybe zipped) to this repo, plus a notebook that opens it and checks how many errors we have pending.
Autogenerating the file sounds good, but I'd recommend never doing anything that complex in one go. Always go step by step and build things iteratively. There are a lot of talented people in this group; if you open small PRs and gather feedback at every step, the final result will surely be much better than if you work on something big by yourself. And using the divide and conquer approach will also make your life much easier.
If the autogeneration of the file is ever planned, maybe it can be done directly in the pandas repo. And yes, I agree that doing it step by step with everyone's help would make this task much easier and more effective!
Yes, cloning pandas master, compiling it, and then running the script would probably be the best approach.
In pandas there are many docstrings that have known errors, like parameters that are not documented, examples that do not run, formatting issues...
We have a script that is able to generate all of them in a JSON file (you need a pandas development environment to run it, and it should be run in an updated master branch):

After generating the JSON file, we need a Jupyter notebook that opens that file in pandas and shows how many of each error need to be fixed. The resulting notebook can be added to a notebooks/ directory in this repo.
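
As a rough starting point for that notebook, here is a minimal sketch of the counting step. It assumes the JSON file is a mapping from each object's name to a record whose "errors" entry is a list of [code, message] pairs. That layout is an assumption and should be checked against the actual output of the script. The object names in the sample are made up for illustration, and the function name count_errors is hypothetical.

```python
import pandas as pd


def count_errors(results):
    """Count how often each error code appears across all docstrings.

    `results` is assumed to map an object name to a dict whose "errors"
    key holds a list of [code, message] pairs (assumed layout).
    """
    rows = [
        {"name": name, "code": code, "message": message}
        for name, info in results.items()
        for code, message in info.get("errors", [])
    ]
    # Passing the columns explicitly keeps this working when there are no errors.
    errors = pd.DataFrame(rows, columns=["name", "code", "message"])
    return errors["code"].value_counts()


# Tiny hand-made sample in the assumed layout (object names are made up).
sample = {
    "pandas.Series.foo": {
        "errors": [
            ["PR01", "Parameters {x} not documented"],
            ["EX01", "No examples section found"],
        ]
    },
    "pandas.DataFrame.bar": {
        "errors": [["PR01", "Parameters {y} not documented"]]
    },
    "pandas.Index.baz": {"errors": []},
}

counts = count_errors(sample)
print(counts)
```

In the notebook, the sample dict would be replaced by the real data, for example loaded with json.load from the file added to this repo (unzipped first if it is stored compressed).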