pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.4k stars 17.83k forks source link

DOC: Fix docs for various offset constructors #52431

Open Dr-Irv opened 1 year ago

Dr-Irv commented 1 year ago

Pandas version checks

Location of the documentation

Multiple places (list is just a sample):

Documentation problem

They are all missing a definition of the constructor, e.g., WeekOfMonth(n=..., normalize=..., week=..., weekday=...) should be at the top for the WeekOfMonth followed by a description of the parameters.

Suggested fix for documentation

All of these should be checked to see if all the parameters are listed correctly. The way they are defined means that positional arguments will work, but maybe we don't want to support that. (If that's the case, that's a separate issue)

The constructors should appear at the top with their arguments, like is done for other classes.

choudharynishu commented 1 year ago

I can help with this @MarcoGorelli. Would it be correct to assume in most cases the definition to be added is similar to 1.0 ? I checked WeekOfMonth from 1.0

MarcoGorelli commented 1 year ago

Hi @choudharynishu - not sure what you mean by similar to 1.0, but what needs doing for WeekOfMonth is to add a description for the normalize parameter

choudharynishu commented 1 year ago

take

Dr-Irv commented 1 year ago

Hi @choudharynishu - not sure what you mean by similar to 1.0, but what needs doing for WeekOfMonth is to add a description for the normalize parameter

The issue goes beyond WeekOfMonth. What ought to be done is twofold:

  1. Check for each of the offsets that the list of parameters is correct.
  2. Change the documentation generation process so that the parameters to the constructor are listed at the top.

With respect to (2), if you look at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html, you will see at the top:

class pandas.Index(data=None, dtype=None, copy=False, name=None, tupleize_cols=True)

If you look at any of the offsets, e.g. BusinessDay at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BusinessDay.html, you see this:

class pandas.tseries.offsets.BusinessDay

I believe for BusinessDay, based on the code, it should be:

class  pandas.tseries.offsets.BusinessDay(n=1, normalize=False, offset=timedelta(0))

In fact, for BusinessDay, the parameter offset is even missing from the list of parameters on the doc page.

So to close this issue, a complete review of all the offsets should be done, with changes made to reflect the correct constructor and the parameters.

Here's a list (under details) that I created by doing some grep on the doc source:

- DateOffset - BusinessDay - BusinessHour - CustomBusinessDay - CustomBusinessHour - MonthEnd - MonthBegin - BusinessMonthEnd - BusinessMonthBegin - CustomBusinessMonthEnd - CustomBusinessMonthBegin - SemiMonthEnd - SemiMonthBegin - Week - WeekOfMonth - LastWeekOfMonth - BQuarterEnd - BQuarterBegin - QuarterEnd - QuarterBegin - BYearEnd - BYearBegin - YearEnd - YearBegin - FY5253Quarter - Easter - Tick - Day - Hour - Minute - Second - Milli - Micro - Nano - Frequencies
Snehaaa18 commented 1 year ago

is this issue open ?? I am a beginner contributor

MarcoGorelli commented 1 year ago

yup! you're both (@Snehaaa18 and @choudharynishu) welcome to work on this, there are a lot of offsets which need updating

if you find an offset's docstring to work on, which has a parameter not documenting, and would like to work on it, perhaps comment here so we don't duplicate work

Snehaaa18 commented 1 year ago

Take

anshuman0123 commented 1 year ago

Hey , I am totally new to open source contribution . Please someone help me in making contribution . My first question is . Where the files are stored in which I have to work on . Give me the exact location in this repo

Dr-Irv commented 1 year ago

Hey , I am totally new to open source contribution . Please someone help me in making contribution . My first question is . Where the files are stored in which I have to work on . Give me the exact location in this repo

Two other people are working on this issue. To get started with contributing, see https://pandas.pydata.org/pandas-docs/stable/development/contributing.html

natmokval commented 1 year ago

I'd like to work on it. I will add parameters to offsets classes: BYearEnd, BusinessHour, WeekOfMonth.

dullibri commented 1 year ago

I would like to work on Second, Hour and Day

natmokval commented 1 year ago

I would like to work on FY5253 and FY5253Quarter: add missing parameter and examples.

I will add missing parameters "n" \ "normalize" \ "offset" to BusinessDay, Week , LastWeekOfMonth, CustomBusinessHour, and BusinessDay as well.

vinitaparasrampuria commented 10 months ago

Hi! Can I work on this if the issue is still open?

Dr-Irv commented 10 months ago

Hi! Can I work on this if the issue is still open?

Yes, I will assign to you as well. See the comment at https://github.com/pandas-dev/pandas/issues/52431#issuecomment-1499446218 for the list of everything that needs to be checked.

mohmmadAyesh commented 6 months ago

Hi it seem all of them get parameters except dateOffset which is a standard class and Frequencies I would like to contribute I didnt find frequencies in offset can i contribute to this or is there any other contribute in this issue :)

Anh-HoDac commented 3 months ago

Hi, I noticed that the Tick class is not yet updated, so I would like to work on it.

Abhinav00077 commented 3 months ago

hey there is the issue still open, i am a beginner contributor just need to know how to contribute to it, would be a great help

enesyesil commented 3 months ago

Hey, is the issue still open, I am a beginner. I will be happy to contribute to it. thanks

shriyase commented 3 months ago

Hey as a beginner I'd like to contribute!

Dr-Irv commented 3 months ago

@Abhinav00077 @enesyesil @shriyase we've had a lot of contributions to resolving this issue. What probably needs to be done at this point is to create a checklist from the list in this comment (open the "Details" at the bottom) https://github.com/pandas-dev/pandas/issues/52431#issuecomment-1499446218 as a new comment in this PR and indicate which of the classes have already had their docs fixed. Then we know which ones are left to do, and any of you can grab one of them to fix.

NavaneetthaSundararaj commented 1 month ago

@Dr-Irv here is a checklist from the list of comments based on https://github.com/pandas-dev/pandas/issues/52431#issuecomment-2183414515:

NavaneetthaSundararaj commented 1 month ago

I would like to work on this:

CustomBusinessMonthEnd CustomBusinessMonthBegin SemiMonthEnd SemiMonthBegin CustomBusinessDay

Siniade commented 1 month ago

Hi! I'm a beginner contributor and would like to work on this as well. I see that there are 3 classes remaining from the checklist above. I would like to work on these 3:

DateOffset Tick Frequencies

Siniade commented 1 month ago

take

Siniade commented 1 month ago

@Dr-Irv When using the offsets methods, Tick is not generally directly called upon. Instead the Tick object is initiated when we call its subclasses Hour, Minute, Second, Milli, etc. The Tick class is similar to the BaseOffset or the BusinessMixin classes that are not directly called upon but have subclasses through which they are initiated. In this case doesn't it make sense to not have a public documentation page for the Tick class?

Documentation problem

They are all missing a definition of the constructor, e.g., WeekOfMonth(n=..., normalize=..., week=..., weekday=...) should be at the top for the WeekOfMonth followed by a description of the parameters.

With respect to (2), if you look at (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html), you will see at the top: class pandas.Index(data=None, dtype=None, copy=False, name=None, tupleize_cols=True)

Also regarding this issue, I noticed that the class pandas.Index is defined as a python class, however the pandas.tseries.offsets are defined as Cython cdef classes. Could that potentially be the reason it doesn't display the attributes at the top of the page?

Dr-Irv commented 1 month ago

@Dr-Irv When using the offsets methods, Tick is not generally directly called upon. Instead the Tick object is initiated when we call its subclasses Hour, Minute, Second, Milli, etc. The Tick class is similar to the BaseOffset or the BusinessMixin classes that are not directly called upon but have subclasses through which they are initiated. In this case doesn't it make sense to not have a public documentation page for the Tick class?

I'm not the one to make the decision here about whether we continue to document Tick but Tick is documented at https://pandas.pydata.org/pandas-docs/stable/reference/offset_frequency.html#tick and https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Tick.html#pandas.tseries.offsets.Tick so given that it has been documented before, we should continue to document it and include the docstring for the constructor

Also regarding this issue, I noticed that the class pandas.Index is defined as a python class, however the pandas.tseries.offsets are defined as Cython cdef classes. Could that potentially be the reason it doesn't display the attributes at the top of the page?

Possibly. I'm not familiar with how the docs are built here - I just raised the issue!

Siniade commented 1 month ago

@Dr-Irv Got it. Thank you for clarifying! I'll update the Tick docstrings.