python / cpython

The Python programming language
https://www.python.org
Other
63.18k stars 30.25k forks source link

Error in the documentation for csv.reader #113046

Closed pscabot closed 10 months ago

pscabot commented 10 months ago

Documentation

In the description for csv.reader function it says:

Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its __next__() method is called

(The bold emphasis is mine)

However it returns a list as it's properly documented at csvreader.__next()__

Linked PRs

ericvsmith commented 10 months ago

The bold text refers to the first argument to csv.reader, which is named csvfile. The documentation you refer to is for the return value of csv.reader.

willingc commented 10 months ago

@pscabot Thanks for submitting the docs issue. As @ericvsmith mentions, the existing documentation accurately reflects the implementation.

I recommend taking a look at https://peps.python.org/pep-0305/ as called out in the "see also" box. This should give some additional clarity.

I'm going to go ahead and close this issue. Please feel free to ask for additional clarification of the csv.reader operation if needed.

terryjreedy commented 10 months ago

I am reopening because the second sentence is a bit confusing and I think it should be replaced, as suggested below. In full, it says

csvfile can be any object which supports the iterator protocol and returns a string each time its __next__() method is called — file objects and list objects are both suitable.

The sentence says that csvfile 'can' ('should', 'must'?) be an iterator of strings, with a __next__ method, but a list csvfile is only an iterable and does not have a __next__ method. This makes it possible to think that 'its' refers back to the returned iterator, which always does have a __next__ method.

I think the replacement should be properly say what 'csvfile' must be, something like:

The csvfile must be an iterable of strings, each with the needed format. File and list objects are the most commonly used.

ericvsmith commented 10 months ago

I agree the documentation could be improved. @pscabot : Does Terry's change above make sense to you?

pscabot commented 10 months ago

@ericvsmith I believe the confusing part (at least what confused me) is the juxtaposition of the return value and the requirements for csvfile. I would use @terryjreedy change but also add a new line and a reference:

Return a reader object which will iterate over lines in the given csvfile.  See csvreader.__next__().

The csvfile must be an iterable of strings, each with the needed format. File and list objects are the most commonly used.
ericvsmith commented 10 months ago

I don't think we should link to csvreader.__next__(), but rather to https://docs.python.org/3/library/csv.html#reader-objects. And I think the link can just be from "reader object" in your first sentence, and delete the second sentence. So (cleaned up a little):

Return a reader object which will iterate over lines in the given csvfile. csvfile must be an iterable of strings, each with the needed format. File and list objects are the most commonly used.

pscabot commented 10 months ago

I would still recommend a new line. All other changes are great.

willingc commented 10 months ago

Thanks @terryjreedy for the additions. :heart:

jeremy-dolan commented 10 months ago

I don't think the sentence is actually ambiguous as "its" couldn't grammatically refer to the "reader object" in the previous sentence. Regardless, maybe this is less wordy and clearer:

CHANGE: "csvfile can be any object which supports the iterator protocol and returns a string each time its __next() method is called"... TO: "csvfile can be any iterable that returns a string each time its __next() method is called"...

ericvsmith commented 10 months ago

I think it's confusing to mention __next__, since no one ever directly calls it. The point is that it's an iterable returning csv-formatted strings.

A slight tweak:

Return a reader object which will process lines from the given csvfile. csvfile must be an iterable of strings, each with the needed format. csvfile is most commonly a file-like object or list.

Although "with the needed format" seems a little wishy-washy. Maybe "in csv format"?

terryjreedy commented 10 months ago

I agree that the link is sufficient, and that part of my proposal was a wishy-washy place-holder. How about "each in the reader's defined csv format" or "each in the csv format defined by the other arguments"? My point here is that there is no one concrete 'csv format', but a family of csv formats.

ericvsmith commented 10 months ago

I like "each in the reader's defined csv format".

terryjreedy commented 10 months ago

Making a PR now. Thanks for the discussion here. I far prefer hashing out short doc patches on the issue, without bot noise, than on the PR itself.

terryjreedy commented 10 months ago

@jeremy-dolan As I said in https://github.com/python/cpython/issues/113046#issuecomment-1854693628, An iterables does not have a __next__ method unless it is also an iterator, and a list of strings, one of two suggested types for csvfile, is not an iterator.