The documentation for read_csv's nrows argument says:
Number of rows of file to read. Useful for reading pieces of large files.
I want to read a file using header=1 and then limit the number of rows. The documentation says nrows counts rows of the file. To me that sounds like the count includes the skipped row and the column header row, since pandas still reads those rows from the file. In my testing, however, nrows counts only data rows: it excludes the column header row and any rows before it. Rows skipped via skiprows are likewise not counted towards nrows, and neither are comment rows.
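from io import StringIO
import pandas as pd

# Line 0 is an extra row, line 1 is the column header, then data rows,
# one comment row, and a footer that should never be reached.
csv = """extra,
a,b
1,1
#comment,comment
2,2
3,3
footer,blah,yeah
"""

# header=1 uses the second line as the column header; nrows=2 limits the data rows.
with StringIO(csv) as buf:
    df = pd.read_csv(buf, header=1, nrows=2, comment='#')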
For nrows=2, it seems to always return 2 rows.
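The observation can be checked for several values of nrows with a short loop. This is only a sketch: the SAMPLE string and the read_head helper are hypothetical names made up for illustration, and the footer line from the example above is left out so that every line has at most two fields.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: one extra line, the header, three data rows, one comment row.
SAMPLE = "extra,\na,b\n1,1\n#comment,comment\n2,2\n3,3\n"


def read_head(nrows):
    """Read SAMPLE with the second line as header and at most `nrows` data rows."""
    return pd.read_csv(StringIO(SAMPLE), header=1, nrows=nrows, comment="#")


for n in (1, 2, 3):
    df = read_head(n)
    # If nrows counted file rows, the extra, header, and comment lines
    # would reduce the number of data rows returned.
    print(n, len(df), df["a"].tolist())
```

If nrows counts only data rows, as described above, each iteration should report exactly n rows.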
Suggested fix for documentation
Number of rows of data to read. Useful for reading pieces of large files. Refers to the number of included data rows. The following rows are not included in the count:
the column header
rows before the column header, if header=1 or larger
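The same kind of check can be run for rows removed with skiprows, which the report says are also not counted. This is a sketch under the same assumptions as the example in the report: SAMPLE is hypothetical data, and by_count and by_index are illustrative names.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: one extra line, the header, three data rows, one comment row.
SAMPLE = "extra,\na,b\n1,1\n#comment,comment\n2,2\n3,3\n"

# Skip the first physical line; the header is then inferred from the next line.
by_count = pd.read_csv(StringIO(SAMPLE), skiprows=1, nrows=2, comment="#")

# Skip physical lines 0 ("extra,") and 2 ("1,1") by index.
by_index = pd.read_csv(StringIO(SAMPLE), skiprows=[0, 2], nrows=2, comment="#")

print(len(by_count), by_count["a"].tolist())
print(len(by_index), by_index["a"].tolist())
```

In both calls the skipped lines, the header, and the comment row sit between the start of the file and the last returned row, so two rows coming back in each case is consistent with nrows counting only included data rows.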
Pandas version checks
main
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html