Open BdR76 opened 2 months ago
I am ok with this idea, but I think this can be accomplished in multiple ways without affecting practicality, so I'm not sure if this is necessary.
As for the alternative idea, I'm not really a fan of it: that dict format is not very readable in my opinion, and with just a few lines of code you can convert it to the appropriate dict format.
Feature Type
[ ] Adding new functionality to pandas
[X] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
The `read_csv` function has a parameter `date_format` which can be "str or dict of columns", see documentation. So for parsing date columns you can either use:
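As a minimal sketch of the two forms that exist today (the column names and sample data are made up for illustration, and `date_format` requires pandas 2.0 or later):

```python
import io
import pandas as pd

# Tiny in-memory csv with two date columns (illustrative data only)
csv_text = (
    "id,order_date,ship_date\n"
    "1,01-05-2024,03-05-2024\n"
    "2,02-05-2024,04-05-2024\n"
)
date_cols = ["order_date", "ship_date"]

# Form 1: a single format string, applied to every column in parse_dates
df_str = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=date_cols,
    date_format="%d-%m-%Y",
)

# Form 2: a dict that maps each column name to its own format string
df_dict = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=date_cols,
    date_format={"order_date": "%d-%m-%Y", "ship_date": "%d-%m-%Y"},
)
```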
However, in practice, even when a csv file contains different datetime formats, it is usually just a date format and a datetime (or time) format. In other words, the date format can differ a lot between csv files, but it usually doesn't differ much within one file. Theoretically there could be US and European date formats mixed in one csv file, but I work with a lot of csv data and I've never seen this. In my experience this is a very uncommon use case.
Feature Description
So for example, a csv file can have 10 date columns formatted like `01-05-2024` and 5 columns formatted like `05-05-2024 12:30`. When reading such a csv file with `read_csv` with many datetime columns, just the `str` parameter is not sufficient, but the `dict` parameter is a bit overkill because you have to explicitly set the format for each column when basically there are just two groups, so it's not very practical.

So my feature request is:
Can `read_csv` be updated so that the `date_format` parameter also accepts just a list of date format strings for the date columns? So for example `date_format=['%d-%m-%Y', '%d-%m-%Y %H:%M:%S']`
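The request does not spell out how a list would be matched to columns. One possible reading, sketched here so it can be run today, is "try each format in order and keep the first one that parses the whole column". The helper `parse_with_formats` and the sample columns are hypothetical and not part of pandas:

```python
import io
import pandas as pd

def parse_with_formats(df, columns, formats):
    """Parse each column with the first format that fits every value.

    Hypothetical helper that emulates one reading of the proposed list form.
    """
    out = df.copy()
    for col in columns:
        for fmt in formats:
            try:
                out[col] = pd.to_datetime(df[col], format=fmt)
                break  # this format parsed the whole column
            except ValueError:
                continue  # at least one value did not match; try the next format
        else:
            raise ValueError(f"no format in {formats} matches column {col!r}")
    return out

csv_text = (
    "id,start_date,created_at\n"
    "1,01-05-2024,05-05-2024 12:30:00\n"
    "2,02-05-2024,06-05-2024 08:15:00\n"
)
raw = pd.read_csv(io.StringIO(csv_text))  # date columns are read as plain strings
parsed = parse_with_formats(
    raw,
    columns=["start_date", "created_at"],
    formats=["%d-%m-%Y", "%d-%m-%Y %H:%M:%S"],
)
```

With this reading the order of the list matters only when more than one format could parse the same column.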
Alternative Solutions
Alternatively, I think it could be practical for most typical use cases to give groups of date formats. So instead of having to supply a parameter for each individual column, like this:
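A sketch of what the per-column form looks like with the existing API; the column names and data are invented for illustration:

```python
import io
import pandas as pd

csv_text = (
    "birth_date,start_date,end_date,created_at,updated_at\n"
    "01-05-2024,02-05-2024,03-05-2024,05-05-2024 12:30:00,06-05-2024 08:15:00\n"
)

# Every date column needs its own entry, even though only two formats occur
date_format = {
    "birth_date": "%d-%m-%Y",
    "start_date": "%d-%m-%Y",
    "end_date": "%d-%m-%Y",
    "created_at": "%d-%m-%Y %H:%M:%S",
    "updated_at": "%d-%m-%Y %H:%M:%S",
}

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=list(date_format),  # the dict keys are the date columns
    date_format=date_format,
)
```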
It could be changed so you supply groups like this, which is less code and better reflects the actual situation:
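The grouped form is not something pandas accepts today. Assuming it maps each format string to a list of columns (that shape is an assumption, as are the column names), it can be expanded into the existing per-column dict in one line:

```python
import io
import pandas as pd

# Hypothetical grouped form: one entry per format, listing the columns that use it
date_groups = {
    "%d-%m-%Y": ["birth_date", "start_date", "end_date"],
    "%d-%m-%Y %H:%M:%S": ["created_at", "updated_at"],
}

# Expand the groups into the column -> format dict that read_csv already accepts
date_format = {col: fmt for fmt, cols in date_groups.items() for col in cols}

csv_text = (
    "birth_date,start_date,end_date,created_at,updated_at\n"
    "01-05-2024,02-05-2024,03-05-2024,05-05-2024 12:30:00,06-05-2024 08:15:00\n"
)
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=list(date_format),
    date_format=date_format,
)
```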
Additional Context
See code examples below for typical csv files with date values (it is all randomly generated test data)