Closed: TendouArisu closed this issue 6 months ago
Hi @TendouArisu, thank you very much for noticing that. This repository is only used to update the CO2 dataset, which happens a few times per year. Most of our processing is done in a different repository, called ETL, where performance is more important (and where we currently use pandas 2.2.1). We may upgrade pandas here too at some point.
Issue Description:
Hello. I have discovered a performance degradation in the .concat function of pandas version 1.5.2, and I notice that this repository depends on pandas 1.5.2 in scripts/requirements.txt. I am not sure whether this performance problem in pandas will affect this repository. I found some discussions on the pandas GitHub related to this issue, including #50652 and #52685. I also found that scripts/make_dataset.py uses the affected API. There may be more files using the affected API.
Suggestion:
I would recommend considering an upgrade to pandas >= 2.1 or exploring other solutions to optimize the performance of .concat. Any other workarounds or solutions would be greatly appreciated. Thank you!
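For anyone who wants to check quickly whether a requirements file pins pandas below the version suggested above, here is a minimal sketch using only the standard library. The function names and the MIN_VERSION constant are hypothetical and not part of this repository. It assumes the pin is written in the form pandas==X.Y.Z and ignores other specifier styles.

```python
import re

# Assumed threshold, taken from the suggestion in this issue (pandas >= 2.1).
MIN_VERSION = (2, 1)


def pinned_pandas_version(requirements_text):
    """Return the '=='-pinned pandas version as a tuple of ints, or None."""
    for line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        match = re.fullmatch(
            r"pandas\s*==\s*(\d+(?:\.\d+)*)", line, flags=re.IGNORECASE
        )
        if match:
            return tuple(int(part) for part in match.group(1).split("."))
    return None


def needs_upgrade(requirements_text, minimum=MIN_VERSION):
    """True if pandas is pinned with '==' to a version older than `minimum`."""
    version = pinned_pandas_version(requirements_text)
    return version is not None and version[: len(minimum)] < minimum
```

To use it, read the contents of scripts/requirements.txt into a string and pass that string to needs_upgrade. A result of True only means the pin is older than 2.1. It does not measure whether the .concat slowdown is noticeable for this repository's workload.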