r-devel / r-project-sprint-2023

Material for the R project sprint
https://contributor.r-project.org/r-project-sprint-2023/

improve efficiency of read.csv and write.csv #41

Open hturner opened 1 year ago

hturner commented 1 year ago

Discussed in https://github.com/r-devel/r-project-sprint-2023/discussions/7

Originally posted by **tdhock** July 1, 2023

Hi! I will not be attending the sprint, but I had a couple of ideas related to improving the efficiency of read.csv and write.csv.

Probably the more important issue to address is read.csv, which has time complexity quadratic in the number of columns. See this issue for some empirical analysis: https://github.com/tdhock/atime/issues/8

Another issue is that write.csv uses linear memory, whereas other CSV writers use only constant memory. This is not that big of an issue, though, because you need linear memory anyway to store the data in R before writing it to CSV: https://github.com/tdhock/atime/issues/10

@gmbecker @bastistician may be able to help mentor? They worked on fixing a similar efficiency issue: https://github.com/tdhock/atime/issues/9
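For anyone who wants to check the read.csv scaling locally, here is a rough sketch using only base R. It is not the benchmark from the linked atime issues. The helper name `time_read_csv` and the column counts in `sizes` are made up for illustration. If the cost really is quadratic in the number of columns, doubling the column count should roughly quadruple the elapsed time once the files are wide enough that fixed overhead no longer dominates.

```r
# Hypothetical helper: time read.csv on a one-row CSV with n_cols columns.
time_read_csv <- function(n_cols) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))
  # One header line and one data row, each with n_cols fields.
  header <- paste0("V", seq_len(n_cols), collapse = ",")
  row <- paste(rep("1", n_cols), collapse = ",")
  writeLines(c(header, row), path)
  elapsed <- system.time(df <- read.csv(path))[["elapsed"]]
  data.frame(n_cols = n_cols, seconds = elapsed, ncol_read = ncol(df))
}

# Illustrative sizes only; increase them to make the trend easier to see.
sizes <- c(250, 500, 1000, 2000)
timings <- do.call(rbind, lapply(sizes, time_read_csv))
print(timings)
```

A single `system.time` call per size is noisy, so treat the numbers as a quick sanity check. The atime package linked above repeats the timings and fits asymptotic reference curves, which is a better basis for any before/after comparison of a patch.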