nickmckay / LiPD-utilities

Input/output and manipulation utilities for LiPD files in Matlab, R and Python
http://nickmckay.github.io/LiPD-utilities/
GNU General Public License v2.0

guess_max = 1e6 causing massive performance issue #57

Closed: andrewdolman closed this issue 3 years ago

andrewdolman commented 4 years ago

There is a massive performance issue when using readLipd in R, caused by the argument "guess_max = 1e6" in the call to read_csv in the function read_csv_from_file.

Reading the example file from http://nickmckay.github.io/LiPD-utilities/r/index.html, which is only 1.3 MB, uses over 7 GB of RAM.

Setting this to 1e3 fixes the problem.

I would make a pull request, but I don't know enough about LiPD to know whether guessing from the first 1000 entries is likely to be good enough.

library(lipdR)
pth <- normalizePath("ODP1098B13.lpd")
dat <- readLipd(pth)
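
For reference, the guess_max argument discussed above can be tried in isolation with a direct call to readr::read_csv. This is only a minimal sketch: the data frame and temporary file are made-up stand-ins (not the ODP1098B13 table), and it does not reproduce the body of read_csv_from_file. readr's documented default for guess_max is min(1000, n_max), so 1e3 matches what readr does when the argument is left out.

```r
library(readr)

# Hypothetical stand-in data; this is not the ODP1098B13 table.
csv_path <- tempfile(fileext = ".csv")
write_csv(data.frame(depth = seq_len(5000), value = seq_len(5000) / 10), csv_path)

# guess_max caps how many rows readr inspects when guessing column types;
# 1e3 is the value proposed in this issue.
tbl <- read_csv(csv_path, guess_max = 1e3)
```

Because both columns here are numeric from the first row, 1000 rows are more than enough to guess the types; the open question in this issue is whether that also holds for every LiPD table.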
nickmckay commented 4 years ago

Thanks,

This was changed during troubleshooting because, in some (rare) cases, guessing from the first 1000 rows isn't enough. However, it shouldn't stay at 1e6 because of the overhead. I'll look into better solutions.
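
One possible direction, sketched here purely as an assumption and not as the fix that was adopted in lipdR, is to skip type guessing at read time and convert afterwards. The sketch reads every column as character and then calls readr::type_convert, which guesses each column's type from the values already held in memory. The table below is hypothetical: its late_value column is empty for the first 2000 rows, the kind of rare case where the first 1000 rows contain nothing to guess from.

```r
library(readr)

# Hypothetical table: `late_value` is empty for the first 2000 rows,
# so the first 1000 rows alone contain no numbers for that column.
demo <- data.frame(
  depth = seq_len(3000),
  late_value = c(rep(NA, 2000), seq(0.5, 500, by = 0.5))
)
csv_path <- tempfile(fileext = ".csv")
write_csv(demo, csv_path, na = "")

# Read every column as character so no type guessing happens at read time.
raw <- read_csv(csv_path, col_types = cols(.default = col_character()))

# type_convert() then guesses each column's type from the values in memory.
typed <- type_convert(raw)
```

The trade-off is that every column is first held as character before conversion, so this would need to be measured against real LiPD files before being preferred over a smaller guess_max.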
