ropensci / pangaear

R client for the Pangaea database
https://docs.ropensci.org/pangaear

read_meta fails with large multi event data sets #82

Open issue, opened by lukasjonkers 2 years ago


For data sets with many events (or many variables), the metadata block at the top of the file can be longer than 1,000 lines (e.g. doi 10.1594/PANGAEA.61061). However, the read_meta function (in zzz.R) reads only the first 1,000 lines:

lns <- readLines(x, n = 1000)
ln_no <- grep("\\*/", lns)

finds no match, because */ occurs only at the end of the metadata block (beyond line 1,000). ln_no is therefore an empty vector, and

all_lns <- seq_len(ln_no)

fails, because seq_len() does not accept a zero-length argument.
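A minimal reproduction of the failure, assuming a toy file whose layout only loosely mimics a PANGAEA tab file (the event lines and data columns are made up for illustration):

```r
# Toy file: a metadata block of 1,502 lines, closed by "*/" on line 1502,
# followed by a short tab-delimited data section (layout is an assumption).
tmp <- tempfile(fileext = ".tab")
meta <- c("/* DATA DESCRIPTION:", sprintf("Event:\tEVENT_%04d", 1:1500), "*/")
dat <- c("Depth [m]\tValue", "0.5\t1.2", "1.0\t1.4")
writeLines(c(meta, dat), tmp)

# Same logic as read_meta: only the first 1,000 lines are inspected
lns <- readLines(tmp, n = 1000)
ln_no <- grep("\\*/", lns)
length(ln_no)  # no match within the first 1,000 lines

# seq_len() on the empty match vector raises an error, as in read_meta
res <- tryCatch(seq_len(ln_no), error = function(e) conditionMessage(e))
res
```

Reading the whole file (readLines(tmp)) finds the closing */ on line 1502, which confirms that the problem is only the fixed line limit.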

I guess the easiest solution would be to increase or remove the limit on the number of lines read. Removing the limit entirely may be impractical for very large datasets, while simply increasing it doesn't guarantee that the issue never recurs. Perhaps add a loop that increases n until */ is found? Something like:

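nlines <- 1000
lns <- readLines(x, n = nlines)
ln_no <- grep("\\*/", lns)
# read 1,000 more lines until the closing */ is found; stop once readLines()
# returns fewer lines than requested (end of file), otherwise a file
# without */ would loop forever
while (length(ln_no) == 0 && length(lns) == nlines) {
  nlines <- nlines + 1000
  lns <- readLines(x, n = nlines)
  ln_no <- grep("\\*/", lns)
}
if (length(ln_no) == 0) stop("no closing */ found in metadata block")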
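An alternative sketch, assuming a hypothetical helper find_meta_end() that is not part of pangaear: instead of calling readLines(x, n = nlines) on the path repeatedly (which re-reads the file from the start on every iteration), it keeps one connection open and reads it in chunks, since readLines() on an already open connection continues from the current position.

```r
# Hypothetical helper (not part of pangaear): return the line number of the
# first "*/" in the file, reading it in chunks through a single open
# connection so earlier lines are never re-read. Returns NA if no "*/" exists.
find_meta_end <- function(path, chunk = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  offset <- 0L
  repeat {
    lns <- readLines(con, n = chunk)
    if (length(lns) == 0L) return(NA_integer_)  # reached end of file
    hit <- grep("\\*/", lns)
    if (length(hit) > 0L) return(offset + hit[1L])
    offset <- offset + length(lns)
  }
}
```

In read_meta this could replace the fixed limit with something like ln_no <- find_meta_end(x), an is.na() check, and then lns <- readLines(x, n = ln_no). Each line is read at most once, whereas the growing-n loop re-reads the first lines on every pass.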

read_csv, also in zzz.R, has the same issue.