khughitt closed 3 years ago
Hmm. Just realized that, in a couple of cases, this is occurring for the data as well (i.e. corrupt/incomplete files that were previously downloaded are being reused).
The gzip file is clearly truncated ("gzip: unexpected end of file"), and getGEO()
emits a warning when it encounters the last line with fewer columns than expected, e.g.:
Warning: 1 parsing failure.
row col expected actual file
6104 -- 1039 columns 123 columns
However, since a truncation could fall exactly between two lines, it is probably better to check with gzip instead of relying on column differences.
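For illustration, here is a minimal sketch of that kind of gzip check. It is written in Python rather than R purely as an example, and it is not what getGEO() does. The function name is_gzip_intact is hypothetical. It decompresses the whole stream and treats any read error as a sign of truncation or corruption, roughly what "gzip -t" reports.

```python
import gzip
import os
import zlib


def is_gzip_intact(path, chunk_size=1 << 20):
    """Return True if the gzip stream at `path` decompresses to the end.

    Hypothetical helper for illustration; not part of GEOquery.
    """
    try:
        # A zero-byte file is not a valid gzip stream, but Python would
        # read it as empty without complaint, so reject it explicitly.
        if os.path.getsize(path) == 0:
            return False
        with gzip.open(path, "rb") as handle:
            # Read through to the end; a truncated stream raises EOFError
            # before the end-of-stream marker is reached.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: truncated; OSError (incl. BadGzipFile): bad header,
        # bad CRC, or missing file; zlib.error: corrupt deflate data.
        return False
    return True
```

A cached file that fails this check could then be deleted and downloaded again instead of being reused.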
I haven't noticed checksum files anywhere on the GEO FTP site, but if such things exist, that would be another option for ensuring file integrity.
I realized that a simple check for exceptions/non-zero status codes should do the trick. Submitted a PR.
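As a rough sketch of the idea (not the code in the PR, which is not shown here), the download can be written to a temporary file and only moved into the cache if it finished without an exception or a non-zero status. This is Python for illustration; fetch_atomically and the fetch callable are hypothetical names.

```python
import os
import tempfile


def fetch_atomically(fetch, dest):
    """Run `fetch(tmp_path)` and move the result to `dest` only on success.

    `fetch` is a hypothetical callable that downloads to the given path and
    returns 0 (or None) on success, a non-zero status on failure, or raises.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    os.close(fd)
    try:
        status = fetch(tmp_path)
        if status not in (0, None):
            raise RuntimeError(f"download failed with status {status}")
        # Same directory, so the rename never exposes a half-written file.
        os.replace(tmp_path, dest)
    except BaseException:
        # Never leave a partial download behind where it could be reused.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest
```

With this layout, an interrupted download leaves nothing at the destination path, so a later run has no partial file to pick up.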
Recently, I noticed some significant changes in the output from a pipeline. I traced it back to the GPL files retrieved by getGEO(). At some point, I ended up with a bunch of partially-downloaded GPL files, which getGEO() was re-using.
It should be easy to check for these and re-download them / refuse to work with invalid files (a check for !platform_table_end should be sufficient for most cases).
I don't have the time right now to tackle this, but I'll try to come back to it in the future when I have some free time. Wanted to report the issue now so that others are aware in the meantime.
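To make the !platform_table_end idea concrete, here is a minimal sketch in Python (for illustration only; it is not GEOquery code, and has_platform_table_end is a hypothetical name). It assumes that a complete GPL file has !platform_table_end as its last non-empty line, so a file cut off anywhere earlier fails the check.

```python
import gzip
import zlib


def has_platform_table_end(path, marker="!platform_table_end"):
    """Return True if the last non-empty line of the file equals `marker`.

    Assumption: a complete GPL SOFT file ends with this marker line.
    Handles plain-text files and gzip files (chosen by the .gz suffix).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    last = ""
    try:
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                stripped = line.strip()
                if stripped:
                    last = stripped
    except (EOFError, OSError, zlib.error):
        # Unreadable or truncated gzip stream: treat as invalid.
        return False
    return last == marker
```

A caller could run this on each cached GPL file before reuse and trigger a fresh download when it returns False.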