ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogenous data
https://docs.ropensci.org/EML
Other
98 stars 33 forks source link

Fix mix-up of LFCR -> CRLF as default recordDelimiter in set_physical. #235

Closed amoeba closed 6 years ago

amoeba commented 6 years ago

I'm pretty sure I have this right and am not mixed up: The Windows CRLF is \r\n so I think this argument's default should be \\r\\n instead of `\n\r. Better yet, should another default be chosen or should we autodetect this?

cboettig commented 6 years ago

Can you roxygenize and update man file just to make travis happy?

Would be nice to have some unit test for this, currently we don't do anything with this field so it has no consequence. One would imagine wanting to automatically know what read. or read_ command to use in R based on this info.

Definitely open to autodetection and/or different default. (Also, would be nice to have templates for more than just csv, though I guess easy enough to specify manually).

amoeba commented 6 years ago

Can you roxygenize and update man file just to make travis happy?

Whoops. Done. Sorry about that.

currently we don't do anything with this field so it has no consequence.

Yes, this would be great. Ideally we'd be able to leverage another package for format detection though I'm not sure one exists.

Definitely open to autodetection and/or different default.

I actually noticed this when a co-worker mistakenly set \n\r on a file with UNIX line endings (double wrong). I think Windows line endings should be the default because EML is for science and science is still done primarily on Windows so it's default with the highest chance of being correct absent auto detection. Our team just happens to run Excel on macOS.

Auto-detecting line feeds isn't impossible. I noticed readLines appears to break on \r which catches DOS/WIndows, Linux, and macOS >10 line endings. I think a similar algorithm makes sense. I'll file an issue to add autodetection.

cboettig commented 6 years ago

Sounds good, anyway I should merge this so at least this is fixed.

Can you bump NEWS (so we have an easy record of the bugfix)?

On a moment's reflection, I don't think most parsers tend to actually ask you for the record delimiter anyway (i.e. there's no argument for this field in read.csv, I believe this is determined by setting the fileEncoding to unix vs DOS, or by default it just recodes into the current locale). I imagine that's what most parsers do as well? So I think defaulting to DOS line ending encoding but not worrying too much about it is reasonable.

codecov-io commented 6 years ago

Codecov Report

Merging #235 into master will decrease coverage by 0.01%. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #235      +/-   ##
==========================================
- Coverage   68.47%   68.45%   -0.02%     
==========================================
  Files          25       25              
  Lines        5129     5129              
==========================================
- Hits         3512     3511       -1     
- Misses       1617     1618       +1
Impacted Files Coverage Δ
R/set_physical.R 100% <ø> (ø) :arrow_up:
R/eml_view.R 80.95% <0%> (-4.77%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b8e3fbb...f89d5fa. Read the comment docs.

amoeba commented 6 years ago

Thanks!

So I think defaulting to DOS line ending encoding but not worrying too much about it is reasonable.

I'm okay with this if you'd prefer it.