tmattio / spin

OCaml project generator.
ISC License
297 stars 15 forks source link

Preserve trailing newline characters at the end of template files when present #123

Closed mbarbin closed 2 years ago

mbarbin commented 2 years ago

The current behavior of spin new is to generate template files without newline characters at the end.

dune build @fmt and git status show a warning for files without trailing newline: \ No newline at end of file

This PR proposes to keep the trailing newline characters at the end of template files, when originally present.

I considered the alternative of systematically adding a trailing newline on every generated file, but decided against it because I thought that:

This PR introduces a new dependency in the stdio opam package. stdio only depends on base. I made a guess that adding this new dependency would be fine, since it does not seem to significantly grow the current set of transitive dependencies, simplifies spin_std.sys and comes with a performance improvement to read_all (although probably not noticeable on small files).

tmattio commented 2 years ago

Thanks @mbarbin 🙂

I'd like to keep Spin's dependency profile small if possible, so I'm not sure about adding a dependency on stdio (and transitively base). In particular, the dependency is only used in two functions, and I think the bug would be fixed by just replacing the write_file function by:

let write_file file content =
  let oc = open_out file in
  Fun.protect
    (fun () -> output_string oc content)
    ~finally:(fun () -> close_out oc)

Do you think you could change that? The PR looks good to me otherwise!

tmattio commented 2 years ago

Actually, reading the previous code, the bug was probably coming from the read_file function that reads files line by line and concatenates them afterward. This can be replaced by:

let read_file path =
  let ic = open_in path in
  Fun.protect
    (fun () ->
      let length = in_channel_length ic in
      let buffer = Bytes.create length in
      really_input ic buffer 0 length;
      Bytes.to_string buffer)
    ~finally:(fun () -> close_in ic)
mbarbin commented 2 years ago

Hi @tmattio,

Thank you for looking into it :slightly_smiling_face:

My bad - I wasn't sure whether spin.exe already linked base.cmxa and incorrectly thought it did.

base dependency

The base package is indeed a build dependency of spin (required by `parsexp`) but doesn't make it to the linked targets. ```bash spin$ dune clean && dune build --verbose 2>&1 | grep base.cmxa > /dev/null EXIT 1 (no match) ```

I reverted the new dep addition, and committed further based on your suggestions. While doing that, I had 2 minor questions, on which I'd like your input please (998f08d)

text mode vs binary mode

I wondered whether to open the files in binary mode rather than text mode. stdlib.mli has some comments about conversions happening in text mode than may influence the behavior of in_channel_length, whereas binary mode makes no conversion. Thus I currently favored the latest.

reading all at once vs reading in chunks

I've noticed Stdio.In_channel.input_all performs reading in chunks. I'm not sure why that is, rather than using really_input_string to read the whole file at once. If it's just to avoid the prior call to in_channel_length, then I don't have a concern with our 'read all at once' implementation. If there's another reason input_all is implemented the way it is, I may want to think about it.

Please let me know what you think. Thanks!

tmattio commented 2 years ago

I wondered whether to open the files in binary mode rather than text mode. stdlib.mli has some comments about conversions happening in text mode than may influence the behavior of in_channel_length, whereas binary mode makes no conversion. Thus I currently favored the latest.

That sounds sensible 🙂

I've noticed Stdio.In_channel.input_all performs reading in chunks. I'm not sure why that is, rather than using really_input_string to read the whole file at once. If it's just to avoid the prior call to in_channel_length, then I don't have a concern with our 'read all at once' implementation. If there's another reason input_all is implemented the way it is, I may want to think about it.

I think input_all works on all kinds of channels? in_channel_length only works on files, so you can use it for any channel.

The PR looks good to me, merging now, thanks a lot for your contribution 🎉

mbarbin commented 2 years ago

I think input_all works on all kinds of channels? in_channel_length only works on files, so you can use it for any channel.

Ah, yes - that's probably it.

The PR looks good to me, merging now, thanks a lot for your contribution

Great :slightly_smiling_face:. Thanks for spin!