kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io
3.96k stars 192 forks source link

Compiler distribution is MSDOS linefeed #851

Closed abitrolly closed 2 years ago

abitrolly commented 3 years ago

formats/ in kaitai-struct-compiler-0.9.zip contains repo with .git dir (which is okay, because it allows to send patches). And it is all botched, because files contain MSDOS linefeeds. Why?

dgelessus commented 3 years ago

formats/ in kaitai-struct-compiler-0.9.zip contains repo with .git dir (which is okay, because it allows to send patches).

The .git directory probably wasn't supposed to be included in the release. If you want to send PRs to kaitai_struct_formats, please make your own clone of the repo and don't use the one accidentally included in the release - otherwise you will be working on an outdated copy of the repo that is missing all changes from after the 0.9 release.

And it is all botched, because files contain MSDOS linefeeds. Why?

To be exact, some of the files use CRLF and others use LF. I'm guessing this was also by accident, because there's no good reason to use both line separator styles in a single set of files (just choose one style and use that everywhere).

But I'm not sure why you say it is "all botched" because of the CRLF line separators? Nowadays almost all software doesn't care if you use CRLF or LF, regardless of what platform you're on. (The main exceptions are Unix shell scripts, which require LF, and Windows batch scripts, which required CRLF until recent Windows 10 versions.) In particular, kaitai_struct_compiler can handle CRLF line separators on non-Windows systems (and probably also LF on Windows, I didn't check).

generalmimon commented 3 years ago

@abitrolly:

Why?

Because of my ignorance. Back then I was building the stable 0.9 universal .zip with the mysterious command sbt compilerJVM/universal:packageBin, I didn't really realize that it includes the ../formats folder from my machine (actually I didn't really know that the whole KSF gets packed into that ZIP at all - actually, I don't understand it even know). I'm sorry for that, not everything went great with the 0.9 release. Meanwhile, you can use 0.10-SNAPSHOT unstable builds, which don't have these problems (except for the image/nitf.ksy with CRLF line endings, which were fixed in a later commit than is currently checked out in the kaitai_struct repo as a submodule).

AFAIK it's not possible to replace released builds on Bintray once published. Yes, we'll need to eventually migrate off Bintray, which brings a temptation to modify the kaitai-struct-compiler-0.9.zip archive and fix it, but that's most likely a very bad idea (there must be a reason why the Semantic Versioning specification strictly [1] prohibits [2] that).

@dgelessus:

To be exact, some of the files use CRLF and others use LF. I'm guessing this was also by accident, because there's no good reason to use both line separator styles in a single set of files (just choose one style and use that everywhere).

Yeah, there's actually a big mess here, which reflects the state of my machine at the time of creating the release:

git ls-files --eol terminal output
```console pp@DESKTOP-MIPASSQ MINGW64 ~/Downloads/installers/kaitai-struct-compiler-0.9/formats (master) $ git ls-files --eol i/lf w/lf attr/ .editorconfig i/lf w/crlf attr/ .gitattributes i/lf w/lf attr/ .github/bintray.key i/lf w/lf attr/ .github/publish i/lf w/lf attr/ .github/workflows/main.yml i/lf w/crlf attr/ CONTRIBUTING.md i/lf w/crlf attr/ README.md i/crlf w/ attr/ _build/.gitignore i/lf w/ attr/ _build/Gemfile i/lf w/ attr/ _build/build i/lf w/ attr/ _build/build-diagrams i/lf w/ attr/ _build/build-html i/lf w/ attr/ _build/build-target i/lf w/ attr/ _build/check-errors i/lf w/ attr/ _build/footer.html.erb i/lf w/ attr/ _build/format_base.html.erb i/lf w/ attr/ _build/format_header.html.erb i/lf w/ attr/ _build/format_index.html.erb i/lf w/ attr/ _build/format_lang.html.erb i/lf w/ attr/ _build/format_xref.html.erb i/lf w/ attr/ _build/header.html.erb i/lf w/ attr/ _build/test-target i/crlf w/ attr/ _build/test/Main.cs i/lf w/ attr/ _build/test/main.cpp i/lf w/ attr/ _build/usage_cpp_stl.html.erb i/lf w/ attr/ _build/usage_csharp.html.erb i/lf w/ attr/ _build/usage_java.html.erb i/lf w/ attr/ _build/usage_javascript.html.erb i/lf w/ attr/ _build/usage_python.html.erb i/lf w/ attr/ _build/usage_ruby.html.erb i/lf w/lf attr/ archive/android_img.ksy i/lf w/crlf attr/ archive/cpio_old_le.ksy i/lf w/crlf attr/ archive/gzip.ksy i/lf w/crlf attr/ archive/lzh.ksy i/lf w/crlf attr/ archive/phar_without_stub.ksy i/lf w/lf attr/ archive/rar.ksy i/lf w/lf attr/ archive/zip.ksy i/lf w/crlf attr/ cad/monomakh_sapr_chg.ksy i/lf w/crlf attr/ common/bcd.ksy i/lf w/lf attr/ common/bytes_with_io.ksy i/lf w/lf attr/ common/riff.ksy i/lf w/lf attr/ common/utf8_string.ksy i/lf w/crlf attr/ common/vlq_base128_be.ksy i/lf w/crlf attr/ common/vlq_base128_le.ksy i/lf w/crlf attr/ database/dbf.ksy i/lf w/lf attr/ database/gettext_mo.ksy i/lf w/crlf attr/ database/sqlite3.ksy i/lf w/crlf attr/ database/tsm.ksy i/lf w/lf attr/ executable/dex.ksy i/lf w/crlf attr/ executable/dos_mz.ksy i/lf w/lf attr/ executable/elf.ksy i/lf w/crlf attr/ executable/java_class.ksy i/lf w/lf attr/ executable/mach_o.ksy i/lf w/crlf attr/ executable/microsoft_pe.ksy i/lf w/crlf attr/ executable/python_pyc_27.ksy i/lf w/crlf attr/ executable/swf.ksy i/lf w/lf attr/ executable/uefi_te.ksy i/lf w/crlf attr/ filesystem/apm_partition_table.ksy i/lf w/crlf attr/ filesystem/apple_single_double.ksy i/lf w/lf attr/ filesystem/btrfs_stream.ksy i/lf w/crlf attr/ filesystem/cramfs.ksy i/lf w/crlf attr/ filesystem/ext2.ksy i/lf w/crlf attr/ filesystem/gpt_partition_table.ksy i/lf w/crlf attr/ filesystem/iso9660.ksy i/lf w/crlf attr/ filesystem/luks.ksy i/lf w/crlf attr/ filesystem/lvm2.ksy i/lf w/crlf attr/ filesystem/mbr_partition_table.ksy i/lf w/crlf attr/ filesystem/tr_dos_image.ksy i/lf w/crlf attr/ filesystem/vdi.ksy i/lf w/crlf attr/ filesystem/vfat.ksy i/lf w/crlf attr/ filesystem/vmware_vmdk.ksy i/lf w/lf attr/ filesystem/zx_spectrum_tap.ksy i/lf w/crlf attr/ firmware/andes_firmware.ksy i/lf w/crlf attr/ firmware/ines.ksy i/lf w/crlf attr/ firmware/uimage.ksy i/lf w/crlf attr/ font/ttf.ksy i/lf w/crlf attr/ game/allegro_dat.ksy i/lf w/crlf attr/ game/doom_wad.ksy i/lf w/crlf attr/ game/dune_2_pak.ksy i/lf w/lf attr/ game/fallout2_dat.ksy i/lf w/crlf attr/ game/fallout_dat.ksy i/lf w/crlf attr/ game/ftl_dat.ksy i/lf w/crlf attr/ game/gran_turismo_vol.ksy i/lf w/crlf attr/ game/heaps_pak.ksy i/lf w/crlf attr/ game/heroes_of_might_and_magic_agg.ksy i/lf w/crlf attr/ game/heroes_of_might_and_magic_bmp.ksy i/lf w/crlf attr/ game/quake_mdl.ksy i/lf w/crlf attr/ game/quake_pak.ksy i/lf w/lf attr/ game/renderware_binary_stream.ksy i/lf w/crlf attr/ game/saints_row_2_vpp_pc.ksy i/lf w/crlf attr/ game/warcraft_2_pud.ksy i/lf w/crlf attr/ geospatial/shapefile_index.ksy i/lf w/crlf attr/ geospatial/shapefile_main.ksy i/lf w/crlf attr/ hardware/edid.ksy i/lf w/crlf attr/ hardware/mifare/mifare_classic.ksy i/lf w/lf attr/ image/bmp.ksy i/lf w/crlf attr/ image/dicom.ksy i/lf w/lf attr/ image/exif.ksy i/lf w/lf attr/ image/gif.ksy i/lf w/crlf attr/ image/icc_4.ksy i/lf w/crlf attr/ image/ico.ksy i/lf w/crlf attr/ image/jpeg.ksy i/crlf w/crlf attr/ image/nitf.ksy i/lf w/crlf attr/ image/pcx.ksy i/lf w/crlf attr/ image/pcx_dcx.ksy i/lf w/crlf attr/ image/png.ksy i/lf w/crlf attr/ image/psx_tim.ksy i/lf w/crlf attr/ image/tga.ksy i/lf w/lf attr/ image/wmf.ksy i/lf w/lf attr/ image/xwd.ksy i/lf w/crlf attr/ log/aix_utmp.ksy i/lf w/crlf attr/ log/glibc_utmp.ksy i/lf w/crlf attr/ log/hashcat_restore.ksy i/lf w/crlf attr/ log/sudoers_ts.ksy i/lf w/crlf attr/ log/systemd_journal.ksy i/lf w/lf attr/ log/windows_evt_log.ksy i/lf w/crlf attr/ machine_code/code_6502.ksy i/lf w/lf attr/ macos/compressed_resource.ksy i/lf w/crlf attr/ macos/ds_store.ksy i/lf w/lf attr/ macos/resource_compression/dcmp_0.ksy i/lf w/lf attr/ macos/resource_compression/dcmp_1.ksy i/lf w/lf attr/ macos/resource_compression/dcmp_2.ksy i/lf w/lf attr/ macos/resource_compression/dcmp_variable_length_integer.ksy i/lf w/lf attr/ macos/resource_fork.ksy i/lf w/lf attr/ media/android_opengl_shaders_cache.ksy i/lf w/crlf attr/ media/avi.ksy i/lf w/crlf attr/ media/blender_blend.ksy i/lf w/crlf attr/ media/creative_voice_file.ksy i/lf w/crlf attr/ media/genmidi_op2.ksy i/lf w/crlf attr/ media/id3v1_1.ksy i/lf w/crlf attr/ media/id3v2_3.ksy i/lf w/crlf attr/ media/id3v2_4.ksy i/lf w/lf attr/ media/magicavoxel_vox.ksy i/lf w/crlf attr/ media/ogg.ksy i/lf w/crlf attr/ media/quicktime_mov.ksy i/lf w/crlf attr/ media/standard_midi_file.ksy i/lf w/crlf attr/ media/stl.ksy i/lf w/crlf attr/ media/tracker_modules/fasttracker_xm_module.ksy i/lf w/crlf attr/ media/tracker_modules/s3m.ksy i/lf w/crlf attr/ media/vp8_ivf.ksy i/lf w/lf attr/ media/wav.ksy i/lf w/crlf attr/ network/bitcoin_transaction.ksy i/lf w/lf attr/ network/dns_packet.ksy i/lf w/lf attr/ network/ethernet_frame.ksy i/lf w/crlf attr/ network/hccap.ksy i/lf w/crlf attr/ network/hccapx.ksy i/lf w/crlf attr/ network/icmp_packet.ksy i/lf w/crlf attr/ network/ipv4_packet.ksy i/lf w/crlf attr/ network/ipv6_packet.ksy i/lf w/crlf attr/ network/microsoft_network_monitor_v2.ksy i/lf w/crlf attr/ network/packet_ppi.ksy i/lf w/crlf attr/ network/pcap.ksy i/lf w/crlf attr/ network/protocol_body.ksy i/lf w/crlf attr/ network/rtcp_payload.ksy i/lf w/crlf attr/ network/rtp_packet.ksy i/lf w/lf attr/ network/rtpdump.ksy i/lf w/lf attr/ network/some_ip/some_ip.ksy i/lf w/lf attr/ network/some_ip/some_ip_container.ksy i/lf w/lf attr/ network/some_ip/some_ip_sd.ksy i/lf w/lf attr/ network/some_ip/some_ip_sd_entries.ksy i/lf w/lf attr/ network/some_ip/some_ip_sd_options.ksy i/lf w/crlf attr/ network/tcp_segment.ksy i/lf w/crlf attr/ network/tls_client_hello.ksy i/lf w/lf attr/ network/udp_datagram.ksy i/lf w/crlf attr/ network/websocket.ksy i/lf w/crlf attr/ scientific/nt_mdt/nt_mdt.ksy i/lf w/crlf attr/ scientific/nt_mdt/nt_mdt_pal.ksy i/lf w/crlf attr/ scientific/spectroscopy/avantes_roh60.ksy i/lf w/crlf attr/ scientific/spectroscopy/specpr.ksy i/lf w/crlf attr/ security/efivar_signature_list.ksy i/lf w/crlf attr/ security/openpgp_message.ksy i/lf w/crlf attr/ security/ssh_public_key.ksy i/lf w/crlf attr/ serialization/asn1/asn1_der.ksy i/lf w/crlf attr/ serialization/bson.ksy i/lf w/crlf attr/ serialization/google_protobuf.ksy i/lf w/crlf attr/ serialization/microsoft_cfb.ksy i/lf w/crlf attr/ serialization/msgpack.ksy i/lf w/crlf attr/ serialization/php_serialized_value.ksy i/lf w/lf attr/ serialization/python_pickle.ksy i/lf w/crlf attr/ serialization/ruby_marshal.ksy i/lf w/crlf attr/ windows/regf.ksy i/lf w/crlf attr/ windows/windows_lnk_file.ksy i/lf w/lf attr/ windows/windows_minidump.ksy i/lf w/crlf attr/ windows/windows_resource_file.ksy i/lf w/crlf attr/ windows/windows_shell_items.ksy i/lf w/crlf attr/ windows/windows_systemtime.ksy ```

Actually, I'm surprised by the magnitude of chaos, and really have no clue how it could've happened (it must have something to do with my Git for Windows settings, and then perhaps some random conversions in VS Code upon opening - I don't know).

The .git directory probably wasn't supposed to be included in the release.

Yeah, it surely wasn't 😐 When I think about it now, I recall that there was something stupid I did on my local machine like deleting the formats/ folder and re-git clone-ing the latest version from GitHub in place of it. Again, this was a ignorance on my side of how Git submodules work - submodule is supposed to have a text file .git only containing gitdir: ../.git/modules/formats, which points to a location where the .git/ repository is actually stored.

Nowadays almost all software doesn't care if you use CRLF or LF, regardless of what platform you're on.

In particular, kaitai_struct_compiler can handle CRLF line separators on non-Windows systems (and probably also LF on Windows, I didn't check).

Yes, it certainly does.


In conclusion, good catch, @abitrolly, it's great that you notice and report such issues. However, we can't (or at least shouldn't) do anything about that (as far as the 0.9 release is concerned), and it isn't too serious anyway (both problems combined only cause a useless +2 MB size increase, which is unfortunate but not critical).

If there isn't anything left to answer, I'd close this issue as wontfix (we don't actually have such label, but you get what I mean).

abitrolly commented 3 years ago

But I'm not sure why you say it is "all botched" because of the CRLF line separators?

My mgitstatus complained that there are some changes in formats/, and when I run git status there, it all became red with diff showing that changes are linefeeds.

If you want to send PRs to kaitai_struct_formats, please make your own clone of the repo and don't use the one accidentally included in the release - otherwise you will be working on an outdated copy of the repo that is missing all changes from after the 0.9 release.

Why can't I just do git pull -r to update the copy?

If there isn't anything left to answer, I'd close this issue as wontfix (we don't actually have such label, but you get what I mean).

I'd expect 0.9.1, but it doesn't seem critical for me either.

dgelessus commented 3 years ago

which brings a temptation to modify the kaitai-struct-compiler-0.9.zip archive and fix it, but that's most likely a very bad idea

Yes, please never silently replace the download files for a release version. If you do that, every distro/package maintainer will hate you 🙂 and it's just confusing in general for people who pay attention to that sort of thing.

Actually, I'm surprised by the magnitude of chaos, and really have no clue how it could've happened

I would guess that it comes from a changing Git core.autocrlf setting. Most likely you first had it set to true (i. e. Git converts LF line endings to CRLF on checkout and vice versa on commit) and then later changed it later to input (i. e. Git checks out LF line endings unconverted, but still converts CRLF to LF on commit). That would explain why recently added/modified files have LF line separators and older ones have CRLF. Of course it's possible that your editor did something with the line separators as well.

In my experience, nowadays it's not really necessary anymore to have Git check out LF as CRLF on Windows, because almost everything can handle LF line separators without problems. I personally set core.autocrlf to input on Windows, because it doesn't modify anything unexpectedly on checkout, but still avoids committing CRLF line separators by accident. If you're sure that you'll never do that accidentally, you can also use false, so that Git does no silent conversions at all.

No matter what setting you use, you should also set core.safecrlf=true, so that you get an error if an automatic conversion is not losslessly reversible.

dgelessus commented 3 years ago

My mgitstatus complained that there are some changes in formats/, and when I run git status there, it all became red with diff showing that changes are linefeeds.

Well, the .git directory is only there by accident anyway, so I wouldn't expect it to behave properly 🙂

Why can't I just do git pull -r to update the copy?

  1. The origin remote is set to @generalmimon's fork, so it's likely to be outdated compared to the main repository. Of course I don't know how he does it, but I personally don't update my forks' default branches very often.
  2. The remotes are configured using SSH, which you can only use if you have a GitHub account with a properly configured SSH key pair. So if the machine you're pulling from doesn't have your key pair configured, or you don't have a key pair set up with GitHub, or you simply don't have a GitHub account, you're unable to pull in this clone.
abitrolly commented 3 years ago

Well, then the shipped .git dir should point to the right master then. :D A shallow clone can save some space.

I could probably fix the release process, but it uses some Scala tools I am not familiar with.

generalmimon commented 3 years ago

@abitrolly Anything left to answer or can I close the issue?

abitrolly commented 3 years ago

@generalmimon if the release process is fixed then this can be closed.

generalmimon commented 2 years ago

formats/ in kaitai-struct-compiler-0.9.zip contains repo with .git dir (which is okay, because it allows to send patches).

The .git directory probably wasn't supposed to be included in the release.

Strangely enough, there was already an attempt in build.sbt:77-91 to filter out the .git dir back then (but it ended up in the resulting .zip anyway, as we know), so I wonder why it didn't work:

    // Universal: add extra files (formats repo) for distribution, removing
    // .git special files and various dirty/backup files that git normally
    // ignores.
    //
    // NOTE: This requires formats repo to be checked out on the level higher
    // that the compiler
    mappings in Universal ++= NativePackagerHelper.directory("../formats").filterNot {
      case (_, dst) =>
        val dstFile = new File(dst)
        val dstFileName = dstFile.getName
        dst.startsWith(s"formats${File.separator}_") ||
          dstFileName == ".git" ||
          dstFileName.endsWith("~") ||
          dstFileName.endsWith("#")
    },
generalmimon commented 2 years ago

to filter out the .git dir back then (but it ended up in the resulting .zip anyway, as we know), so I wonder why it didn't work

Wait - in fact, it only accounts for the plain text .git file, because the .git/ directory isn't even supposed to be there. It was there because... ☝️ https://github.com/kaitai-io/kaitai_struct/issues/851#issuecomment-785064839:

When I think about it now, I recall that there was something stupid I did on my local machine like deleting the formats/ folder and re-git clone-ing the latest version from GitHub in place of it. Again, this was a ignorance on my side of how Git submodules work - submodule is supposed to have a text file .git only containing gitdir: ../.git/modules/formats, which points to a location where the .git/ repository is actually stored.