poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License

trident validate should check LF-line endings #311

Open stschiff opened 1 month ago

stschiff commented 1 month ago

We recently ran into a problem where I accidentally committed a Janno file with Windows line endings (CRLF) (see https://github.com/poseidon-framework/community-archive/pull/214). Checksums were computed from these files and committed, and the package validated. We only caught this because git implicitly converted the line endings to LF, which changed the file and invalidated the checksum.
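
To illustrate why the conversion invalidates a checksum, here is a minimal Python sketch. The table content is made up, `md5_hex` is a hypothetical helper, and the use of MD5 is an assumption made for the example only; nothing here is taken from trident itself.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Made-up two-line table with Windows (CRLF) line endings.
crlf_bytes = b"Poseidon_ID\tGenetic_Sex\r\nSample1\tF\r\n"

# The same table after every CRLF has been converted to LF.
lf_bytes = crlf_bytes.replace(b"\r\n", b"\n")

# The visible content is identical, but the bytes and the digest are not.
print(md5_hex(crlf_bytes) == md5_hex(lf_bytes))  # prints False
```

A checksum recorded for the CRLF version can therefore never match the file once the line endings have been normalized.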

Whether or not we decide to change the git attributes (see https://github.com/poseidon-framework/community-archive/issues/215), we should build a check into trident validate that inspects the line endings of all checksum-validated text files, including .ind, .fam, .snp, .bim, .vcf (in the future), .geno, .bib, .janno.
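
As a rough sketch of what such a check could look like (in Python, for illustration only; trident is written in Haskell and this is not its implementation), the following scans a package directory for listed files that contain CRLF. The names `TEXT_EXTENSIONS`, `has_crlf` and `find_crlf_files` are hypothetical. It assumes every listed file is plain text; a binary-encoded file could contain these bytes by chance.

```python
from pathlib import Path

# Extensions listed in this issue; ".vcf" is mentioned only as a future addition.
TEXT_EXTENSIONS = {".ind", ".fam", ".snp", ".bim", ".vcf", ".geno", ".bib", ".janno"}


def has_crlf(data: bytes) -> bool:
    """Return True if the byte string contains at least one CRLF sequence."""
    return b"\r\n" in data


def find_crlf_files(package_dir: str) -> list[str]:
    """Return the sorted paths of listed text files that contain CRLF."""
    offenders = []
    for path in sorted(Path(package_dir).rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_EXTENSIONS:
            continue
        # Read raw bytes so that no newline translation hides the CRLF.
        if has_crlf(path.read_bytes()):
            offenders.append(str(path))
    return offenders
```

Reading the files as bytes matters here: opening them in text mode would let Python translate the line endings before the check ever sees them.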

nevrome commented 1 month ago

This is probably the same issue as #302.

In today's discussion we concluded that such a check is not necessary in trident, but only in the public archives. Maybe that's what we should try for now. Requiring one specific line ending is a change we should first encode in the Poseidon schema.

stschiff commented 1 month ago

OK, we could still discuss a warning. I think even without mentioning this in the schema, we can make it "recommended" to use LF.
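
A warning-only variant might look like the sketch below (Python, for illustration; `line_ending_warnings` and the message wording are hypothetical and not part of trident). It collects messages instead of failing, so a package with CRLF files would still validate.

```python
def line_ending_warnings(files: dict[str, bytes]) -> list[str]:
    """Return one warning message per file that contains CRLF line endings.

    `files` maps a file name to its raw content. Nothing is raised, so the
    caller can print the messages and still treat the package as valid.
    """
    return [
        f"Warning: {name} contains CRLF line endings; LF is recommended"
        for name, data in sorted(files.items())
        if b"\r\n" in data
    ]


messages = line_ending_warnings({
    "example.janno": b"Poseidon_ID\r\nSample1\r\n",  # made-up content
    "example.fam": b"Sample1 F\n",                   # made-up content
})
for message in messages:
    print(message)
```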