Permissive settings by default?

attipaci commented 3 years ago

I'd like to start a discussion on a proposal, which is this: The library should be able to read FITS files as best as it can by default.

Or more specifically, I'd like the library to be able to read FITS out of the box, which may be:

using supported conventions (e.g. HIERARCH, as requested by @olebole ).
deviating from strict standard (e.g. keywords with non-standard ASCII characters beyond upper-case alphanumeric plus dash/underscore) but otherwise unambiguous in intent (i.e., the headers/data are properly described otherwise)
may be slightly corrupted (e.g. missing end-quote in string values, truncated file or file with terminal junk after)

All in all, I think developers want it to work out of the box, and deal with issues from other people's FITS files automatically. If and when they want to restrict support for common conventions, or want to enforce strict standard, they can still do that as needed.

Specifically, the change would be equivalent to setting:

setUseHierarch(true)
setLongStringsEnabled(true)
setAllowHeaderRepairs(true)
setAllowTerminalJunk(true)

And so the change would be that developers can disable these, instead of enabling them, as needed. Let me know if you think this may not be a good idea after all...

@olebole, @vforchi, @Zlika, @mbtaylor

vforchi commented 3 years ago

I don’t have a strong opinion on this, but I think that by default the library should fail if the file is corrupt and it should support the conventions.

attipaci commented 3 years ago

For comparison, Two other FITS tools I use a lot, fv (FITS Viewer), and ds9 (a FITS image viewer), both open corrupted FITS files by default.That is, they use whatever they can from the FITS. It would be interesting to know what CFITSIO, and PyFITS (astropy.io.fits) do. Do any of you know?

Taking @vforchi 's stance above, one could argue that what applies to fv or ds9 is not relevant, since these are compiled tools, which should be the most flexible.

The counter argument to that, is that if someone tries to read a crappy FITS file with our library, it's because they were handed it as is. It's not like they can go back to where it came from and say, "Hey, you gave me a crappy FITS. Fix it or die.". The only thing you can do with that crappy FITS is try to get whatever is inside it, even if you might not be able to recover all that was intended. At least you get something (and you will never know what was intended anyway...)

attipaci commented 3 years ago

BTW, I was just reading the latest FITS standard document (4.0), and the OGIP 1.0 long string convention is now part of the official standard. As such, we should definitely make long string support enabled by default...

olebole commented 3 years ago

For a library there are good reasons for both sides, so both seem reasonable. Therefore, I would recommend to keep the default as it is unless there is a really good reason to switch. Otherwise, every dependency would have to check (and probably adopt) this.

For example, in Debian tamfits is a separate package (libfits-java). Just updating this package would then result in unexpected results, f.e. when a dependency uses the defaults, but wants to be strict (or vice versa). At the end, it is a decision of the application developer what they want. The main purpose here is probably not to have shiny new Java applications, but to keep the library running for their dependencies.

Please, don't potentially break dependency unless there is a really good reason.

attipaci commented 3 years ago

@olebole, one good reason is that the latest FITS standard now specifies long strings as part of the standard (vs an optional convention before), but we currently have it disabled. At minimum we should be able to read/write standard FITS files. Now that means that we need to support long strings out of the box.

So, that alone is a potentially breaking change (as is allowing HIERARCH keywords out of the box, which is what you specifically requested for the next release!). If we do that -- as I think we must -- I'd advocate for the more liberal settings across the board. This way, whatever breaks downstream, has to be fixed once, rather than slow bleeding if we enable features one-by-one. Also, making things more permissive is unlikely to break anything in real life. It might break contrived test cases -- which test for a specific result, rather than a valid result (and I'd argue that in that case the downstream devs really need to fix what they are testing for...)

I know that some Debian packages may rely on nom-tam-fits in their upstream. As such, I propose that we tie the nom-tam-fits release schedule such as to allow Debian developers to properly build and test against the next lib release, and fix whatever needs fixing on their end. I would also support a freeze period for nom-tam-fits before the release (but perhaps excluding critical bug fixes that may be warranted by downstream).

If you let me know what schedule would work best for Debian, and what freeze period before that to allow the Debian developers to make adjustments and test things out, I'd be happy to conform to that....

PS. Please take a look at #165 and #169 also, which are further potentially breaking changes, that are nonetheless absolutely necessary to fix for scientific use.

olebole commented 3 years ago

I think this is not a Debian specific problem; I brought Debian in just as an example here. What I mean is the rule "Don't break compatibility (unless you need to)". The current default of the library is being strict with respect to the standard, and I think this should be kept. If the standard itself allows more things now, the default of nom-tam-fits should allow the same things by default. I would also think that supporting HIERARCH by default would be OK, as it is an official extension. I would rather not like to have header repairs or similar enabled by default, however. This would be a significant change in the behaviour which would give surprising results for current users of non-tam-fits (because they may think they can ensure strict compatibility by using the default setting). But at the end, this is not that important; and I feel my position here rather as "just my 2 ¢" instead of a strict opposition. For example, as far as I remember, astropy also changed the FITS file handling behavior from "strict to "try to fix" by default.

Zlika commented 3 years ago

I don't think breaking compatibility with older versions is a real problem if you also bump the major version number of the library. The main feature of this lib is just to read an image file. What would you expect from a function to read a jpeg or png file? Would you expect it to fail (by default) if the file is not strictly compliant with the standard, even if it could still be read? However, standard validation is also an interesting feature. Maybe this could be fulfilled by a dedicated function that would return a list of all the standard violations detected in the file?

attipaci commented 3 years ago

Thank you all for the input. I'll keep chewing it over, probably right up to the point when it's time to compose the release. I'll almost certainly enable longs string support (because it is standard), and HIERARCH support (because it is a widely accepted convention), but I'm on the fence about the others still. I'm somewhat leaning towards the 'try-to-fix' way (as you could probaly tell), but so far I have not seen any really strong arguments either for or against it...

As for the version bump, I think as long as the API stays backward compatible, a minor bump should suffice. The good old API will stay functional, and the changes in behavior will also be relatively subtle, especially for applications that already use it to compose valid FITS files, or that use the library to read reasonably valid FITS.

attipaci commented 3 years ago

PS. @Zlika, the validation I have in mind is mainly for headers you create from a program. For these, I think it's best to throw a runtime exception as soon as you are trying to do something that will result in an invalid or incomplete header. But, I like your idea of keeping track of violations when reading headers (and also when creating, if these violations are not showstoppers to throw an exception, but maybe worth taking note of). I don't think I' prioritize this for the upcoming release (because more thought needs to go into it), but we could slate it for the next release after that (e.g. 1.17)....

mbtaylor commented 3 years ago

Being naturally conservative I'm slightly nervous about changes that might modify existing behaviour; but on the whole I agree this change probably makes sense. One implementation issue: the relevant code in FitsFactory for setting useHierarch default currently comes with a comment I added for my internal use:

        // MBT (28-JUL-2017): change default from false to true.
        // This is required for HIERARCH-based wide fits processing
        // (see uk.ac.starlink.fits.WideFits).  If that gets backed out of,
        // this could be set back to its factory setting (false).
        private boolean useHierarch = true;

Whatever the decision on defaults, that comment is not relevant for nom-tam-fits general usage and should be removed.

attipaci commented 3 years ago

Thanks Mark. I'll take the comment out.

wcleveland commented 3 years ago

We shouldn't get so excited about supporting FITS-4 that we neglect compatibility with FITS-3. There should be a "legacy mode" that mimics the behavior of our last release. There are a lot of old software that we have to work with, and I seriously doubt the FITS-4 standard will be widely supported quickly.

We need to guarantee that the files we generate are still usable by existing software that are built with FITS-3 and less libraries.

attipaci commented 3 years ago

@wcleveland, I 100% agree with your sentiment, and my goal was/is to preserve support for earlier versions of the standard fully, exactly for the reason you mention. For example, before FITS 4, there was a requirement that ending quotes for string values must not come before byte 20 of the header record. FITS 4 did away with that requirement, but we stick with it anyway, because we want older software to be able to read FITS produced by us. None of the FITS 4 changes otherwise violate any earlier FITS standards. And the choice is yours whether you want to add complex values or hexadecimal integers in headers, which older software may not recognize. (However, we can at least parse them from headers that others produce and we need to handle...)

attipaci commented 3 years ago

PS. I mean to say that there is no "legacy mode". The standard behavior is "legacy mode", only with fixes to what was broken or ill-conceived, but no arbitrary changes otherwise.

attipaci commented 3 years ago

@wcleveland. Would you prefer adding a FitsFactory.setLegacyMode() method that sets FitsFactory settings to the same as those of 1.15? (And along with that there can be a FitsFactory.isLegacyMode() to check). It's certainly possible to do. However, I think for the general user of this library this may just be a confusing extra switch that means little if anything. So, for that reason, unless you think is absolutely necessary, I'd advocate on keeping things simple for users in general...

Rather, I'd think that if your application requires the extra special case of needing the exact same settings as 1.15 (my guess that would be <<1% of all use cases, and likely not even that), you can simply set the relevant FitsFactory settings to your delight one-by-one. That would be my preferred approach...

wcleveland commented 3 years ago

I believe the right strategy is require the user to "opt in" to use FITS 4 features, otherwise all the functions will adhere to FITS 3. Eventually we would release a version that would require the user to "opt out" of FITS 4 to maintain FITS 3 compatibility.

I will need to look at the code and think some more about this. Unfortunately, I'm working on a project that has a hard deadline that's coming up really soon so I don't know how quickly I will be able to come up with a more qualified answer.

attipaci commented 3 years ago

Again, I do not disagree. But, this is where we have been for the past 10 years already with opt-in support things like long strings. As you say, "Eventually we would release a version that would require the user to "opt out" of FITS 4 to maintain FITS 3 compatibility." I would argue now is as good a time as any for that eventuality.

Most of the FITS files (from a number of observatories and instruments) I have been working with in the past 10 years have been using the long string convention, and many have been using the HIERARCH style keyword convention also. In fact FITS4 really just codified what was already widely used in 2016. It's 2021 now. It's time to move beyond opt-in. I honestly find it a nuissance that I have to remember to enable those features in the library every time I write a tool....

Again, nothing that is added in FITS4 breaks FITS3 compatibility. Any FITS4 can be read as FITS3 just fine. In fact, the standard is careful to evolve in a strictly back compatible way. What FITS4 adds to headers is entirely legal as FITS3, it just has no special meaning in FITS3.

For example, long strings written as FITS4 will read back to a truncated string in FITS3, and the CONTINUE keywords are treated as comment-style keywords. So, the the enabling long string by default does not change much except adding a few legal comment-style cards from the point of FITS3. Compare that with our <=1.15 behavior with long strings disabled, in which case only the truncated part was added (without comment-style cards after). From the FITS3 point of view the result is essentially the same.

Same goes for HIERARCH keywords. These are just comment style cards as far as FITS3 is concerned. They just have no meaning when the convention is disabled. So we can safely write these by default, and a FITS3 reader will just treat them as comments. Compare that with the behavior of setUseHierarch() disabled by default (<=1.15), where these keys were simply not written to the header. The result is again essentially the same from a FITS3 point of view...

I'm quite confident that permissive settings is extremely unlikely to break any legacy code. That is not to say that 1.16 won't break existing applications, but that is far more likely to happen as a result of bugs that were fixed, and which legacy code may have worked around in creative ways...

attipaci commented 3 years ago

PS. Even if the changes break things unnecessarily (despite my significant efforts to the contrary), we'll still have ample opportunity to fix things up before the release, even if we miss something prior to merging. The earliest the next release could happen is December, but I think there is a good chance it will be pushed into 2022...

attipaci commented 3 years ago

And as a separate note, I also think we should not be too afraid of breaking stuff if it is for a good reason. This library has done it before, as do others (Python 3 anyone, lol? -- which did not make Python any less popular). I had to fix up some of my older code to work with newer releases of this library in the past (between minor releases, mind you). And that was fine, because the newer releases were generally an improvement over the old ones... So, yes, we should tread carefully, but we should not hesitate to make things better just because we might lose a sliver of 100% back compatibility...

wcleveland commented 3 years ago

You are comparing changes to a product that breaks the compatibility with previous versions of that product, to changes to a protocol that may break compatibility with other products that uses that protocol.

attipaci commented 3 years ago

I'm not sure what to make of the above.

I'm more worried that being ultra conservative leads to the fragmentation of this very library. If we cannot fix bugs (of which we have many) in the mainline for fear of breaking some code (hypothetically, because so far no-one has actually demonstrated that is in fact happens in a critical way with an application), then we end up with fragmentation. Everyone starts running their own spin of the package with their own fixes. The mainline is no longer a standard, only a dead-end trunk, with a different live fork of it for every application. I worry that this is where we are heading. We already have a number of divergent forks. I too have been using my own version of this library for years because the mainline was just too decrepit, and from the discussions (and commits on forks) I have seen I am not the only one. The only way we can turn things around is by improving the mainline where it leads rather than lags in doing what people expect it to do -- reading and writing FITS files well -- where it is good enough for most users/developers that it's no longer worth for them to keep their separate versions just to make it work for what they need it for.

That said, I have a strong preference for keeping back compatibility, but only if it's not in the way of much needed improvements...

attipaci commented 3 years ago

Anyway, before we get carried away about the philosophy behind maintaining this package, let's focus back on the original question of settings. Short of someone demonstrating a concrete example of some application breaking as a result of more permissive settings, and convincing me (and others) why that application cannot be changed to work with the permissive settings, I don't see a good reason for not enabling permissive settings by default. Talking about hypotheticals that may never happen does not measure up to the obvious benefits of being able to read (and write) more diverse FITS out of the box...

olebole commented 3 years ago

I would also recommend to get in contact with the Astropy fits package maintainer, @saimn. IMO it would be a good idea to have tamfits and astropy interoperable by default. Also, he may have some arguments here from the experiences with Astropy. For future development, it may be useful to keep coordination of the default settings, and also maybe to have some inter-language CI tests.

attipaci commented 3 years ago

That is a good idea, Ole. The cross-testing especially would likely uncover a myriad of bugs in both packages, but that's a bigger undertaking, and probably not for the upcoming release per se. But, I'll try to get in touch, and see if I can learn something or if we can strike up some level of coordination or collaboration...

attipaci commented 3 years ago

You can now find a pre-release (1.16.0-rc1) under the Releases section on the main page, with a precompiled (unsigned) JAR. Or, feel free to build your own from master.

Give it a go for your own testing and applications. From now until the 1.16.0 release I plan to only address fixing blocking issues. Depending on how the testing goes and how many issues need to be fixed, the official 1.16.0 release may come in December 2021 or January 2022 (and hopefully not much beyond). I made an effort to maintain back compatibility as far as possible. If something slipped up, let me know. (I'll start a new ticket for 1.16.0 pre-releases...)

I look forward to your feedback, and learning about any problems you might encounter. Thanks a lot for your help (past, present, and future)

-- A.

nom-tam-fits / nom-tam-fits

Permissive settings by default? #161