photopea / UPNG.js

Fast and advanced PNG (APNG) decoder and encoder (lossy / lossless)
MIT License
2.1k stars 259 forks source link

Throw an exception when not decoding #10

Closed jonathanlurie closed 7 years ago

jonathanlurie commented 7 years ago

When trying to decode a file that is not a png, decode will remain in an infinite loop due to while(true) and the fact that var len = bin.readUint(data, offset) gives NaN. Then, the method continues to read the file, adding 4 to the offset but actually adding 4 to NaN, which gives NaN, and so on. The quick fix is to throw an exception whenever the len is NaN because we know it would then be trapped.

photopea commented 7 years ago

Hi Jonathan. This is a library for decoding PNG images. It is not a library for deciding, if some piece of data is a PNG image. If we go this way (making the code stupid-proof), we may end up with a messy library, which would be two times larger.

It is enough to check the first 8 bytes to recognize PNG (they should be 89 50 4e 47 0d 0a 1a 0a). So just do it before using UPNG.js

jonathanlurie commented 7 years ago

Well, testing the first-bit sequence is definitely the right way to, but a while(true) with no exit way is a bit dangerous. Anyway, I think I got your point.

photopea commented 7 years ago

I used whle(true) twice in UPNG.js :) thre is always some "break", which eventually occurs.

We could make the function UPNG.isPNG(arrayBuffer), that returns true/false (by checking first 8 bytes) . But still, I think, that in practice, you already know what data you have before parsing them.

By the way, Photopea can load about 15 image formats, so I check the first bytes in advance to detect the format, and only then I choose the right parser.

mn4367 commented 7 years ago

It is enough to check the first 8 bytes to recognize PNG (they should be 89 50 4e 47 0d 0a 1a 0a). So just do it before using UPNG.js

It isn't. I just tested this with a PNG where I deleted a few hundred bytes in the middle of the file, so the header is OK. It ended up in an infinite loop logging unknown chunk type NaN. I also tried to load it with Photopea, same result, the browser tab saturized a CPU core and thus was unresponsive and had to be killed*. The file can be found here.

I used whle(true) twice in UPNG.js :) thre is always some "break", which eventually occurs.

No offense meant but this is very dangerous. Anyone using while (true) absolutely has to prove unequivocally that break; will eventually happen, no matter what the code is running into. In this case that obviously doesn't work.

It is not a library for deciding, if some piece of data is a PNG image. If we go this way (making the code stupid-proof), we may end up with a messy library, which would be two times larger.

Suppose this would be common sense. It could result in a situation where a database server fed with an invalid SQL statement would end up in an infinite loop. Nobody would accept that. And I'm quite sure that UPNG.js will be used in unattended server systems. In my opinion any library (really any) which processes data from outside has to be fail safe in a sense that no matter what data it is given it has to handle it properly and thus be stupid-proof.

In this case it doesn't mean that you have to check for any garbage in any data which probably would blow up the library, I'd just go the opposite way, that is tighten the boundaries of what is accepted as PNG and fail gracefully by throwing a simple error if the data doesn't fulfill the specs, just like @jonathanlurie proposed.

You can't expect that anybody using the library parses and checks the input data before passing it to your code. This unnecessarily increases the amount of code, the user of library checks the input and you check it again. And you are the PNG expert, who else could handle this task better than you?

The customer I'm working for has a large testing department. One of the things the testers always do is to feed invalid data into all of the interfaces. If the system doesn't handle these cases (and it's really enough to just say "invalid data") the product will never be released.


*Off topic: feeding garbage into Photopea often leads to a situation where the loading banner appears and stays there forever. Nothing further happens, no error message is given, only the console logs something. I think in those cases an error message should be presented to the user stating that the given file couldn't be loaded.

photopea commented 7 years ago

@mn4367 There are quite a lot of places, at which a PNG file could be incorrect. Detecting all of them would increase the size of UPNG.js about twice. And if we want to have really precise reports about the problem in a PNG file, it can increase even 10 times. It is not like adding one condition could detect all corrupted files. That is why I don't want to go this way.

I don't think it is right to compare it with SQL. SQL commands are often created by people and the probability of error is high. Meanwhile, PNG images are usually created by a specific software, which has to guarantee the correct structure of the output files.

Again, UPNG.js guarantees the correct behavior just for valid PNG files, there are no warranties about invalid PNG files. The user of UPNG.js has to ensure, that only valid PNGs will be supplied to it.

mn4367 commented 7 years ago

There are quite a lot of places, at which a PNG file could be incorrect. Detecting all of them would increase the size of UPNG.js about twice.

And if we want to have really precise reports about the problem in a PNG file, it can increase even 10 times.

Exactly this is not what I was proposing, no precise reporting needed, just fail gracefully with an error, always. Do you really want to leave an infinite loop unfixed because adding one line of code is inappropriate? This is a clear bug isn't it?

The user of UPNG.js has to ensure, that only valid PNGs will be supplied to it.

That's absolutely unrealistic, see the behavior of Photopea. You want people to use it but check PNGs beforehand with, say Gimp, to make sure that Photopea doesn't break when feeding the file into it?

photopea commented 7 years ago

Threre exist files, which you can give to Gimp or Photoshop, and make them end up in an infinite loop. I experienced it many times. It is not the bug in the software, but the input files were incorrect.

"leave infinite loop unfixed" - the loop is finite on a correct data. There does not exist any simple "fix". The solution proposed in this pull request solves 1 case of 1000.

Nobody has to check any PNGs. All PNG files, that occur in the world and can be opened with standard PNG viewers, can be also opened with Photopea. If you somehow create an invalid PNG file, it will crash any PNG viewer, not just UPNG an Photopea.

mn4367 commented 7 years ago

Threre exist files, which you can give to Gimp or Photoshop, and make them end up in an infinite loop. I experienced it many times. It is not the bug in the softwer, but the input files were incorrect.

Then Photoshop and Gimp should fix that too. If you file a bug report there, I think they'd fix that.

An addition to SQL commands. The vast majority of SQL is generated by software like Hibernate. Or a different example: any network stack will handle broken packages gracefully by throwing an error which can be catched by the consumer.

If you somehow create an invalid PNG file, it will crash any PNG viewer, not just UPNG an Photopea.

And that's fine! Exactly what @jonathanlurie (and I) was proposing. Throw an error, but do not end up in an infinite loop.

And no, not all of them crash nor do they get unresponsive, they show garbage, but that's OK. In fact, I couldn't find any software on my machine that crashed when loading this file.

bildschirmfoto 2017-08-09 um 18 25 47

bildschirmfoto 2017-08-09 um 18 27 30

Photoline bildschirmfoto 2017-08-09 um 18 30 31 Translating to "the document couldn't be opened"

photopea commented 7 years ago

If you want, make a pull request and I will merge it.

But again, I think it is a waste of time. Corrupted PNGs (that have first 8 bytes of PNG, but invalid in the rest) don't occur in a real world and the error will never be thrown.

Note, that this issue was originally about a completely different problem - the author supplied a file, which didn't even have the right first 8 bytes (it could be a valid JPG file etc).

mn4367 commented 7 years ago

Hmm, I can do that, but I really don't want to force you to merge something you are not comfortable with (I also have another addition I'd like to incorporate).

I could offer an alternative way: I would create a fork and implement the change in the fork. But since I need the UPNG.js node module in another module this would mean in the end that I have to publish my fork as a separate node module so ultimately there would exist two UPNG.js node modules at npmjs.com. Regardless of the MIT license which allows that, would that be OK for you?

photopea commented 7 years ago

Sure, no problem. You can start a separate branch and go "your own way".

But still, I think it would be better, if we cooperated on a single project. I can not say if I will or will not accept your code, because I don't know in which way you want to "fix" it. I would have to see your patch first.

mn4367 commented 7 years ago

But still, I think it would be better, if we cooperated on a single project.

Of course, that's why I fought for this ;-)

OK, many thanks. For the loop I'd just use what @jonathanlurie proposed. Maybe it's also worth to check the first 8 bytes as you proposed.

The addition I'd like to have is the ability to provide my own logging function since I want to handle the messages myself. So I was thinking about something like

UPNG.setLogger = function(logger) {
  UPNG.Logger = logger;
}

function log() {
  if (UPNG.Logger) {
    UPNG.Logger.apply(UPNG, arguments);
  } else if (typeof process=="undefined" || process.env.NODE_ENV=="development") {
    console.log.apply(console, arguments);
  }
}
photopea commented 7 years ago

The solution of jonathanlurie and checking 8 bytes is not enough to detect non-png files. So add it in your own branch. The log() also is never called for correct PNG files, it was useful only to me during the development. I will probably remove it completely in the future.

I would be rather, if you looked at my library as a black box and reported problems in functionality (incorrect output). Arguing over the style of code (if while or for loop should be used, a command that is never reached ...) makes no sense. If you make a separate branch, but the only difference would be a style of code, it also makes no sense to me.

mn4367 commented 7 years ago

OK, I'll do that, thanks.

The log() also is never called for correct PNG files

Sure? I have a valid (I think) PNG file here and UPNG.js does log unknown chunk type sBIT 3. sBIT is listed as a valid chunk type (amongst others which are not handled, like tIME, zTXt).

photopea commented 7 years ago

Oh, you are right, not all chunk types are implemented. That is the only useful occurance of log :)

mn4367 commented 7 years ago

The solution of jonathanlurie and checking 8 bytes is not enough to detect non-png files.

Sure, but if the file doesn't have these bytes I'd simply reject it by throwing an error. One potential source of problems less.

I would be rather, if you looked at my library as a black box and reported problems in functionality (incorrect output).

I'm not sure if I got this right, are you suggesting that I shouldn't look at the code? If you don't want people look at your code, why publish it? I didn't base my decision to use UPNG.js on your code, I chose it because of the clear focus, the compact size and the absence of huge dependency trees.

Arguing over the style of code (if while or for loop should be used, a command that is never reached ...) makes no sense.

I have absolutely no problem with your coding style, I also use while (true), but only if I can guarantee that it doesn't end up in an infinite loop. But seen from a different perspective, opening the source to others almost always leads to discussions about the source code (you may call it style) and the good thing about it is that in most cases the code gets better (see #7 and #8). You could consider it a kind of code review and code reviews are always a good thing. This is really the opposite of "makes no sense".

If you make a separate branch, but the only difference would be a style of code, it also makes no sense to me.

I wouldn't change the style of the code. I'd fix a problem. This does make sense. Especially if the fix is so easy like @jonathanlurie proposed.

The solution proposed in this pull request solves 1 case of 1000.

I'm not sure what you mean by this. Because there are too many potential problems lets do not fix even one? Nobody demands that all 1000 have to be fixed. But clearly 999 problems are better than 1000.

Corrupted PNGs (that have first 8 bytes of PNG, but invalid in the rest) don't occur in a real world and the error will never be thrown.

You must be kidding. Files do get corrupt due to hard disk errors, network problems, hardware problems, faulty implementations etc on a daily basis. And some files are broken by intention just to exploit the latest security holes. If the average user gets an email with a PNG attached he'll probably try to open it in an image viewer. If the viewer tells him that the file is broken then it's OK. If the viewer displays garbage that's OK. But getting no answer is a problem.

Look, the reason why I'm so strict with this special case is only because it's an infinite loop. An infinite loop in a library is probably the only problem in software development which cannot be handled by the consumer of a library. It's like the perfect trap. And it is too easy to just demand correct files, see above.

mn4367 commented 7 years ago

Something completely different, are you interested in adding a TypeScript definition file? This helps TypeScript developers to use your library. You can look at the changes here.

photopea commented 7 years ago

I do realize, that files may get corrupted sometimes. But these corruptions can and should be handled on "lower layers" (e.g. by using TCP instead of UDP, by using a better storage technology / RAID etc.). Such technologies exist and today they are everywhere.

I would rather wait, until somebody actually has corrupted files, which need to be at least partially restored. When you create such files on purpose, it is not fair, because even if UPNG.js had 100 000 lines of code, there will still exist files, which would seem to be handled "incorrectly" (whatever it means). I have never seen a corrupted file in my life (I have only generated some during the programming).

E.g. if the PNG file said, that the width and height is 51 361 px, UPNG would try to allocate 10 GB of RAM, which is usually not available. There is no way to tell, if the value 51 361 was in the original, or it occured due to corruption. Arguing about handling such cases would consume a lot of time and I don't want to go in this direction. Anybody can always make some "strange file" on purpose and there would be no end.