thephpleague / csv

CSV data manipulation made easy in PHP
https://csv.thephpleague.com
MIT License
3.34k stars 335 forks source link

Calls to rewind or fseek that fail are ignored silently #502

Closed driskell closed 1 year ago

driskell commented 1 year ago

Bug Report

Information Description
Version 9.8.0
PHP version 8.0
OS Platform MacBook M1 MacOS Sonoma

Summary

It seems some custom wrapper streams in PHP report as seekable even though they are not (fseek returns false) - I am beginning to wonder that PHP does not support a custom wrapper reporting itself as not seekable. So the metadata flag for seekable is not enough.

However, rewind and fseek do have return values. But code is ignoring them. So the failure to rewind is completely ignored. This causes corrupt loads. This can happen even with a stream that reports seekable if that stream encounters an error

Standalone code, or other way to reproduce the problem

I need some time to produce an example. But it seems AWS S3 stream wrapper is an example here. Corrupt data when no "seekable" set.

Expected result

Error at least

Actual result

No error

Checks before submitting

nyamsprod commented 1 year ago

@driskell thanks for submitting your bug report. From what I understand the stream wrapper you are using seems to incorrectly report its stream capabilities. If I am correct how do you expect the CSV package to behave 🤔 . If a strean wrapper does not follow the specification I don't see how the package could prevent the issue you are experiencing.

Also I see you are referring to AWS stream capabilities. Did you look into their documentations to see if you can not change it's behaviour via some specific settings 🤔

If I recall some of AWS stream are not seekable by default, you need to activate the bahviour according to their docs.

driskell commented 1 year ago

@nyamsprod It does seem this is less a case of “stream reporting seekability incorrectly” but actually a case of “PHP does not support seekability flag for a custom stream”. There is actually no way for a custom stream wrapper to report that it is unseekable.

I think the key thing here though is still the fact that there are several calls to file system functions without any check on the error response. If these indeed point to remote filesystems then you’ll have an issue. For example even if a stream is seekable and you seek forward and the network connection drops it will fail. Likewise if you have a seek backwards and the stream is using ephemeral storage to cache and the data is lost it will fail too (of course these are hypothetical - but that’s why error checking is necessary)

Would be great to include error checking as it will save developer time on investigation

driskell commented 1 year ago

It would be great if the library could however support unseekable streams. It seems there is no reason to seek for any purpose. Even BOM can just be stripped after read, rather than stripping before a read. Any rewind seeks can also be checked - if you’re at position 0 already no need to seek during rewind. Would greatly improve things as the library could then manage seekable streams effortlessly

nyamsprod commented 1 year ago

@driskell I added the following fix:

https://github.com/thephpleague/csv/commit/78e70d9a51e016e1a07387ea2389483b396e565e

I beleive that's the only thing I can do to help with the issue.

The fix will be release in the next patch version or minor version.

driskell commented 1 year ago

@driskell I added the following fix:

78e70d9

  • to the internal Stream object it will throw a RuntimeException if the rewind action fails
  • if fseek fails (returns -1 ) a new RuntimeException will be thrown too.

I beleive that's the only thing I can do to help with the issue.

The fix will be release in the next patch version or minor version.

Thanks that would be perfect! It will definitely help to raise problems with incompatible streams sooner - thank you and really appreciate your time looking after this library 👍