handling compression: filename vs magic numbers

smsaladi commented 6 years ago

biopython/biopython#1686 is discussing reading in compressed files. pandas handles this pretty nicely (I've never run into issues myself), but it looks like the code infers the compression from the file extension.

https://github.com/pandas-dev/pandas/blob/v0.23.1/pandas/io/common.py#L238

Why do this over sniffing for the magic number and inferring from there? If there was discussion about this in the past, could you point me to the relevant issue/email (I tried searching the issue tracker without luck)?
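
To make the alternative concrete, here is a minimal sketch of what sniffing could look like. It is not pandas code: `sniff_compression` and `open_sniffed` are hypothetical names, and only gzip, bz2 and xz are covered. It assumes the input is a path that can be opened twice, which would not hold for a non-seekable stream.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes that identify each compressed format.
MAGIC_NUMBERS = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
}

# Standard-library openers keyed by the names used above.
OPENERS = {"gzip": gzip.open, "bz2": bz2.open, "xz": lzma.open}


def sniff_compression(head):
    """Return the compression name matching the leading bytes, or None."""
    for magic, name in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return name
    return None


def open_sniffed(path, mode="rt"):
    """Open `path`, choosing the opener from its first bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        kind = sniff_compression(fh.read(6))
    # Fall back to the built-in open() when no magic number matches.
    return OPENERS.get(kind, open)(path, mode)
```

With something like this, a gzip file saved as `data.csv` would still be decompressed, whereas extension-based inference would treat it as plain text.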

gfyoung commented 6 years ago

cc @jreback

WillAyd commented 6 years ago

Perhaps there's a history I am not aware of, but at the very least the extension seems like an easy way of doing this. If you think there's a better way of going about it, PRs are always welcome.

handling compression: filename vs magic numbers #21479