pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

handling compression: filename vs magic numbers #21479

Open smsaladi opened 6 years ago

smsaladi commented 6 years ago

biopython/biopython#1686 is discussing how to read compressed files. pandas handles this pretty nicely (I've never run into issues myself), but it looks like the code infers the compression from the file extension.

https://github.com/pandas-dev/pandas/blob/v0.23.1/pandas/io/common.py#L238
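For comparison, extension-based inference boils down to a lookup on the file suffix. The sketch below is hypothetical: the helper `infer_compression_from_name` and its mapping are illustrative and are not the pandas code linked above. The returned names follow the strings pandas accepts for its `compression=` argument.

```python
import os

# Hypothetical mapping from file extension to a compression name; the names
# mirror the values pandas accepts for `compression=` ("gzip", "bz2", "zip", "xz").
_EXTENSION_MAP = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def infer_compression_from_name(path):
    """Guess the compression from the file name alone; return None if unknown."""
    # os.path.splitext keeps the leading dot: "reads.fastq.gz" -> ("reads.fastq", ".gz")
    _, ext = os.path.splitext(path)
    # Lower-case the suffix so "DATA.GZ" and "data.gz" are treated the same.
    return _EXTENSION_MAP.get(ext.lower())


if __name__ == "__main__":
    print(infer_compression_from_name("reads.fastq.gz"))  # gzip
    print(infer_compression_from_name("data.csv"))        # None
```

This approach needs no I/O and also works for a path that does not exist yet (for example, when writing output), but a file whose name does not match its contents will be misdetected.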

Why do this rather than sniffing the file's magic number and inferring the compression from that? If there was discussion about this in the past, could you point me to the relevant issue/email? (I tried searching the issue tracker without luck.)
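The magic-number alternative can be sketched as peeking at the first few bytes of the stream and matching them against known format signatures. This is a minimal illustration, not pandas or Biopython code; the helper name `sniff_compression` is hypothetical, and it assumes a binary, seekable file object. The signatures themselves are the standard headers of gzip, bzip2, zip and xz files.

```python
import bz2
import gzip
import io
import lzma

# Leading bytes ("magic numbers") of common compression formats.
_MAGIC_NUMBERS = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bz2"),
    (b"PK\x03\x04", "zip"),
    (b"\xfd7zXZ\x00", "xz"),
]


def sniff_compression(fileobj):
    """Peek at the start of a binary, seekable file object and return a
    compression name, or None if no known signature matches."""
    start = fileobj.tell()
    header = fileobj.read(6)  # the longest signature above is 6 bytes
    fileobj.seek(start)  # rewind so the caller can still read the full stream
    for magic, name in _MAGIC_NUMBERS:
        if header.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(sniff_compression(io.BytesIO(gzip.compress(b"hello"))))  # gzip
    print(sniff_compression(io.BytesIO(bz2.compress(b"hello"))))   # bz2
    print(sniff_compression(io.BytesIO(lzma.compress(b"hello"))))  # xz
    print(sniff_compression(io.BytesIO(b"plain text")))            # None
```

Sniffing is robust to misnamed files, but it needs a readable, seekable stream (a pipe or a network response cannot simply be rewound without buffering), it cannot help when writing a new file, and the zip signature also matches other zip-based containers such as `.xlsx`.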

gfyoung commented 6 years ago

cc @jreback

WillAyd commented 6 years ago

Perhaps there's a history I am not aware of, but at the very least the extension seems like an easy way of doing this. If you think there's a better way of going about it, PRs are always welcome.