Open mxmlnkn opened 1 day ago
Thanks for the report. I'm not familiar with these chained URLs. Is there a formal specification for them? Some searching did not turn up anything for me.
Fsspec documents it here: https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining
The Pandas documentation mentions it at the end of this section: https://pandas.pydata.org/docs/user_guide/io.html#reading-writing-remote-files
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
For chained URLs, the file gets misidentified as TAR, which leads to this backtrace:
I have checked the source code, and the problem seems to be that the check for a TAR extension is applied to the full URL. Instead, only the last part in the chain should be checked, i.e., it should check the extension of `tar://test.csv`, not `tar://test.csv::file://test-csv.tar`.
https://github.com/pandas-dev/pandas/blob/2a10e04a099d5f1633abcdfbb2dd9fdf09142f8d/pandas/io/common.py#L593-L594
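To illustrate the difference, here is a minimal sketch of the two checks. It is not the pandas implementation: the helper names and the `TAR_EXTENSIONS` tuple are hypothetical, and it assumes that the segment before the first `::` names the file that is finally opened, as in the fsspec URL chaining docs.

```python
# Hypothetical helpers for illustration only; not pandas' actual code.
TAR_EXTENSIONS = (".tar", ".tar.gz", ".tar.bz2", ".tar.xz")  # assumed subset


def looks_like_tar_full_url(url: str) -> bool:
    """Naive check: test the extension of the whole chained URL string."""
    return url.lower().endswith(TAR_EXTENSIONS)


def chain_target(url: str) -> str:
    """Return the segment before the first '::' (the file finally opened)."""
    return url.split("::", 1)[0]


def looks_like_tar_target(url: str) -> bool:
    """Proposed check: test only the extension of the chain's target."""
    return chain_target(url).lower().endswith(TAR_EXTENSIONS)


url = "tar://test.csv::file://test-csv.tar"
print(looks_like_tar_full_url(url))  # True: the trailing ".tar" of the archive matches
print(looks_like_tar_target(url))    # False: only "tar://test.csv" is inspected
```

A URL without `::` is returned unchanged by `chain_target`, so plain paths such as `file://test-csv.tar` would still be detected as TAR under this sketch.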
Expected Behavior
It should work without an error.
Installed Versions