Closed Esword618 closed 1 week ago
afaik this is not possible with polars currently, because the separator must be a single character.
What you are looking for is the equivalent of pandas read_fwf, which reads "fixed-width-formatted" data (https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html).
There are a few issues about this already, but it is not yet supported.
In pandas, I use the read_csv function with the parameter sep='\s+' to split the data.
df = pd.read_csv(filename, header=None, skiprows=6, sep=r'\s+')
yeah, this also works, but as I said, polars currently does not support a regex or multi-character string separator, only a single char.
there are workarounds but they are not very nice 😆
import polars as pl

DATA = """\
11.50225 34.62792 341.48861 60.23845 33.86916 340.52216
16.08011 46.36068 112.74108 82.09562 45.90745 112.68871
5.44448 64.20202 84.74526 92.26079 63.48149 84.83877
154.21007 40.30874 284.20968 248.08102 40.32464 284.05453
44.78606 81.08370 306.90320 207.53215 80.58101 307.01056
187.79354 52.18742 348.14328 254.43741 52.35809 348.16040
3.19632 58.35471 336.89014 83.53841 59.67276 335.88022
4.53459 54.00255 23.75481 66.02106 51.58699 23.86702
"""
pl.read_csv(DATA.encode(), has_header=False, new_columns=["data"]).with_columns(
    pl.col("data")
    .str.strip_chars(" ")
    .str.replace_all(" +", " ")
    .str.split(" ")
    .list.to_struct()
).unnest(columns="data").with_columns(pl.all().cast(pl.Float64))
shape: (8, 6)
┌───────────┬──────────┬───────────┬───────────┬──────────┬───────────┐
│ field_0 ┆ field_1 ┆ field_2 ┆ field_3 ┆ field_4 ┆ field_5 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════════╪══════════╪═══════════╪═══════════╪══════════╪═══════════╡
│ 11.50225 ┆ 34.62792 ┆ 341.48861 ┆ 60.23845 ┆ 33.86916 ┆ 340.52216 │
│ 16.08011 ┆ 46.36068 ┆ 112.74108 ┆ 82.09562 ┆ 45.90745 ┆ 112.68871 │
│ 5.44448 ┆ 64.20202 ┆ 84.74526 ┆ 92.26079 ┆ 63.48149 ┆ 84.83877 │
│ 154.21007 ┆ 40.30874 ┆ 284.20968 ┆ 248.08102 ┆ 40.32464 ┆ 284.05453 │
│ 44.78606 ┆ 81.0837 ┆ 306.9032 ┆ 207.53215 ┆ 80.58101 ┆ 307.01056 │
│ 187.79354 ┆ 52.18742 ┆ 348.14328 ┆ 254.43741 ┆ 52.35809 ┆ 348.1604 │
│ 3.19632 ┆ 58.35471 ┆ 336.89014 ┆ 83.53841 ┆ 59.67276 ┆ 335.88022 │
│ 4.53459 ┆ 54.00255 ┆ 23.75481 ┆ 66.02106 ┆ 51.58699 ┆ 23.86702 │
└───────────┴──────────┴───────────┴───────────┴──────────┴───────────┘
However, if the file is not huge, the best way is probably to read the data, replace all \s+ with ',' and then read_csv the "clean" CSV using polars.
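A minimal sketch of that pre-cleaning approach, assuming the whole file fits in memory. The helper names whitespace_to_csv and read_whitespace_delimited are made up for illustration and are not part of polars. The cleaning step works per line and only collapses spaces and tabs, so the newlines between rows are preserved.

```python
import io
import re


def whitespace_to_csv(text: str) -> str:
    """Turn each run of spaces/tabs into a single comma, line by line."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()  # drop leading/trailing whitespace
        if line:  # skip blank lines
            cleaned.append(re.sub(r"[ \t]+", ",", line))
    return "\n".join(cleaned) + "\n"


def read_whitespace_delimited(text: str):
    """Parse whitespace-delimited text with polars (assumed to be installed)."""
    import polars as pl  # imported lazily so the helper above stays stdlib-only

    csv_bytes = whitespace_to_csv(text).encode()
    return pl.read_csv(io.BytesIO(csv_bytes), has_header=False)
```

For a file on disk, something like read_whitespace_delimited(open(filename).read()) should work. If the file starts with non-data lines (as in the pandas call with skiprows=6), those would need to be dropped from the text before cleaning.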
No, because the separator parameter of the read_csv method only accepts a single-byte character.
As answered above: this is not possible.
Description
Here is the content of my data: