shshemi / tabiew

A lightweight TUI application to view and query tabular data files, such as CSV, TSV, or parquet.
MIT License
489 stars 12 forks

[ BUG ] running with separator ¶ #7

Closed greyHairChooseLife closed 3 months ago

greyHairChooseLife commented 4 months ago

Hi there.

$ tw --separator '¶' ./my.csv
$ tw --infer-schema safe --separator '¶' ./my.csv
$ tw --infer-schema no --separator '¶' ./my.csv

All three commands return the same error:

Error: ComputeError(ErrString("could not parse `1009880005252�` as dtype `str` at column '�' (column number 1)\n\nThe current offset in the file is 154 bytes.\n\nYou might want to try:\n- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),\n- specifying correct dtype with the `dtypes` argument\n- setting `ignore_errors` to `True`,\n- adding `1009880005252�` to the `null_values` list.\n\nOriginal error: ```invalid utf-8 sequence```"))

The error says invalid UTF-8, but the file is valid UTF-8. Any clue, please?

Regards

shshemi commented 4 months ago

Hi,

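Thank you for reporting this. The problem is that the underlying CSV library (Polars) assumes that the separator and quote characters are single-byte ASCII. The pilcrow (¶, U+00B6) is encoded in UTF-8 as two bytes, 0xC2 0xB6, so when it is cast to u8 it is truncated to the single byte 0xB6, which is not valid UTF-8 on its own. Splitting on that byte would leave a stray 0xC2 at the end of the preceding field, which is consistent with the value showing up as `1009880005252�` in the error. A more user-friendly message will be shown in the next version. Let me know if I can help with anything else.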
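To make the truncation concrete, here is a minimal standalone sketch in plain Rust. It does not use Polars or tabiew code; the helper names (`truncate_to_byte`, `split_on_byte`, `validate_separator`) and the error text are hypothetical and only illustrate the idea of casting a non-ASCII separator to a single byte and of rejecting it up front instead.

```rust
/// Casts a char to a single byte the way `c as u8` does: only the low
/// 8 bits of the code point survive. (Hypothetical helper for illustration.)
fn truncate_to_byte(c: char) -> u8 {
    c as u8
}

/// Splits a line on a single separator byte and returns the raw fields.
/// (Hypothetical helper; a real CSV parser does much more than this.)
fn split_on_byte(line: &str, sep: u8) -> Vec<Vec<u8>> {
    line.as_bytes()
        .split(|&b| b == sep)
        .map(|field| field.to_vec())
        .collect()
}

/// One possible up-front check: accept only ASCII separators.
/// (Assumption: this is a sketch, not the check tabiew actually ships.)
fn validate_separator(c: char) -> Result<u8, String> {
    if c.is_ascii() {
        Ok(c as u8)
    } else {
        Err(format!("separator {:?} is not a single-byte ASCII character", c))
    }
}

fn main() {
    let pilcrow = '¶'; // U+00B6

    // In UTF-8 the pilcrow takes two bytes: 0xC2 0xB6.
    let mut buf = [0u8; 4];
    let encoded = pilcrow.encode_utf8(&mut buf).as_bytes().to_vec();
    assert_eq!(encoded, vec![0xC2, 0xB6]);

    // Casting to u8 keeps only 0xB6, a lone continuation byte.
    let sep = truncate_to_byte(pilcrow);
    assert_eq!(sep, 0xB6);
    assert!(std::str::from_utf8(&[sep]).is_err());

    // Splitting on 0xB6 leaves 0xC2 dangling at the end of the first field,
    // so that field is no longer valid UTF-8.
    let fields = split_on_byte("1009880005252¶abc", sep);
    assert!(std::str::from_utf8(&fields[0]).is_err());
    println!("first field: {}", String::from_utf8_lossy(&fields[0]));

    // Rejecting the separator early gives a clearer message.
    println!("{:?}", validate_separator(pilcrow));
    println!("{:?}", validate_separator(';'));
}
```

Until a multi-byte separator is supported, converting the file to use an ASCII separator (for example a tab or `|`) before opening it should avoid the error.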

Bests