microsoft / vscode-data-wrangler

Support different field and record delimiter #189

Open elau1004 opened 4 months ago

elau1004 commented 4 months ago

This ticket is a feature request.

Text files use delimiters other than TAB and NEWLINE. For example, the ASCII standard defines four dedicated delimiters, including:
x1F - UNIT SEPARATOR
x1E - RECORD SEPARATOR

In the EDI X12 standard, the following are generally used as separators:
x2A - Asterisk as the unit separator.
x7E - Tilde as the record separator.

X12 separators are negotiated between the sender and the receiver, but the above are the de facto standard.
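To make the two conventions concrete, here is a minimal sketch in plain Python that splits text on a configurable field and record separator. The helper name `split_records` and both sample strings are made up for illustration; this is not how Data Wrangler parses files.

```python
def split_records(text, field_sep, record_sep):
    """Split delimited text into a list of records, each a list of fields."""
    # Drop empty chunks so a trailing record separator does not yield an empty record.
    records = [chunk for chunk in text.split(record_sep) if chunk]
    return [record.split(field_sep) for record in records]


# Hypothetical sample using the ASCII unit (x1F) and record (x1E) separators.
ascii_sample = "id\x1fname\x1e1\x1fAda\x1e2\x1fLin\x1e"

# Hypothetical X12-style sample using asterisk (x2A) and tilde (x7E).
x12_sample = "ST*850*0001~BEG*00*SA*PO123~SE*3*0001~"

ascii_rows = split_records(ascii_sample, "\x1f", "\x1e")
x12_rows = split_records(x12_sample, "*", "~")

print(ascii_rows)
print(x12_rows)
```

The same function handles both cases because only the two separator characters differ, which is the kind of configurability being requested.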

It would be good to support field and record separators other than the current defaults.

kycutler commented 4 months ago

Hi @elau1004, thanks for opening this issue! Data Wrangler does support specifying the unit separator after initial load, by selecting the first history step and modifying the "Delimiter" argument:

image

Backslash-escape sequences such as \x1F are also supported here.

However, you are correct that the record separator currently cannot be specified as an argument and so will always use the pandas default \n. We do plan to support more options like this in the future, so I will keep this issue open to track that progress and provide any updates here.

Also note that you can always modify the code directly, such as to add the pandas lineterminator="..." argument:

image
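As a rough sketch of what such an edit could look like, the snippet below reads in-memory sample data with both separators overridden. The sample string is made up, and this is not the code Data Wrangler generates; it only assumes the pandas `read_csv` parameters `sep` and `lineterminator`.

```python
import io

import pandas as pd

# Hypothetical sample: fields separated by x1F, records separated by x1E.
raw = "id\x1fname\x1e1\x1fAda\x1e2\x1fLin"

# lineterminator must be a single character and is only honored by the C parser.
df = pd.read_csv(io.StringIO(raw), sep="\x1f", lineterminator="\x1e")

print(df)
```

For X12-style data, the same call would presumably take `sep="*"` and `lineterminator="~"`.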

Hope this helps! Thanks for trying Data Wrangler!