onetrueawk / awk


Unicode separated values (USV) don't work #193

Closed bakul closed 1 year ago

bakul commented 1 year ago

See https://github.com/sixarm/usv. In USV, fields are separated by ␟ (U+241F, Symbol for Unit Separator) and records are separated by ␞ (U+241E, Symbol for Record Separator).

From https://news.ycombinator.com/item?id=31360327

  $ cat t.usv && echo
  id␟name␟age␞1␟Bob "Billy" Smith␟42␞2␟Jane
  Brown␟37
  $ goawk -F␟ -vRS=␞ -vOFS=, '{ print $1, $2, $3 }' t.usv
  id,name,age
  1,Bob "Billy" Smith,42
  2,Jane
  Brown,37

This works in goawk, gawk, and mawk, but not in awk (onetrueawk). The USV separator characters are kind of hard to see.
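Since the separators are hard to see, here is a minimal Python sketch of the splitting the command above is expected to perform, with both separators written as escapes. It is only an illustration of the expected result, not how any awk implements it; the names `usv_to_lines` and `SAMPLE` are hypothetical.

```python
# Separators written as escapes so they are visible in the source.
UNIT_SEP = "\u241f"    # ␟ SYMBOL FOR UNIT SEPARATOR, plays the role of FS
RECORD_SEP = "\u241e"  # ␞ SYMBOL FOR RECORD SEPARATOR, plays the role of RS

# Same content as t.usv in the example above (note the embedded newline).
SAMPLE = (
    "id\u241fname\u241fage\u241e"
    '1\u241fBob "Billy" Smith\u241f42\u241e'
    "2\u241fJane\nBrown\u241f37"
)


def usv_to_lines(text, ofs=","):
    """Mimic `{ print $1, $2, $3 }` with FS=␟, RS=␞ and OFS=ofs."""
    # A single trailing record separator does not start another record.
    if text.endswith(RECORD_SEP):
        text = text[: -len(RECORD_SEP)]
    lines = []
    for record in text.split(RECORD_SEP):
        fields = record.split(UNIT_SEP)
        # Fields that are missing are treated as empty strings.
        fields += [""] * (3 - len(fields))
        lines.append(ofs.join(fields[:3]))
    return lines


if __name__ == "__main__":
    for line in usv_to_lines(SAMPLE):
        print(line)
```

Run as a script, this prints the same four lines as the goawk output above; the third record contains a newline inside its second field, so it spans two printed lines.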

plan9 commented 1 year ago

indeed. USV is not supported in OTA.

bakul commented 1 year ago

Shouldn't the user be allowed to pick any regexp as a field separator and any char/string as a record separator? Now that awk is extended to Unicode, I don't see why the above shouldn't be possible.
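To make the request concrete, here is a small Python sketch of the behavior being asked for: records split on an arbitrary literal string, fields split on an arbitrary regular expression. The function name `split_records` is hypothetical, and this says nothing about how awk does or should implement it.

```python
import re


def split_records(text, fs_pattern, rs):
    """Split text into records on the literal string rs,
    then split each record into fields on the regex fs_pattern."""
    fs = re.compile(fs_pattern)
    return [fs.split(record) for record in text.split(rs)]


if __name__ == "__main__":
    # Fields separated by either ␟ (U+241F) or ';', records by ␞ (U+241E).
    data = "a\u241fb;c\u241ed\u241fe"
    print(split_records(data, "[\u241f;]", "\u241e"))
```

Here the field separator is a character class that mixes a Unicode symbol with an ASCII character, which is the kind of combination the question is about.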

arnoldrobbins commented 1 year ago

I did some experimentation, and there's a general problem here, using Unicode characters as RS and apparently as FS. Something broke sometime, since I had done some (minimal) testing using Unicode as RS. The code is somewhat fragile, unfortunately. I am reopening this issue, but I don't know when it will be solved.

plan9 commented 1 year ago

this has been fixed - thank you @arnoldrobbins

arnoldrobbins commented 1 year ago

@benhoyt Please update your forum post that this issue is fixed.

benhoyt commented 1 year ago

@arnoldrobbins Unfortunately one can't edit or even reply to HN comments after a certain amount of time, and for that forum post that time has elapsed. Thanks for the fix though!

bakul commented 1 year ago

Thank you!