Open stek29 opened 6 years ago
Ouch, this is getting tricky here, since `mkskel.sh` invokes the shell (incl. the internal field separator), `sed` and `m4`, each with its own supposed EOL treatment (inherited from the compilation?), while the input file could follow the OS's EOL standard or have converted EOLs (which might happen, e.g., when cloning with git).
@westes I should admit that this is a little bit beyond my knowledge of `sed` and company, in particular when it comes to OS cross-overs, since I am sitting in front of a Windows box using coreutils shipped by Cygwin or MSYS, which is always a bit of a stretch when it comes to consistent EOL treatment.
Probably using `perl` at least makes sense, or doing `tr '\r' '\n'` before `sed`.
No to perl.
The tr command is probably wrong in the general case but may be ok in inputs we care about.
-- Will Estes westes575@gmail.com
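Why `tr '\r' '\n'` is wrong in the general case can be seen with a small Python model (illustration only, not a proposed build dependency; the function names are hypothetical). It handles LF-only and CR-only input, but turns every CRLF into two line breaks, so a CRLF-encoded `flex.skl` would gain an empty line after every real one.

```python
def tr_cr_to_lf(text):
    """Model of `tr '\\r' '\\n'`: every CR character becomes an LF."""
    return text.replace("\r", "\n")


def sed_lines(text):
    """Split on LF only, the way sed delimits its input lines.

    (Python's split also yields a trailing '' after a final LF.)
    """
    return text.split("\n")


samples = {
    "LF": "a\nb\n",        # Unix
    "CR": "a\rb\r",        # classic Mac
    "CRLF": "a\r\nb\r\n",  # Windows
}

for name, text in samples.items():
    print(name, sed_lines(tr_cr_to_lf(text)))
# LF ['a', 'b', '']
# CR ['a', 'b', '']
# CRLF ['a', '', 'b', '', '']
```

This is the sense in which `tr` "may be ok in inputs we care about": it is harmless as long as the skeleton never arrives with CRLF endings.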
Yeah it is kind of a mess, unfortunately.
What about something like `sed ':a;N;$!ba;s/(\r\n|\r)/\n/g'`, using `sed` address ranges to normalize EOLs at some stage(s) of `mkskel.sh`?
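The normalization idea can be modelled in Python (illustration only; `normalize_eols` is a hypothetical name). Note that in the `sed` command as written, `(`, `|` and `)` are literal characters in a POSIX basic regular expression, so it would need extended syntax (`sed -E`), and `\r` is not defined by POSIX sed, so in practice the command relies on GNU sed. Trying CRLF before a lone CR is what keeps a CRLF pair from becoming two line breaks.

```python
import re

# Try CRLF before a lone CR, so that "\r\n" becomes one LF rather than two.
_CRLF_OR_CR = re.compile(r"\r\n|\r")


def normalize_eols(text):
    """Rewrite CRLF and lone-CR line endings as LF; LF is left alone."""
    return _CRLF_OR_CR.sub("\n", text)


print(normalize_eols("dos\r\nmac\runix\n").split("\n"))
# ['dos', 'mac', 'unix', '']
```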
You'd also have to remember what the original state of the file is so that you can write it back in the way the caller expects, I think.
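Writing the file back the way the caller expects could be handled by detecting the input's EOL style before normalizing and restoring it afterwards. Below is a minimal Python sketch under two assumptions: mixed line endings are treated as an error, and text without any line ending defaults to LF. `detect_eol` and `process_preserving_eol` are hypothetical names.

```python
import re

# CRLF is listed first so that "\r\n" counts as one line ending.
_EOL = re.compile(r"\r\n|\r|\n")


def detect_eol(text):
    """Return the EOL style of text: "\\n", "\\r\\n" or "\\r".

    Assumptions of this sketch: mixed styles raise ValueError, and text
    without any line ending is treated as LF.
    """
    styles = set(_EOL.findall(text))
    if len(styles) > 1:
        raise ValueError("mixed line endings: " + ", ".join(sorted(map(repr, styles))))
    return styles.pop() if styles else "\n"


def process_preserving_eol(text, transform):
    """Normalize to LF, run transform, then restore the original EOL style."""
    eol = detect_eol(text)
    result = transform(_EOL.sub("\n", text))
    return result.replace("\n", eol)


print(repr(process_preserving_eol("line one\r\nline two\r\n", str.upper)))
# 'LINE ONE\r\nLINE TWO\r\n'
```

Rejecting mixed input is one design choice; an alternative would be to pick the most frequent style, at the cost of silently rewriting some lines.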
Umm, then what about using `gawk` to remember the EOL structure of `flex.skl` as is?
```awk
# mkskel.awk
# sample call: gawk -f ./mkskel.awk flex.skl > skel1.c
BEGIN {
    oRS = RS
    RS = "\f"   # or "\v"; any character which is rare or even not contained in the input stream / file,
                # such that gawk slurps the input stream ideally in one single step
    lines = ""
    dbg = 0
    #dbg = 1
}
{
    lines = (lines == "") ? $0 : lines RS $0
    c++
}
END {
    if (dbg)
        print "input stream read in " c " step(s)" > "/dev/stderr"
    if (lines == "") {
        print "no lines from input file / stream read" > "/dev/stderr"
        exit 1
    }
    # compose string of char array skel,
    # where input lines are concatenated with their original EOLs
    s = "/* File created from flex.skl via mkskel.sh */" oRS oRS
    s = s "#include \"flexdef.h\"" oRS oRS
    s = s "const char *skel[] = {" oRS
    # aEOL non-POSIX: the 4th argument of split() is a gawk extension
    n = split(lines, aLine, "\r\n|\r|\n", aEOL)
    # a trailing EOL leaves an empty last element; do not emit it as ""
    if (n > 1 && aLine[n] == "")
        n--
    for (i = 1; i <= n; i++) {
        line = aLine[i]
        gsub(/[\\"]/, "\\\\&", line)   # escape \ and " for a C string literal
        s = s "\t\"" line "\"," (i < n ? aEOL[i] : "")
    }
    s = s oRS "\t0" oRS "};"
    print s
}
```
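The separator array is what carries the EOLs through: gawk's four-argument `split(s, a, r, seps)` stores the text matched by `r` between `a[i]` and `a[i+1]` in `seps[i]`. A Python model of the `END` block (illustration only; `c_string_body` and `skel_array` are hypothetical names) uses `re.split` with a capturing group, which returns lines and separators interleaved in the same way:

```python
import re


def c_string_body(line):
    """Escape backslashes and double quotes for use inside a C string literal."""
    return line.replace("\\", "\\\\").replace('"', '\\"')


def skel_array(text, out_eol="\n"):
    """Model of the gawk script's END block (illustration only)."""
    # [line, sep, line, sep, ..., line]: like gawk's aLine[] and aEOL[]
    parts = re.split(r"(\r\n|\r|\n)", text)
    lines, seps = parts[0::2], parts[1::2]
    if len(lines) > 1 and lines[-1] == "":
        lines.pop()  # trailing EOL: do not emit an extra "" entry
    n = len(lines)
    s = "/* File created from flex.skl via mkskel.sh */" + out_eol * 2
    s += '#include "flexdef.h"' + out_eol * 2
    s += "const char *skel[] = {" + out_eol
    for i, line in enumerate(lines):
        # each entry keeps the EOL that followed it in the input
        s += '\t"' + c_string_body(line) + '",' + (seps[i] if i < n - 1 else "")
    s += out_eol + "\t0" + out_eol + "};" + out_eol
    return s


print(skel_array('int x;\r\nputs("hi");\r\n'), end="")
```

Only the entries inside `skel[]` keep the input's EOLs; the header and footer use `out_eol`, mirroring `oRS` in the awk script.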
We can't assume it's GNU awk.
But if some fairly generic awk will do that, then I'm open to it.
And even some Linux distributions have some pretty abominable excuses calling themselves "awk", so it's not just a BSD/OS X thing.
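For a generic awk, one portable strategy (an assumption, not something proposed in the thread) is to keep the default newline `RS` and detect CRLF per record: POSIX awk recognizes `\r` in regular expressions, and `sub(/\r$/, "")` both strips a trailing CR and returns whether there was one. Lone-CR (classic Mac) files would still arrive as a single record with this approach. The Python model below (hypothetical function names) mimics what such an awk script would see:

```python
def records_like_posix_awk(text):
    """Split on LF, as POSIX awk does with its default RS."""
    records = text.split("\n")
    if records[-1] == "":
        records.pop()  # no extra record after a final LF
    return records


def strip_trailing_cr(records):
    """Return (line, eol) pairs, modelling `had_cr = sub(/\\r$/, "")`.

    A record without a trailing CR is assumed to be LF-terminated,
    including a last line that had no EOL at all (a simplification).
    """
    pairs = []
    for rec in records:
        if rec.endswith("\r"):
            pairs.append((rec[:-1], "\r\n"))
        else:
            pairs.append((rec, "\n"))
    return pairs


print(strip_trailing_cr(records_like_posix_awk("a\r\nb\n")))
# [('a', '\r\n'), ('b', '\n')]
```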
OK, in this package mkskel.zip I tried to put together a POSIX-compliant awk script which should make the output EOLs identical either to the input file's EOLs or to an EOL given on the `awk` command line.
Additional notes:
- EOL consistency check for the input file (if the EOL is not provided on the command line, i.e. from outside of the script)
- the awk script could replace `mkskel.sh`, thus it could make `m4` obsolete for the preprocessing step. For this, the only `m4preproc` define, `M4_GEN_PREFIX`, is migrated to an `awk` function. The VERSION number is mandatory on the `awk` command line.
- POSIX compliance checked with `gawk --posix` (or `gawk -P`)
- the package contains a makefile to check for differences between the output of `mkskel.sh` and `mkskel.awk` after running against `flex.skl`. Here I see the additional header line with the date stamp and quotation issues in C comments, thus effectively no differences with impact on the `flex` code
- the current version of the script processes `flex.skl` as it stands right now. TODOs in the script indicate where code could be removed or amended if corresponding changes in `flex.skl` were applied; I would expect this to shrink the code quite a bit.
- the output file type is governed by the version of `awk` used, which I think is not important here, since C compilers do not care about the nasty EOL issue, I would hope.
@westes ... and as always please do feel free to amend as you might find appropriate. But I hope that helps.
Thanks. I'll have a look. Most likely after 2.6.5 is released, which is next on my flex todo list, but we'll see how things go.
Excuse me, but what was this issue about? Was it only about the `[^\r]` incompatibility or was it something more? I think the fix should be easy, with no need to bother with `awk` or `perl`. As I experimented with sed syntax when working on PR #321, I think I can take this one.
But here's one thing I need to know first: which EOL (end of line) convention are we expecting for `flex.skl`? LF only, CR+LF, or CR, or do we accept all three?
In theory we accept any line termination at all.
In practice, flex is built in an Ubuntu container (although at some point I'll get the build to run in an OS X container as well, because Travis offers that feature). The *BSD folks who are also contributors to flex use standard LF line termination.
Maybe this issue #539
Since 3f2b9a4 it only works with GNU sed. On Apple sed (BSD too?) it breaks lines at the character `r`, producing an invalid C file.
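The `r` breakage follows from how bracket expressions are parsed. Under POSIX rules a backslash inside `[...]` is an ordinary character, so `[^\r]` means "any character except `\` and `r`"; GNU sed additionally translates `\r` inside the brackets into a carriage return. A Python model of a substitution of the form `s/[^\r]*/"&",/` (the exact command in `mkskel.sh` may differ; `wrap_first_match` is a hypothetical helper and the sample line is made up) shows both readings:

```python
import re

# GNU sed reads \r inside a bracket expression as a carriage return.
GNU_READING = re.compile(r"[^\r]*")
# POSIX bracket rules: the backslash is literal, so [^\r] excludes '\' and 'r'.
POSIX_READING = re.compile(r"[^\\r]*")


def wrap_first_match(pattern, line):
    """Model of sed 's/PATTERN/"&",/': wrap only the first match."""
    return pattern.sub(lambda m: '"' + m.group(0) + '",', line, count=1)


line = "const char *yytext;"
print(wrap_first_match(GNU_READING, line))    # "const char *yytext;",
print(wrap_first_match(POSIX_READING, line))  # "const cha",r *yytext;
```

Under the POSIX reading the match stops at the first `r`, and the rest of the line lands outside the string literal, which is exactly the invalid C described above.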