UTF8 BOM vs P4C

When the preprocessor is not run (--nocpp), p4test does not accept files that start with a UTF-8 BOM. I know the P4 language spec says the source is written in ASCII, but ASCII is a subset of UTF-8, so I had expected this to work. The only place where ASCII and UTF-8 can differ is inside string literals, and those are already documented as being passed through unchanged.

This works with the preprocessor because both GCC and clang output preprocessed source files without the BOM, so p4test never sees it.

apinski@xeond:~/src/p4/octeontxkpu$ ../p4c/build/p4test --nocpp  ut8-bom.p4
ut8-bom.p4(0):syntax error, unexpected UNEXPECTED_TOKEN
�
^
[--Werror=overlimit] error: 1 errors encountered, aborting compilation
apinski@xeond:~/src/p4/octeontxkpu$ !od
od -c ut8-bom.p4
0000000 357 273 277  \n   #   i   n   c   l   u   d   e       <   c   o
0000020   r   e   .   p   4   >  \n  \n   /   /       {       d   g   -
0000040   w   a   r   n   i   n   g       "   m   a   i   n   "       "
0000060   "       {       t   a   r   g   e   t       *   -   *   -   *
0000100       }       0       }  \n
0000107
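
For reference, the three leading bytes in the od dump (octal 357 273 277) are the UTF-8 BOM, EF BB BF in hex. Below is a minimal sketch of a workaround that strips those bytes before the file is handed to p4test --nocpp. The helper name strip_utf8_bom is hypothetical and is not part of p4c; this only illustrates the check, not how p4c itself should fix it.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present.

    Hypothetical helper for illustration; not part of p4c.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# First bytes of the file as shown by od: octal 357 273 277, then a newline.
sample = b"\357\273\277\n#include <core.p4>\n"
cleaned = strip_utf8_bom(sample)
```

Only a BOM at offset 0 is removed; any other bytes, including those inside string literals, are left untouched.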

p4lang / p4c

UTF8 BOM vs P4C #3837