How to skip lines starting with a certain char sequence, not just single char

jtablesaw / tablesaw

Java dataframe and visualization library

https://jtablesaw.github.io/tablesaw/

Apache License 2.0


How to skip lines starting with a certain char sequence, not just single char #1185

Open kurpav00 opened 1 year ago

kurpav00 commented 1 year ago

I have a tab-separated file and I would need to read it into a table while skipping all comment lines starting with '##'. While there seems to be an option to specify a comment prefix as a single character (commentPrefix), I would need to be able to specify a char sequence as a comment prefix. Is this possible? (Some of the data rows start with a single '#', so simply specifying '#' as the comment char is not possible; only lines starting with '##' must be skipped)

lwhite1 commented 1 year ago

No, there's no support for that. You would have to load the whole table and delete the ## rows, or pre-process the input data to remove them before reading.


kurpav00 commented 1 year ago

OK, thank you for your reply. I wonder which of the methods would need to be modified in order to change this behavior. I could either simply override it in my code or change it directly in the source code. But I am struggling to see where this behavior is coded.

lwhite1 commented 1 year ago

I don't think it is implemented anywhere in tablesaw. In the CSV quasi-standard, a # starting a line is considered a comment marker. IIRC, this is handled by the CSV library we call.


kurpav00 commented 1 year ago

OK, thanks. So file preprocessing is probably the only way to go.
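For reference, the preprocessing route could look roughly like the sketch below. It uses only the JDK and drops every line that starts with a given prefix before the data reaches any CSV reader. The class and method names (`CommentPrefixFilter`, `stripCommentLines`, `filteredReader`) are made up for illustration and are not part of tablesaw. It also assumes the input fits in memory and that no quoted field spans several lines.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of tablesaw: removes lines that start with a multi-char prefix. */
public final class CommentPrefixFilter {

    private CommentPrefixFilter() {}

    /**
     * Reads all lines from the input and returns them joined with '\n',
     * leaving out every line that starts with the given prefix.
     * Lines starting with a single '#' are kept when the prefix is "##".
     */
    public static String stripCommentLines(Reader input, String prefix) {
        BufferedReader reader = new BufferedReader(input);
        return reader.lines()
                .filter(line -> !line.startsWith(prefix))
                .collect(Collectors.joining("\n"));
    }

    /** Convenience wrapper: exposes the filtered text as a Reader again. */
    public static Reader filteredReader(Reader input, String prefix) {
        return new StringReader(stripCommentLines(input, prefix));
    }
}
```

The Reader returned by `filteredReader(new FileReader(path), "##")` could then be handed to whatever CSV reading call is already in use. Since some data rows start with a single '#', it is worth checking that the reader's single-character comment setting does not discard those rows. For very large files, writing the filtered lines to a temporary file instead of a String would probably be the better choice.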