Would this work with inconsistently delimited files?

MostHated commented 4 years ago

Hey there, I have a folder full of files and, unfortunately, the application that generates them doesn't keep things very consistent for some reason. Examples are below. From what I can tell, the first 3 lines are always comments, followed by a section starting with HCONTEXT, of which there might be just one or several. There are not always additional comments before the sets of data, but in the second example there are. The data lines are always laid out the same way: the first column is an application symbol, followed by a label, a description, and finally a list of 0 to N keys delimited by spaces.

The main issue is that the delimiter between the four columns is not consistent. The layout of the data is always the same, but the columns might be separated by a single tab (\t), two tabs, three tabs, a single tab and a space (\s), two spaces and a tab (\s\t\s or \s\s\t), etc.

I saw that this library lets you set the delimiter you want to use, but does it have any capability to search for multiple types of delimiters, or to allow specific delimiters within a field? An example of that would be the 4th field, the key combinations: from what I have seen so far, they are always delimited by a single space (\s).

If you would not mind letting me know whether this library is able to help me out with this, I would greatly appreciate it. If not, do you happen to know of one that might? I was not exactly sure what search terms to use when looking. I tried "parse text", "csv", "multiple delimiters", and various other things, but this library so far is the only one that looks like it might help. Maybe I need to just use multiple libraries and do it in different steps, though I am hoping to keep it as performant as possible at runtime.

Thanks! -MH

//
// Desktop manager (separate app)
//

HCONTEXT deskmgr "Desktop Manager" "These keys are used in the Desktop Manager dialog."

deskmgr.new     "New"       "Create a new desktop"      Alt+N N
deskmgr.add     "Add"       "Add a desktop"         Alt+D D
deskmgr.apply       "Apply"     "Apply current changes"
deskmgr.accept      "Accept"    "Accept current changes"
deskmgr.discard     "Discard"   "Discard current changes"
deskmgr.reload      "Reload"    "Reload the desktops"
deskmgr.refresh     "Refresh"   "Refresh the desktops"
deskmgr.save        "Save"      "Save current changes"      Alt+S S
deskmgr.cancel      "Cancel"    "Cancel current changes"    Esc

//
// Gplay hotkeys
//

HCONTEXT gplay "GPLAY Geometry Viewer" "These keys apply to the Geometry Viewer application."

// File menu
gplay.open      "Open"          "Open"          Alt+O Ctrl+O
gplay.quit      "Quit"          "Quit"          Alt+Q Ctrl+Q

// Display menu
gplay.display_info  "Geometry Info"     "Geometry Info"     Alt+I
gplay.unpack        "Unpack Geometry"   "Unpack Geometry"   Alt+U
gplay.display_ssheet    "Geometry Spreadsheet"  "Geometry Speadsheet"   Alt+S
gplay.flipbook      "Flipbook Current Viewport" "Flipbook the currently selected viewport"  Alt+F
gplay.display_prefs "Preferences"       "Preferences"       

// Help menu
gplay.help_menu     "Help Menu"     "Help Menu"     Alt+H

// Commands not in menus
gplay.quick_quit    "Quick Quit"        "Quick Quit"        Q
gplay.next_geo      "Next Geometry"     "Next Geometry"     N
gplay.prev_geo      "Previous Geometry" "Previous Geometry" P
gplay.stop_play     "Stop Play"     "Stop Play"     Space

jszwec commented 4 years ago

Hello,

This library does not provide a CSV parser, as described at the top of the README:

Package csvutil provides fast and idiomatic mapping between CSV and Go (golang) values.

This package does not provide a CSV parser itself, it is based on the Reader and Writer interfaces which are implemented by eg. std Go (golang) csv package. This gives a possibility of choosing any other CSV writer or reader which may be more performant.

It only helps to quickly map data from CSV to structs and from structs to CSV using any parser you want.

The parser from the standard library can't use multiple different separators (look here) and it cannot ignore arbitrary lines or comments like you have.
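To illustrate the single-separator limitation, here is a small sketch using only the standard library. `readTabSeparated` is a name made up for this example. `csv.Reader.Comma` is a single rune, so a run of tabs is not collapsed and each extra tab yields an extra empty field:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readTabSeparated (hypothetical helper) parses one line with encoding/csv,
// using a single tab as the separator. Reader.Comma is one rune, so
// consecutive tabs are treated as separate delimiters around empty fields.
func readTabSeparated(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = '\t'
	return r.Read()
}

func main() {
	// Two tabs after the first column, one tab between the others.
	fields, err := readTabSeparated("deskmgr.new\t\t\"New\"\t\"Create a new desktop\"\tAlt+N N")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The line has four logical columns, but five fields come back.
	fmt.Printf("%d fields: %q\n", len(fields), fields)
}
```

With a mix of tabs and spaces the result gets worse, since a space in front of a quoted column makes the field start with a space rather than a quote.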

In my opinion, your best bet is to implement a data normalizer that will get rid of comment lines and will normalize the separators. You can do it in a few different ways. You could just implement your own reader like this:

https://play.golang.org/p/pqrq7GC0Haj

This is far from being perfect, but it gives you an idea on how you can approach the problem and it actually parses your file correctly. I hope this helps!

MostHated commented 4 years ago

Oh, shoot, I am sorry about that. I scrolled down to the menu area and completely missed that part. Huge thanks for the example you created though, that is fantastic and definitely gets me pointed in the right direction. I really appreciate it.

jszwec commented 4 years ago

I'm glad! I will close the issue, but feel free to reopen if you have any more questions.

MostHated commented 4 years ago

Hey there, I had one last question if you don't mind, though I am not entirely sure where the issue lies or if there is anything I can do about it. I am attempting to make an addon for an application which uses Python as an available scripting language, but Python is just too slow at handling the data I am trying to work with, which is why I was hoping to use Go instead for portions of it. I came across some examples in which Python was able to call Go compiled to C, and I was able to get those examples working. I was hoping that I would be able to parse the files quickly and then send the result back to Python to use, but when I compile the code you provided as an example to C and call it from Python, it errors out.

Unfortunately, I am stuck having to use Python as the first layer of things since it is part of the main application, but I am hoping to find a way to use it as little as possible to get better performance overall in my addon. So really my main question is, do you happen to know offhand if there are any issues with trying to use this library as a cgo application? I am just wanting to try and figure out if I am attempting to head down a dead-end in the first place and should try to reach my end goal in a different manner.

jszwec commented 4 years ago

I am sorry but this is out of the scope of this package. All I can say is that this package has no external dependencies, it uses only the std lib so there should be no problem compiling a program that is using it.

I suggest that you ask on Stack Overflow or somewhere similar. You should provide much more information, like how exactly you are building your app etc. What you wrote is not enough.

jszwec commented 4 years ago

PS. You could just build a Go cmd tool binary and call it from Python. I think it would be much simpler.

MostHated commented 4 years ago

You are definitely right, that would be a great idea. Now that I think about it, might be a good idea to just have Go put the data into SQLite maybe, then I can just ask for it when I need it from Python.

Thanks again, I appreciate it.

jszwec / csvutil, issue #25