libyal / libesedb

Library and tools to access the Extensible Storage Engine (ESE) Database File (EDB) format.
GNU Lesser General Public License v3.0

esedbexport: add command-line option for vertical output #66

Closed jengelh closed 1 year ago

jengelh commented 1 year ago

Similar to what mysql -E does, add an option to esedbexport to print columns in "vertical" fashion.

joachimmetz commented 1 year ago

Thanks for the proposed changes, but can you change them to match the style of the codebase?

Also how do you plan to handle records (or cells) that contain a lot of data (long strings)?

jengelh commented 1 year ago

It's an unusual style. It reads as if someone is typing on chat and trying to win a contest to exaggerate the LOC count. Anyway, here's a +74% bigger patch that I think conforms.

handle cells that contain a lot of data (long strings)?

Unhandled and not a goal of this particular patch. mysql -E does not specially wrap cell data either, which I think is acceptable:

joachimmetz commented 1 year ago

It's an unusual style. It reads as if someone is typing on chat and trying to win a contest to exaggerate the LOC count. Anyway, here's a +74% bigger patch that I think conforms.

Why this remark? What value does this add? It comes across as rude and arrogant.

Unhandled and not a goal of this particular patch. mysql -E does not specially wrap cell data either, which I think is acceptable:

Can you describe to me what you're trying to accomplish with this option then? It is not clear to me what additional benefit it will provide.

jengelh commented 1 year ago

Your style is outside the generally-established practices, which IMO makes it unnecessarily harder for contributors.

can you describe to me what you're trying to accomplish with this option then?

./esedbtools/esedbexport (-E) MDB01.edb >/dev/zero; less MDB01.edb.export/Folder_160.973

Without -E: [screenshot 1]

With -E: [screenshot 2]

It is just much clearer to see which folder has which size.

joachimmetz commented 1 year ago

Your style is outside the generally-established practices, which IMO makes it unnecessarily harder for contributors.

That is no justification for being rude about it.

Also, "generally-established practices": based on what data / baseline / whose perspective?

It is also good maintainer practice to keep a codebase in the same style.

It is just much clearer to see which folder has which size.

This will be highly dependent on the schema and the size of the table.

esedbexport was mostly created as a format analysis tool, with output to be ingested by other tools. What is the use case of using it to analyze the textual dumps of the tables manually, instead of using the library to reconstruct application-specific structures?

jengelh commented 1 year ago

Also, "generally-established practices": based on what data / baseline / whose perspective?

When one takes a sample of 1000 random (C- or C++-coded) packages that a contemporary Linux distribution contains, I would expect to find that which I termed generally-established practice. The choice of such a system software distribution would ensure that there are varied authors/projects involved, and that the scope is serious/viable code.

This will be highly dependent on the schema and the size of the table.

Certainly, and that's why it is an option, selectable through a command-line switch. (I feel almost bad for ldapsearch not offering a horizontal mode of displaying things.)

What is the use case of using it to analyze the textual dumps of the tables manually?

For lack of a better word, for analyzing the inner formats. After export in vertical form, I can conveniently do this:

$ grep ExtensionBlob MDB01.edb.export/ExtendedPropertyNameMapping_160.972 | sort -u | head -n9
ExtensionBlob: 50726f50000401005001005111381ba64fd958cad9011819561913000000
ExtensionBlob: 50726f5000040100500100511138b9cc4fd958cad9011819561913000000
ExtensionBlob: 50726f500004010050010051123868d8da9e58cad901181afe001913000000
ExtensionBlob: 50726f50000401005001005112387100db9e58cad901181afe001913000000
ExtensionBlob: 50726f500004010050010051123896b1da9e58cad901181afe001913000000
ExtensionBlob: 50726f500004010050010051173803cafd9e58cad901181956191348484d054c6f676f6e
ExtensionBlob: 50726f50000401005001005117381fd5e59e58cad901181956191348484d054c6f676f6e
ExtensionBlob: 50726f500004010050010051173821fce59e58cad901181956191348484d054c6f676f6e
ExtensionBlob: 50726f5000040100500100511738274ae69e58cad901181956191348484d054c6f676f6e

and this pattern gives a few hints as to what the ExtensionBlob could represent.
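One way to look for such hints is to decode the hex strings and pull out runs of printable ASCII. The following is a minimal Python sketch, not part of libesedb or of the proposed patch; the helper name `ascii_runs` and the minimum run length of 4 are assumptions made for illustration. The two sample values are copied from the listing above.

```python
import re

# Two of the ExtensionBlob values from the listing above (first and sixth line).
BLOBS = [
    "50726f50000401005001005111381ba64fd958cad9011819561913000000",
    "50726f500004010050010051173803cafd9e58cad901181956191348484d054c6f676f6e",
]

def ascii_runs(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data.

    Hypothetical helper for illustration; min_len=4 is an arbitrary choice.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(data)

for blob in BLOBS:
    raw = bytes.fromhex(blob)
    print(len(raw), ascii_runs(raw))
```

Both values begin with the ASCII bytes `ProP`, and the longer one ends with `Logon`. Whether these are a signature and a property name is a guess that would need checking against more records.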

joachimmetz commented 1 year ago

When one takes a sample of 1000 random (C- or C++-coded) packages that a contemporary Linux distribution contains, I would expect to find that which I termed generally-established practice. The choice of such a system software distribution would ensure that there are varied authors/projects involved, and that the scope is serious/viable code.

So if I take another 1000 samples from another data set I get a different result. Why are your samples representative for the only way to do things? This reads very narrow-minded to me.

And you still have not answered why you feel you have to be rude about it?

joachimmetz commented 1 year ago

For lack of a better word, for analyzing the inner formats. After export in vertical form, I can conveniently do this:

You can accomplish the same output with the tab-separated format and awk (or equivalent). So IMHO this is not a strong use case for adding another output format.

Also, -E will be obscure for users not familiar with mysql; having -o FORMAT or an equivalent would be (1) less obscure/arcane and (2) more forward-looking.
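The post-processing alternative mentioned above can be sketched in a few lines. This is a hedged Python illustration rather than the awk one-liner the comment has in mind. It assumes the exported table file is tab-separated with the column names on the first line. The function name `to_vertical`, the record separator, and the sample column names are invented for the example and are not taken from esedbexport or from the patch.

```python
def to_vertical(tsv_text):
    """Render tab-separated text (header row first) as 'column: value' blocks."""
    lines = [line for line in tsv_text.splitlines() if line]
    if not lines:
        return ""
    header = lines[0].split("\t")
    blocks = []
    for number, line in enumerate(lines[1:], start=1):
        values = line.split("\t")
        block = [f"*** record {number} ***"]
        # zip() stops at the shorter sequence, so short rows are tolerated.
        block += [f"{name}: {value}" for name, value in zip(header, values)]
        blocks.append("\n".join(block))
    return "\n\n".join(blocks)

# Hypothetical sample data; real column names depend on the table schema.
sample = "FolderId\tDisplayName\tMessageSize\n1\tInbox\t4096\n2\tSent Items\t128\n"
print(to_vertical(sample))
```

With output in this shape, a per-column `grep` like the one shown earlier in the thread works without a built-in vertical mode. The trade-off is that every user has to carry such a script around.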

jengelh commented 1 year ago

So if I take another 1000 samples from another data set I get a different result.

For a sufficiently specific data set, sure.

Why are your samples representative for the only way to do things?

I have not said that it is the only way; it is, however, I believe, a way to start things off. I think we can agree that libyal has a sufficient level of "seriousness" (the primary author isn't the sole main user; the repository has been "shared" by second parties to third parties; and, unlike a "share" on Twitter, inclusion into something like a Linux distro makes for a "strong" share, since it involves integration work on the part of the sharer). That also sets the framework to limit the samples to "serious" code. If one were to sample GitHub instead, there is the risk that a lot of unused, unorganized, "5-minute" code that has no practical userbase and no reach to speak of perturbs the statistics in disfavor of all the serious code, but especially in disfavor of libyal.

why you have to be rude about it

Rudeness depends on one's own perception of norms (encycl.), to which I can but say: the norms you applied are, to use your words from above, not "the only way" to interpret things.

joachimmetz commented 1 year ago

Rudeness depends on one's own perception of norms (encycl.), to which I can but say: the norms you applied are, to use your words from above, not "the only way" to interpret things.

Forms of rudeness include acting inconsiderate, insensitive, deliberately offensive, impolite, obscenity, profanity and violating taboos such as deviancy.

Acting inconsiderate, insensitive, deliberately offensive, or impolite is universally considered rude behavior.

"It's an unusual style": fair, this is an observation, but what follows is a snarky comment which is deliberate and not necessary (hence inconsiderate, insensitive, offensive, or impolite, depending on your cultural bias). The fact that you try to spin it is even more concerning, even more so since you consider this to be the (your) norm.

I have not said that it is the only way; it is, however, I believe, a way to start things off.

I think it is pretty obvious that you did: "Your style is outside the generally-established practices, which IMO makes it unnecessarily harder for contributors."

I emphasize your use of "generally-established" here; "general" by which standards? There are few actual standards based on objective measures; it is mostly conventions that people adopt.

I'll close this issue now, given that I see little value in the proposed changes unless there is a stronger use case for it.