CSV output - Githubissues

wvmarle commented 11 years ago

PrettyTable is not so pretty on my 80-character terminal - rather horrible even - and output can't be parsed well by potential downstream software. So suggesting to add a csv output option, and maybe another print format output that's not a pretty table but that's readable on a smaller terminal.

Specific: Add command line option --csv [file name]. This results in output being given in CSV format. If file name is omitted, use stdout.

Related to this I suggest adding --json [file name] as option too, for json type output.
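A minimal sketch of how such options could be parsed with argparse, assuming the file name is an optional value and that "-" stands for stdout. The names `build_parser` and `open_target` and the program name are hypothetical and are not taken from glacier-cmd itself.

```python
import argparse
import sys


def build_parser():
    """Build a parser with hypothetical --csv / --json output options."""
    parser = argparse.ArgumentParser(prog="glacier-cmd-sketch")
    # nargs="?" makes the file name optional; const="-" is stored when the
    # option is given without a file name and is treated as "use stdout".
    parser.add_argument("--csv", nargs="?", const="-", default=None, metavar="FILE")
    parser.add_argument("--json", nargs="?", const="-", default=None, metavar="FILE")
    return parser


def open_target(name):
    """Return stdout for "-" and an open text file for any other name."""
    if name == "-":
        return sys.stdout
    return open(name, "w", newline="")
```

With this layout `--csv` alone yields "-", `--csv out.csv` yields "out.csv", and leaving the option out yields None, so the default PrettyTable output can stay in place. One caveat of an optional value: a positional argument placed directly after `--csv` would be taken as the file name.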

I had a quick look at the current source, and at first glance it seems the easiest way to incorporate these outputs and allow for future other outputs (html/xml/whatever) would be to create a separate class for that, where the current PrettyTable can be an option too (the default option, if nothing else is indicated).

Comments? Suggestions? I'll be happy to pick this up myself.

(edit: it's not CVS but CSV of course)

gburca commented 11 years ago

You mean CSV (comma separated values), not CVS, right?

I was going to fix the problem with wide tables by wrapping the contents of some of the cells. This code seems to allow you to specify a wrap function, but I haven't had time to investigate much: http://code.activestate.com/recipes/267662/

If we wrap some of the insanely long SHA256 and similar IDs, I think the output for most commands becomes much more manageable.

I don't mind having multiple output formats, but for command line usage, CSV and JSON are even less readable than PrettyTable. They are however certainly better for feeding to a downstream parser.

wvmarle commented 11 years ago

Yes CSV. Corrected issue. And of course the output is for machine reading: CSV for import into a spreadsheet (makes it also much easier to go through your inventory, check on sizes, etc), JSON for communication with another program.

So what about yanking all this printing code from the current glacier.py and putting it in a new class, e.g. glacieroutput.py that takes the glacier data and options and then outputs to the required format(s)? It's quite an operation so not going to start it without maintainer's approval.

SHA256 is long, the archive ID is worse. Wrapping has a major issue though: you can't easily copy/paste an archive ID, I did just that yesterday for deleting files from glacier. I was more thinking of an option for a list output, with one line for what is now a column. Ugly for sure but may work better for narrower terminals.
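To make the idea concrete, here is a rough sketch of a separate output module with CSV, JSON and one-field-per-line list output, using only the standard library. Everything in it is an assumption for illustration: the function names, the `FORMATTERS` table, and the idea that the data arrives as a list of dicts are not taken from the current glacier.py.

```python
import csv
import io
import json


def to_csv(rows, headers):
    """Render rows (a list of dicts) as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=headers,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def to_json(rows, headers):
    """Render rows as a JSON array of objects, keeping only the known headers."""
    return json.dumps([{h: row.get(h) for h in headers} for row in rows], indent=2)


def to_list(rows, headers):
    """Render one 'header : value' line per field, so long IDs stay unwrapped."""
    width = max(len(h) for h in headers)
    blocks = []
    for row in rows:
        lines = [f"{h.ljust(width)} : {row.get(h, '')}" for h in headers]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Hypothetical registry; a PrettyTable formatter could be added as the default.
FORMATTERS = {"csv": to_csv, "json": to_json, "list": to_list}


def render(rows, headers, fmt="list"):
    """Look up the formatter by name and return the rendered text."""
    return FORMATTERS[fmt](rows, headers)
```

Keeping every format behind the same `(rows, headers)` signature means a new format (html, xml) only needs one more function and one more entry in the table. The list format never wraps a value, so an archive ID can still be copied in one piece.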

offlinehacker commented 11 years ago

We can make a new class for output, but it has to be well designed (think about very basic exceptions, logging and documentation), and I'm pretty sure somebody has done something like this before, so looking for existing Python libraries that provide output in different formats should give you some results (at least I hope).

Before that, we have to migrate all this code to the botonly branch, into the GlacierWrapper class, which has proper exception handling and logging built in. This is way more important, because plugging more and more code into incorrectly handled and unlogged parts of the code is the way to disaster. We will be very happy if you make a class that supports different kinds of output, but we would be happier if somebody started porting code to GlacierWrapper. I have already ported some code and you can see there how exceptions are handled. When the code is ported we will be able to write tests with mocking of glacier functionality.

wvmarle commented 11 years ago

Can you explain a little more on the intention of GlacierWrapper and which code should be moved where, etc?

Just had a glance at it; ran into a 404 exception myself when accidentally specifying a non-existing vault name for uploads, should be handled nicer than with a crash as it happens now.

offlinehacker commented 11 years ago

Well, I'm making GlacierWrapper, which will be a wrapper for all our current code that does not deal with input and output or core glacier calls. It's not tested yet, but when it is done, the exception handling will make it much easier to test and we will know what's going wrong. If you intend to help me on this class, please let me know so I can update my repo and you can work on up-to-date code.

wvmarle commented 11 years ago

I can always give it a try!

As you probably realised I'm quite new in this collaborative programming, I've done quite some work for my own in Python and this is simply a project that's very useful for me personally so happy to try to contribute.

In SimpleDB code: "create domain if it doesn't exist yet". Bad idea; usually if it doesn't exist it's a typo, this may cause chaos. Only create if user explicitly demands it.

And it's 'retrieve', not 'retrive'.

offlinehacker commented 11 years ago

Well, I hope we will be able to teach you some nice coding. I'm also still a Python learner, but I have been programming in so many languages and Python is the best. And for GitHub, thank god it exists ;)

Here is the newest version of the code: https://github.com/offlinehacker/amazon-glacier-cmd-interface/tree/glacier_lib . To get it, do something like this:

git remote add origin2 https://github.com/offlinehacker/amazon-glacier-cmd-interface.git
git branch somebranch
git checkout somebranch
git pull origin2 glacier_lib

And thanks for pointing out the typos and errors; I will remove the part where we create the domain if it does not exist yet.

offlinehacker commented 11 years ago

I think having an IRC channel would be awesome, so we could at least arrange who will do what. What do you think?

GitHub is missing a chat feature. And talking here just does not seem right.

wvmarle commented 11 years ago

Never used IRC but I know what it is and can be useful indeed. This is not a chatbox.

So as I understand the idea is to make this into a library, with glacier-cmd being a front-end for the whole thing?

I've some more ideas for features (automatic download of inventory; automatic resumption of multipart uploads) but indeed let's first get the basics right.

offlinehacker commented 11 years ago

Yes. Because there are some radical architectural changes, I will make my editor update the last commit (blob) every 5 minutes, so we will be on the same page.

wvmarle commented 11 years ago

OK, so I pulled in your glacier_lib repository on top of my existing stuff. Now I get conflicts in glacier.py (automerge fail - seems related to my changes that are not committed to the main branch maybe?) and lines like: <<<<<<< HEAD. And I have no idea what I really have on my side now! This won't run; it seems to be a mess; no idea how to revert or anything... :-(

I definitely don't understand git yet. Great to have a separate fork to work on, but as soon as a pull is rejected by the main, or simply not done yet, you're diverging. And getting a mess. This time I think I have to figure out how to revert the changes and create a second project dir for it, then at least I can keep a working version of glacier here.

offlinehacker commented 11 years ago

Look, what it did is apply my changes on top of yours, but there were conflicts. To help you resolve them, it shows you where the conflicts are. To repair a conflict, remove the code between <<<<<<< HEAD and ======= or the code between ======= and >>>>>>> mycommit, or change it to whatever you want it to be after the merge. These are just markers. So don't be afraid to remove them and make the code how it should be! After you resolve the conflicts you must commit the changes.

offlinehacker commented 11 years ago

If your code base and mine are different, you will have problems whenever the same areas of code have been changed differently in the two repos, and you will always have to resolve conflicts on merge.

Git is only a text merger. If you rename a variable and I don't, we will always have conflicts; it will not know how to refactor code between repos, for example (that would be an awesome feature).

wvmarle commented 11 years ago

That's another problem indeed. Going to sleep now, well past midnight, solve it tomorrow. I'll just go use a different system.

Main disadvantage is that on my home computer I have only 640kb upstream, while my server has 20 Mb up. Much nicer for testing whether uploads work :-) But that one has to work properly. Glacier-cmd is supposed to push my backups to Amazon every night.

offlinehacker commented 11 years ago

@gburca my latest commit should save you a lot of memory in uploading and downloading.

Remember one thing. If you have big data in arrays, never slice it (like we did and boto still does), or it will create a copy. In this case use memoryview.
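A small illustration of the difference, assuming the data sits in a bytearray (the buffer here is just a stand-in, not the project's actual upload buffer):

```python
data = bytearray(b"x" * 1024)    # stand-in for a large upload buffer

copied = data[:512]              # slicing a bytearray (or bytes) copies the bytes
view = memoryview(data)[:512]    # slicing a memoryview references the same memory

data[0] = ord("y")               # change the original buffer in place

# The copy is unaffected, while the view sees the change because it shares
# memory with the original buffer.
copy_unchanged = copied[0] == ord("x")
view_follows = view[0] == ord("y")
```

For a part of many megabytes the copy doubles the memory in use for as long as both objects are alive, which is what hurts on a small server.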

wvmarle commented 11 years ago

Ah, now that's probably what happened in my case.

And no surprise other developers don't see it... I have "only" 1GB of RAM in that server, sending out pretty big files. Most people testing this software and running into this bug will simply say "oh, doesn't work, uninstall". I almost did, too. I just couldn't find an alternative to run to, and am happy to hack Python.

Why that slicing anyway? Why not just sending out a block the moment it's presented to GlacierWriter.write? Or at least when the data is at least as big as the block size, not only when it's larger than the block size? This way we always keep a second block in memory, and basically send out the previous block.

Suggest change glaciercorecalls.py:323

while self.buffer_size > self.part_size:
        self.send_part()

to:

while self.buffer_size >= self.part_size:
        self.send_part()

Saves a complete block of data in memory, and as normally the part presented to this function is equal to the block size no dicing and splicing of the data is needed.
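To show the effect of the comparison, here is a toy buffer that only records the parts it would send. `PartBuffer` is a hypothetical stand-in written for this comment, not the real GlacierWriter, and it does not talk to Glacier at all.

```python
class PartBuffer:
    """Hypothetical stand-in for the writer's buffering logic."""

    def __init__(self, part_size):
        self.part_size = part_size
        self.buffer = bytearray()
        self.sent = []  # parts "sent" so far, recorded instead of uploaded

    def write(self, data):
        self.buffer.extend(data)
        # ">=" flushes as soon as one full part is available; with ">" a
        # write of exactly part_size bytes would sit in memory until the
        # next write arrives.
        while len(self.buffer) >= self.part_size:
            self.send_part()

    def send_part(self):
        part = bytes(self.buffer[:self.part_size])
        del self.buffer[:self.part_size]
        self.sent.append(part)
```

Writing exactly one part's worth of data leaves the buffer empty right away, and only a trailing partial part stays behind for a final flush.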

gburca commented 11 years ago

Good one!

offlinehacker commented 11 years ago

Since we are integrating with boto now, it is boto's task to handle this correctly. Still, thanks for noticing. Boto has the same bug, please report it there.

offlinehacker commented 11 years ago

Is this bug/feature still relevant, since we are now talking about different things? What about closing it? There are many different output formats that we could output. I will open a new feature request.

uskudnik commented 11 years ago

That sounds best, yes.

uskudnik / amazon-glacier-cmd-interface

CSV output #45