harmy / boar

Automatically exported from code.google.com/p/boar

enhancement: structure of the sessions folder #67

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I realize my use of boar may be atypical, but some issues (especially with 
respect to efficiency/manageability) aren't obvious with small-scale use and 
come into much sharper focus at large scales.

I realize the boar repo is designed to only be accessed from the cli, and one 
isn't supposed to go poking their nose into their boar folders, but from both a 
visual processing standpoint and the actual efficiency of certain boar 
operations, I was thinking the sessions folder might benefit from being broken 
down into session based subfolders.

boar_repo
-/sessions
-/-/mysession1
-/-/-/1
-/-/-/2
-/-/-/N
-/-/mysession2
-/-/-/1
-/-/-/2
-/-/-/N

In a different issue I brought up the notion of assigning a guid to sessions 
(even something human readable with an incrementing counter like "session1, 
session2, ..., sessionN") and then having a file that maps the session guid to 
a "friendly name" chosen by the user that functions as an alias to that session.
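
A rough sketch of that alias idea, purely for illustration: nothing here is boar's actual format, and the class and method names (`SessionAliases`, `add`, `resolve`, `dumps`) are made up. It assumes the mapping file would be a small JSON object from a generated id to the user's friendly name.

```python
import json


class SessionAliases:
    """Hypothetical id -> friendly-name map; not part of boar."""

    def __init__(self, mapping=None):
        self.mapping = dict(mapping or {})

    def add(self, friendly_name):
        # Ids are a simple incrementing counter: session1, session2, ...
        session_id = "session%d" % (len(self.mapping) + 1)
        self.mapping[session_id] = friendly_name
        return session_id

    def resolve(self, friendly_name):
        # Reverse lookup: friendly name -> stable id (None if unknown).
        for session_id, name in self.mapping.items():
            if name == friendly_name:
                return session_id
        return None

    def dumps(self):
        # What the on-disk mapping file might contain.
        return json.dumps(self.mapping, sort_keys=True)
```

Renaming a session would then only touch this one file, while the folder named after the id stays put.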

Altering the sessions folder in this way has two main benefits imo: 

First, with heavy use of boar, having all sessions stored as globally 
incrementing revisions leads to an enormous (and unwieldy) number of revisions. 
Global revisions don't really have any significance; revisions relative to a 
session are what matter. By atomizing and isolating sessions to their own 
subfolders, the folder names become meaningful, and the revisions become 
meaningful in a human readable way. It is also the case that a "corrupt" 
session folder can be easily pulled out of the repo without any impact on the 
remaining session folders. Given that "simplicity" is a guiding design 
principle of boar, having the format be "intelligible" and accessible to a 
human without the need to rely on the output of cli tools is an important 
feature to make sure data is never "trapped" or unnecessarily obfuscated by the 
format.

The second benefit I see is in terms of performance and avoiding unnecessary 
file reads. At the moment, I believe all session.json files need to be read 
anytime a list command is issued. There is no reason for this. If a user only 
wants to see the revisions for "session1", by having sessions in their own 
subfolder, it's simply a matter of recursing over the session.json of *only* 
that folder, and the necessary information is quickly pulled up. For a boar 
repo managing a large number of sessions (mine definitely is), I suspect being 
able to target operations to specific sessions almost instantly without the 
need to enumerate or read unnecessary session.json files into memory will be 
noticeable.
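
To make the difference concrete, here is a minimal sketch comparing the two lookups. It is not boar's code: the function names are hypothetical, and it assumes (without having checked the real format) that each revision folder contains a session.json with a "name" key identifying its session.

```python
import json
import os


def list_revisions_flat(sessions_dir, session_name):
    """Flat layout (assumed): sessions/<rev>/session.json. Every file is read."""
    revisions = []
    for rev in os.listdir(sessions_dir):
        path = os.path.join(sessions_dir, rev, "session.json")
        if not rev.isdigit() or not os.path.isfile(path):
            continue
        with open(path) as f:
            info = json.load(f)
        # The "name" key is an assumption made for this sketch.
        if info.get("name") == session_name:
            revisions.append(int(rev))
    return sorted(revisions)


def list_revisions_nested(sessions_dir, session_name):
    """Proposed layout: sessions/<session>/<rev>/. No JSON is opened at all."""
    session_dir = os.path.join(sessions_dir, session_name)
    return sorted(int(rev) for rev in os.listdir(session_dir) if rev.isdigit())
```

The flat version does work proportional to the total number of revisions in the repo; the nested version only touches the one session folder it was asked about.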

Thoughts?

/cb

Original issue reported on code.google.com by cryptob...@gmail.com on 2 Mar 2012 at 2:11

GoogleCodeExporter commented 9 years ago
Expanding on the idea of atomizing revisions to session subfolders also 
introduces the ability to do things like clone specific sessions between boar 
repos without having to clone *all* sessions.

Say I have a boar repo on my home machine where I maintain both work and 
personal projects and then a boar repo at work where I only want to manage my 
work projects. Assuming I do something like prepend my session names with 
"work_" and "personal_", if the clone command was modified slightly to accept 
session names, it would be possible to push/sync all my "work_" sessions in a 
completely "clean" way.
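
As a sketch of what a session-filtered clone could look like under the nested layout: the function name `clone_sessions` is hypothetical, boar's clone command has no such option as far as this thread shows, and a real implementation would also have to bring along the blob data those sessions reference, which is ignored here.

```python
import os
import shutil


def clone_sessions(src_sessions, dst_sessions, prefix):
    """Copy only session folders whose name starts with `prefix`.

    Assumes the proposed layout sessions/<session>/<rev>/ and that the
    destination does not already contain the selected sessions.
    """
    copied = []
    os.makedirs(dst_sessions, exist_ok=True)
    for name in sorted(os.listdir(src_sessions)):
        src = os.path.join(src_sessions, name)
        if name.startswith(prefix) and os.path.isdir(src):
            # copytree raises if the target exists, so nothing is overwritten.
            shutil.copytree(src, os.path.join(dst_sessions, name))
            copied.append(name)
    return copied
```

Called with the prefix "work_", only the work sessions end up in the destination repo, and the personal ones are never touched.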

Having revision folders incremented globally entangles completely unrelated 
sessions in a repo in an unintuitive, unfriendly way. It makes sessions 
completely nonportable, since a check-in to any session will affect the base 
session of any future check-in in a nondeterministic manner. Having 
sessions isolated to their own subfolders means that different boar repos can 
maintain an arbitrary number of sessions but still sync *certain* sessions 
between them (given that boar is intended as a personal vcs, not a dvcs, merge 
conflicts should be almost nonexistent, or very simple cases, so long as a user 
takes care to keep repo clones in sync).

While I realize boar champions a write-once design, "upgrading" the repo format 
could be accomplished without destroying the old sessions folder by renaming it 
to something like sessions_old. Given that all the session data is 
hashed, and the only changes being introduced are to folder names plus minor 
alterations to the session.json files (significantly, no changes to the 
bloblist.json are required), it should be fairly trivial to validate the 
integrity of the sessions folder post-upgrade.
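
A minimal sketch of that validation step, under assumptions: the helper names are hypothetical, the old layout is taken to be sessions_old/<rev>/bloblist.json, the new one sessions/<session>/<rev>/bloblist.json, and the upgrade is assumed to have recorded which old global revision became which per-session revision. SHA-256 is used only because any digest works for a byte-for-byte comparison.

```python
import hashlib
import os


def file_digest(path):
    """Hash a file in chunks so large bloblists need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_upgrade(old_dir, new_dir, mapping):
    """Return the old revisions whose bloblist.json is missing or changed.

    mapping: old global revision -> (session name, per-session revision).
    """
    mismatches = []
    for old_rev, (session, new_rev) in sorted(mapping.items()):
        old_path = os.path.join(old_dir, str(old_rev), "bloblist.json")
        new_path = os.path.join(new_dir, session, str(new_rev), "bloblist.json")
        if not os.path.isfile(new_path) or file_digest(old_path) != file_digest(new_path):
            mismatches.append(old_rev)
    return mismatches
```

An empty result means every bloblist survived the move unchanged, at which point sessions_old could be archived or removed.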

And more importantly, I do think this *simplifies* an existing unnecessary 
complexity, rather than introducing complexity.

Ok... that's my sales pitch, I'll lay off until you have a chance to respond ;)

Original comment by cryptob...@gmail.com on 2 Mar 2012 at 2:33

GoogleCodeExporter commented 9 years ago
Having a numeric id for every snapshot is very convenient in the code (and 
convenient code makes for fewer bugs). Unfortunately it might also be a bit 
confusing for the user for the reasons you mention. 

Having session names as file names would be intuitive, but would introduce 
issues with file system capabilities. For instance, "Session" and "SESSION" are 
the same thing on Windows, but different things on Linux filesystems.  Not to 
mention all the possibilities of unicode complications... 
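
To illustrate the collision problem, here is a small sketch that flags session names which would likely end up as the same folder on a case-insensitive or Unicode-normalizing filesystem. The function name is hypothetical, and NFC plus `casefold()` is only an approximation; real filesystems each apply their own rules.

```python
import unicodedata


def colliding_names(session_names):
    """Return pairs of names that compare equal after normalization."""
    seen = {}
    collisions = []
    for name in session_names:
        # Approximate a case-insensitive, normalizing filesystem.
        key = unicodedata.normalize("NFC", name).casefold()
        if key in seen:
            collisions.append((seen[key], name))
        else:
            seen[key] = name
    return collisions
```

For the example above, "Session" and "SESSION" are reported as a colliding pair, as are two spellings of the same accented name that differ only in composed versus decomposed form.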

The features you speak of, like session specific cloning, would be possible 
with the current system as well.

There are some minor changes I'd make if I could travel back in time. But 
in the end, the data format is already set, for better or worse.

Original comment by ekb...@gmail.com on 2 Mar 2012 at 5:04