Open MandarinConLaBarba opened 10 years ago
First stab..
{
id : ?, //int or mongo ID obj or some other UUID
uri : <string>, //can be git repo, gist, or file
git_hash : '', // or can we get all these git_ ones from uri?
git_repo : '',
git_branch : '',
edn : <string>, //the code transformed to edn
author : <string>, //github username
status : <string>, //pending || complete || etc - what is the flow?
module : <string>, //the visualization module to use ("trees", "cubes", "van-gogh", etc)
date_created : <number> //unix timestamp
}
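To make the draft a bit more concrete, here is a rough sketch of building one of these records in Python. The UUID id, the `None` placeholder for `edn`, and the two status values are assumptions on my part, since the draft leaves the id type and the status flow open; `new_submission` and `set_status` are hypothetical helper names.

```python
import time
import uuid

# Assumption: only the two statuses named in the draft; the real flow is undecided.
ALLOWED_STATUSES = {"pending", "complete"}


def new_submission(uri, author, module, git_repo="", git_branch="", git_hash=""):
    """Build a submission record with the fields from the draft schema."""
    return {
        "id": str(uuid.uuid4()),  # assumption: UUID string; the draft leaves the type open
        "uri": uri,
        "git_hash": git_hash,
        "git_repo": git_repo,
        "git_branch": git_branch,
        "edn": None,  # assumption: empty until the code has been transformed to EDN
        "author": author,
        "status": "pending",
        "module": module,
        "date_created": int(time.time()),  # unix timestamp in seconds
    }


def set_status(submission, status):
    """Return a copy of the record with a new status, rejecting unknown values."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return {**submission, "status": status}
```

Returning a copy from `set_status` keeps the original record untouched, which is just one possible choice; mutating in place would work equally well for a first pass.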
Not sure we want to keep the complete original, at least not more than temporarily.
How about keeping github/git metadata so we can always recover the original? So I guess, repo, branch and SHA1 for the commit. Makes sense to have local working copies as we build stuff, but I don't know where or how. Ideally we'd make a shallow clone (or whatever they're called) into a local filesystem - gotta look at how that would work on Heroku.
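A rough sketch of what recovering the original from repo + SHA1 could look like, written in Python only to show the git commands involved. The function names are hypothetical, and fetching a bare SHA with `--depth 1` only works if the server allows it (GitHub does, as far as I know), so treat this as an assumption to verify rather than a settled approach.

```python
import subprocess


def shallow_fetch_commands(repo_url, sha, dest):
    """Return the git commands that fetch one commit without its history."""
    return [
        ["git", "init", dest],
        ["git", "-C", dest, "remote", "add", "origin", repo_url],
        # --depth 1 asks for just this commit, not the full history.
        ["git", "-C", dest, "fetch", "--depth", "1", "origin", sha],
        ["git", "-C", dest, "checkout", "--detach", "FETCH_HEAD"],
    ]


def shallow_fetch(repo_url, sha, dest):
    """Run the commands in order, stopping at the first failure."""
    for cmd in shallow_fetch_commands(repo_url, sha, dest):
        subprocess.run(cmd, check=True)
```

If only the branch tip is needed, `git clone --depth 1 --branch <branch> <url>` is a simpler alternative. Either way, `dest` would probably have to be treated as scratch space on Heroku, since dyno filesystems are ephemeral as far as I know.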
Yeah, so the reason I thought it might be good to keep the raw code (excluding dependencies and irrelevant files) was to allow correlating specific portions of the code with specific elements of the rendering. However, since that would be complicated and isn't part of the initial functionality, I agree we shouldn't keep it for now. Also, it would be trivial to add later if we decide to keep the consolidated raw code.
I'm still struggling a bit with the process of aggregating a repo into a single raw code file...the rules and process, etc. Opened #11 to discuss this.
With Mongo you can just nest files in the same tree as a directory structure if you want. Something like this:
{
repo: {
root: {
name: 'my-repo',
files: [
{
name: 'readme.md',
type: 'markdown',
content: 'some stuff\n\nsome other stuff'
}
],
subdirs: [
{
name: 'src',
subdirs: [
{
name: 'clojart',
files: [
{
name: 'core.clj',
type: 'clojure',
content: 'a bunch of content text'
}
]
}
]
}
]
}
}
}
That would load easily into a Clojure data structure, which could be the input for whatever transformations or processing you want to do.
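For illustration, here is a small Python sketch of walking a document shaped like the one above and listing every file with its full path. `iter_files` is a hypothetical helper name, and it assumes `files` and `subdirs` may be missing on any directory node, as they are in the example.

```python
def iter_files(directory, prefix=""):
    """Yield (path, file_doc) for every file under a directory node, depth-first."""
    path = f"{prefix}{directory['name']}/"
    for file_doc in directory.get("files", []):
        yield path + file_doc["name"], file_doc
    for subdir in directory.get("subdirs", []):
        yield from iter_files(subdir, path)


# The example document from the comment above.
doc = {
    "repo": {
        "root": {
            "name": "my-repo",
            "files": [
                {"name": "readme.md", "type": "markdown",
                 "content": "some stuff\n\nsome other stuff"},
            ],
            "subdirs": [
                {"name": "src", "subdirs": [
                    {"name": "clojart", "files": [
                        {"name": "core.clj", "type": "clojure",
                         "content": "a bunch of content text"},
                    ]},
                ]},
            ],
        }
    }
}

paths = [path for path, _ in iter_files(doc["repo"]["root"])]
# paths == ['my-repo/readme.md', 'my-repo/src/clojart/core.clj']
```

Because the walk is a generator, a transformation step could consume one file at a time instead of holding a flattened copy of the whole repo.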
Oh yeah, but I thought we said we weren't gonna keep the original source in the db?
Oh well we can parse it straight into data structures with the same "schema". I guess I wasn't clear that we'd come to a conclusion.
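A minimal sketch of that idea, in Python for illustration: read a local working copy straight into the same nested "schema" without touching the db. `build_tree` and the extension-to-type table are hypothetical; the thread only shows the `markdown` and `clojure` types, so everything else falls back to `unknown` here.

```python
from pathlib import Path

# Hypothetical mapping; only these two types appear in the example document.
TYPES = {".md": "markdown", ".clj": "clojure"}


def build_tree(path):
    """Read a directory into the nested name/files/subdirs structure."""
    path = Path(path)
    node = {"name": path.name, "files": [], "subdirs": []}
    for child in sorted(path.iterdir()):
        if child.is_dir():
            if child.name == ".git":
                continue  # skip git internals, they are not source files
            node["subdirs"].append(build_tree(child))
        elif child.is_file():
            node["files"].append({
                "name": child.name,
                "type": TYPES.get(child.suffix, "unknown"),
                # errors="replace" keeps odd encodings from aborting the walk
                "content": child.read_text(encoding="utf-8", errors="replace"),
            })
    return node
```

Unlike the hand-written example, this always emits both `files` and `subdirs`, even when empty, which keeps downstream code from having to check for missing keys.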
Thinking about it, some code may be too big to process in a single pass, so storing it might be more efficient, and even necessary in memory-constrained environments.
Ah interesting, so until now I hadn't really thought of the directory tree as part of the transformed data, but that does make sense. See #11 for that discussion.
The schema should include enough for authentication/signup and for storing submissions.