orchard-labs / clojart.io


Describe process for concatenating/transforming a git repository #11

Open MandarinConLaBarba opened 10 years ago

MandarinConLaBarba commented 10 years ago

I guess to make it easy we should just support Clojure code out of the gate.

Steps:

  1. Clone the repo locally. (We could use the API instead, but it is rate-limited and would be harder.)
  2. Exclude all disallowed files, for example .txt files, plus anything under 'node_modules' or anything else that looks like dependency code rather than real source for the repository.
  3. Make sure at least one file in the repository is of a supported type (again, maybe just Clojure for now).
  4. Read in all files by walking the directory tree, concatenating as we go.
  5. Transform the code into EDN and store it in the db in the .edn column. See #9.
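
Steps 2 through 4 could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the names `SUPPORTED_EXTENSIONS`, `EXCLUDED_DIRS`, `collect_sources`, and `concatenate_repo` are hypothetical, and the sketch uses an allowlist of Clojure extensions rather than a list of disallowed ones, which is a simplifying assumption. Cloning (step 1) and the EDN transform (step 5) are left out.

```python
import os

# Assumptions for illustration only; the thread does not fix these values.
SUPPORTED_EXTENSIONS = {".clj", ".cljs", ".cljc"}
EXCLUDED_DIRS = {".git", "node_modules", "target"}


def collect_sources(repo_root):
    """Return paths of supported source files, skipping excluded directories."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in SUPPORTED_EXTENSIONS:
                paths.append(os.path.join(dirpath, name))
    return paths


def concatenate_repo(repo_root):
    """Concatenate every supported source file under repo_root into one string."""
    paths = collect_sources(repo_root)
    if not paths:
        # Step 3: refuse repositories with no file of a supported type.
        raise ValueError("no supported source files found")
    chunks = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            chunks.append(handle.read())
    return "\n".join(chunks)
```

Sorting the directory and file names keeps the concatenation order stable between runs, which matters if the stored result is ever compared or cached.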

Questions

cc @j0ni

j0ni commented 10 years ago

So, continuing from #9, I'm not sure the directory structure is particularly important, but the require graph is. I think capturing the directory structure is really just a convenient way to be able to follow the require graph around and find the relevant sources.

MandarinConLaBarba commented 10 years ago

Hmm, OK. So why do you think the require graph is important? So that we can tell what is actually in the program and therefore accurately represent it in a visualization?

j0ni commented 10 years ago

I guess it depends on what you want the visualization to represent.

I figured you're trying to do something which represents the structure of the code, possibly also the semantics of the code. Either way, symbols that are referenced within a given namespace which come from outside that namespace need to be found via the require graph, otherwise they just dangle.

On the other hand, if you're just looking to build a data structure and aren't interested in the relationships between the symbols and their definitions (i.e., def(n)s), then I guess it doesn't matter.

MandarinConLaBarba commented 10 years ago

Well, I think the approach of using the require graph is more correct in the sense that the visualization will accurately reflect the code in the program. But I'm not sure that it will make much of a difference in the actual rendering. I think so long as the method is consistent it won't matter that much. However, I do think it's a more interesting story to tell if we're using the actual require graph.

I wonder how hard it will be to develop an accurate graph w/o actually interpreting the code. I'm thinking of the issues w/ obsolete ways to import modules in Clojure (e.g. use vs require, and so on), or even more difficult, in JavaScript with its various module systems (RequireJS, CommonJS, ES6 modules, etc).
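
To get a feel for how far a purely static approach goes, here is a rough Python sketch that reads only the `ns` form of each file and collects namespaces from both `:require` and `:use` clauses. All function names are hypothetical. It is deliberately naive: it ignores strings and comments when matching parentheses, skips prefix lists and reader conditionals, and never sees `require` or `use` calls made outside the `ns` form, which is exactly where interpreting the code would be needed.

```python
import re

NS_START = re.compile(r"\(ns\s+([^\s()\[\]]+)")
CLAUSE_START = re.compile(r"\(:(?:require|use)\s")
TOKEN = re.compile(r"[\[\]()]|[^\s\[\]()]+")


def balanced_form(text, start):
    """Return the parenthesized form starting at text[start].

    Naive: parentheses inside strings or comments are counted too.
    """
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return text[start:]


def required_namespaces(source):
    """Return (namespace, sorted dependencies), or None if there is no ns form."""
    match = NS_START.search(source)
    if match is None:
        return None
    ns_form = balanced_form(source, match.start())
    deps = set()
    for clause in CLAUSE_START.finditer(ns_form):
        body = balanced_form(ns_form, clause.start())
        tokens = TOKEN.findall(body)[2:-1]  # drop "(", the clause keyword, ")"
        depth = 0
        first_in_vector = False
        for tok in tokens:
            if tok in "[(":
                depth += 1
                # Only a top-level vector such as [lib :as x] names a library.
                if tok == "[" and depth == 1:
                    first_in_vector = True
            elif tok in "])":
                depth -= 1
            else:
                lib_position = depth == 0 or (depth == 1 and first_in_vector)
                if lib_position and tok[0].isalpha():
                    deps.add(tok)
                if depth == 1:
                    first_in_vector = False
    return match.group(1), sorted(deps)


def require_graph(sources):
    """Map each namespace to the namespaces it requires or uses."""
    graph = {}
    for source in sources:
        parsed = required_namespaces(source)
        if parsed is not None:
            name, deps = parsed
            graph[name] = deps
    return graph
```

Treating `:use` and `:require` the same way sidesteps the old-versus-new import question for graph purposes, since both only add an edge from one namespace to another.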

MandarinConLaBarba commented 10 years ago

Hmm, but actually if we're using the graph, people might be able to recognize elements of their own program. For example, maybe we have a "trees" viz module that renders a tree per module. In a sample program, there are three modules: one with 100 symbols, one with 80 symbols, and another with 50. Perhaps the one w/ 100 symbols is a larger tree than the others, etc.
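
As a rough illustration of that idea, the Python sketch below counts top-level `def`-style forms per module and scales each tree relative to the largest module. The names `count_symbols` and `tree_sizes` are hypothetical, linear scaling is just one possible choice, and the count is approximate: it treats any form at the start of a line whose head begins with `def` as a definition.

```python
import re

# Matches forms such as (def x ...), (defn f ...), (defmacro m ...) that start a line.
DEF_FORM = re.compile(r"^\(def[^\s()]*\s", re.MULTILINE)


def count_symbols(source):
    """Approximate the number of top-level definitions in one module."""
    return len(DEF_FORM.findall(source))


def tree_sizes(symbol_counts, max_size=1.0):
    """Scale each module's tree linearly against the largest module."""
    largest = max(symbol_counts.values(), default=0)
    if largest == 0:
        return {name: 0.0 for name in symbol_counts}
    return {
        name: max_size * count / largest
        for name, count in symbol_counts.items()
    }
```

With the three-module example from the comment, the counts 100, 80, and 50 would scale to relative sizes 1.0, 0.8, and 0.5.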