marbiaz / GitWorks

work-in-progress java application to analyze git repos
3 stars, 4 forks

What is the format of the generated graph? #1

Open IkhlasAlhussien opened 9 years ago

IkhlasAlhussien commented 9 years ago

Hello,

What is the format of the generated graph? Is it GML?

Thanks

marbiaz commented 9 years ago

Hello, currently a metagraph can be:
-- serialized to / deserialized from a binary file, using the GitWorks.importData and GitWorks.exportData methods;
-- exported to a .gexf file (see http://gexf.net/format/), via the Dag.exportToGexf method.

Writing further export methods is quite straightforward; they belong in the Dag class.

Thanks for your interest!

M.
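Since the question was about GML, here is a minimal sketch of what such an export method could look like. It does not use the real Dag class: the class name GmlSketch, the method toGml, and the representation of nodes as labels and edges as index pairs are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical stand-in: GitWorks' real Dag class is NOT used here.
final class GmlSketch {

    /**
     * Renders a directed graph as GML text.
     * Nodes are identified by their index in nodeLabels; each edge is a
     * {source, target} pair of node indices. Labels are assumed not to
     * contain double quotes (no escaping is done).
     */
    static String toGml(List<String> nodeLabels, List<int[]> edges) {
        StringBuilder sb = new StringBuilder("graph [\n  directed 1\n");
        for (int i = 0; i < nodeLabels.size(); i++) {
            sb.append("  node [ id ").append(i)
              .append(" label \"").append(nodeLabels.get(i)).append("\" ]\n");
        }
        for (int[] e : edges) {
            sb.append("  edge [ source ").append(e[0])
              .append(" target ").append(e[1]).append(" ]\n");
        }
        return sb.append("]\n").toString();
    }

    public static void main(String[] args) {
        // Two commits joined by one edge.
        System.out.print(toGml(List.of("c1", "c2"), List.of(new int[]{0, 1})));
    }
}
```

A real implementation inside Dag would iterate over the metagraph's own node and edge collections instead of these placeholder lists.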

amalfiyd commented 9 years ago

Dear marbiaz, thank you for providing this tool on GitHub :) By the way, could you help us with a real example of the arguments needed to run the code? I am having difficulty defining the argument list, so an example of each of these arguments would be very helpful :)

repo list file path
repo dir path
jgit gits out dir
jgit trees out dir

We want to analyze the bitcoin repo on GitHub.

marbiaz commented 9 years ago

Dear amalfiyd, thanks for your interest! The gitworks toolset is still being developed, but it can already perform different kinds of analysis. The main difficulty for users is that I have not yet found the time to write a decent API. So, depending on the kind of analysis you want to perform, some modifications to the main class may be needed.

In any case, gitworks is not meant to analyze repositories online, but local git clones. So, first you must clone the repos you want to analyze, then you tell gitworks where they are (repo dir path). First of all: do you want to analyze only the main git repo or perform a differential analysis of the thousands of forks?

amalfiyd commented 9 years ago

Dear marbiaz,

Basically, we want to analyze the repository of a single GitHub project, then study the patterns that gitworks finds, and perhaps modify the code to suit our needs. We want to understand the code by trying it first, but we have no idea what the parameters should be to run the program. If you could give us an example of each parameter (file or string), that would help us a lot :)

I don't know if we understood your question correctly, but we want to analyze the commit topology of the whole project.

marbiaz commented 9 years ago

Dear amalfiyd, here's a brief description of the inputs. It should be OK for a single fork. They probably look awkward to you, but they were designed for a very specific set of experiments and I still have to find the time to write a more generic I/O interface:

'repo list file path' : /absolute/path/to/some/textfile
This text file should contain one line with exactly this format:
'project name''4 spaces''owner name''4 spaces''start timestamp''4 spaces'0'4 spaces''timestamp of the local clone'
Fields here are delimited by single quotes. So, for the project a4a made by user drfinlayscott (which you can find at github.com/drfinlayscott/a4a), which was started on 10 April 2012 and cloned on your machine on 19 Jan 2013, you should have this line in your text file:
a4a    drfinlayscott    2012-04-10T10:08:28Z    0    2013-01-19T22:09:39+0100
(grrrr... I hate html formatting... anyway: in between each field there are exactly 4 spaces...)

'repo dir path' : /path/to/the/local/clone/
For instance, if you cloned a bare copy of a4a into /home/you/a4a.git, you should write /home/you

'jgit gits out dir' : /path/to/a/working/dir/for/gitworks
This dir will be used throughout the execution, so it is important that it is different from 'repo dir path' and 'jgit trees out dir' and that it is kept clean between executions.

'jgit trees out dir' : weird name, but it is just the /path/to/the/output/dir , where the output will be written
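The repo-list line format described above can be illustrated with a small sketch. This is not the parser GitWorks uses; the class RepoListLine and its methods are hypothetical, and the fourth field is simply kept as the literal 0 given in the description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of GitWorks.
final class RepoListLine {

    // The description above requires exactly four spaces between fields.
    static final String SEP = "    ";

    /** Builds one repo-list line; the fourth field is the literal "0". */
    static String format(String project, String owner, String started, String cloned) {
        return String.join(SEP, project, owner, started, "0", cloned);
    }

    /** Splits a line into its fields and rejects lines without exactly five. */
    static List<String> parse(String line) {
        String[] fields = line.split(SEP, -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException(
                "expected 5 fields separated by 4 spaces, got " + fields.length);
        }
        return Arrays.asList(fields);
    }

    public static void main(String[] args) {
        String line = format("a4a", "drfinlayscott",
                "2012-04-10T10:08:28Z", "2013-01-19T22:09:39+0100");
        System.out.println(line);
        System.out.println(parse(line));
    }
}
```

Generating the line programmatically avoids the usual failure mode here, which is an editor or web page collapsing the four spaces into one.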

To get started, you will probably need to comment out the calls to Results.metagraphStats() and some external scripts... I hope this will help a bit and I encourage you, if you would like to use gitworks and thus modify it, to create your own fork of my repo, so that we can "keep in touch with updates" :)

Let me know!

amalfiyd commented 9 years ago

Dear marbiaz,

thank you for your earlier reply; we have successfully exported the graph to a .gexf file :)

Now we want to extract patterns from the metagraph. Does gitworks support this feature? If not, we will find a way to manipulate the gexf file :)

marbiaz commented 9 years ago

Dear amalfiyd, I used the gexf only to visualize the metagraph. The pattern analysis is done directly on the Metagraph Java object. The only problem is that the author of the algorithm which finds the patterns does not allow me to distribute its code. So, I used it, but I cannot put it here... I'll see if I can add it to this repo as a jar library and modify my code a bit accordingly. By the way, please consider that the gexf file does not include all the edges of the metagraph. This happens because the metagraph is actually a multigraph (it can have more than one edge between the same pair of nodes) and the gexf format does not support multigraphs.
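To make the multigraph point concrete: one possible way to keep some of the lost information in a simple-graph export is to collapse parallel edges and record how many there were, for example as an edge weight. The thread does not say that Dag.exportToGexf does this; the sketch below is only an illustration, with a hypothetical class name and edges given as {source, target} index pairs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not GitWorks code.
final class ParallelEdges {

    /**
     * Collapses a multigraph's edge list into one entry per (source, target)
     * pair, mapped to the number of parallel edges between that pair.
     * Keys have the form "source->target"; insertion order is preserved.
     */
    static Map<String, Integer> collapse(List<int[]> edges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int[] e : edges) {
            counts.merge(e[0] + "->" + e[1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two parallel edges 0->1 and a single edge 1->2.
        List<int[]> edges = List.of(new int[]{0, 1}, new int[]{0, 1}, new int[]{1, 2});
        System.out.println(collapse(edges));
    }
}
```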

IkhlasAlhussien commented 9 years ago

Hi marbiaz,

Did you try other tools (such as FANMOD, mfinder, or MAVisto) before choosing to implement Grochow's algorithm yourself? If so, which tool is better, or which would you advise us to use? And if you are going to provide a jar library for Grochow's algorithm, would it take a long time?

thanks

marbiaz commented 9 years ago

Hi, I read some stats about them, but I didn't use them. Grochow's algorithm is indeed fast. I'm travelling these days, so I cannot work on it immediately. I will get to it this weekend.

marbiaz commented 9 years ago

Hi all, please pull the recent updates to get a version which can actually compute the patterns of a metagraph. I cannot call Grochow's jar directly from gitWorks, because it uses incompatible library versions, so I added an ugly system call to a java -jar execution (the executable jar -- Linux 64bit -- is now in the lib/ directory). Let me know if any issue arises.
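For readers unfamiliar with this workaround, running a jar as a separate process from Java can be sketched roughly as follows. This is not the code in the repository: the class JarLauncher and the jar path used in main are placeholders, and the arguments the real jar expects are not described in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the actual GitWorks call.
final class JarLauncher {

    /** Builds the argument vector for "java -jar <jarPath> <jarArgs...>". */
    static List<String> buildCommand(String jarPath, List<String> jarArgs) {
        List<String> cmd = new ArrayList<>(List.of("java", "-jar", jarPath));
        cmd.addAll(jarArgs);
        return cmd;
    }

    /** Starts the jar in its own JVM, waits for it, and returns its exit code. */
    static int run(String jarPath, List<String> jarArgs)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(jarPath, jarArgs))
                .inheritIO() // forward the child's stdout/stderr to this process
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // "lib/example.jar" is a placeholder path; nothing is executed here.
        System.out.println(buildCommand("lib/example.jar", List.of("input.txt")));
    }
}
```

Because the child runs in its own JVM with its own classpath, its library versions cannot clash with those of the calling application, which is the reason given above for the system call.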

amalfiyd commented 9 years ago

Dear Marbiaz,

thanks for the update :) I am trying to run it now, though there are some lines of code I want to ask about, e.g.: Runtime.getRuntime().exec(pwd + "/loadDumps.sh " + getSafeName(fe)).waitFor(); What about these shell script files, should we create them first?

In the previous version I just commented these calls out, since I run it on Windows, and I believe I would have to run it on Linux to execute the shell scripts.

marbiaz commented 9 years ago

Yes, you can comment out all the calls to .sh scripts. Also, you should create an empty directory named 'polygs' in your working directory. It will be used by the pattern finding algorithms to export some files. The first time, it will take a while to generate all patterns. Then, if you do not delete the .polyg files that are created in the polygs directory, subsequent executions will be much faster.
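Creating the polygs directory up front, and checking whether cached .polyg files from an earlier run are present, could be sketched like this. The class PolygsDir and its methods are hypothetical; GitWorks itself expects the directory to exist and is not shown to create it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of GitWorks.
final class PolygsDir {

    /** Creates <workingDir>/polygs if it is missing and returns its path. */
    static Path ensure(Path workingDir) throws IOException {
        return Files.createDirectories(workingDir.resolve("polygs"));
    }

    /** Returns true if the directory already holds at least one .polyg file. */
    static boolean hasCachedPatterns(Path polygs) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(polygs, "*.polyg")) {
            return files.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path polygs = ensure(Path.of("."));
        System.out.println(polygs + " has cached patterns: " + hasCachedPatterns(polygs));
    }
}
```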

amalfiyd commented 9 years ago

Dear Marbiaz,

Will do, I will try it right away. Thank you for being so helpful to us :)