istlab / Alitheia-Core

A platform for software engineering research
http://www.sqo-oss.org

Issue Adding Project #7

Closed eligu closed 10 years ago

eligu commented 10 years ago

The problem I will describe is more a questionable implementation choice than a bug in the code.

In order to add a project and experiment with the code, I decided to add Commons IO from the Apache Software Foundation. I specified the following parameters in the Add Project view:

Project Name: commons-io
Home Page: http://commons.apache.org/proper/commons-io/
Source code: svn-http://svn.apache.org/repos/asf/commons/proper/io

(All the remaining fields were left blank)

After executing the action I got the following error (from the Logs tab):

ERROR - Error executing action addpr, id 1 Cause:Accessor failed: null ERROR - Accessor failed: null: eu.sqooss.service.admin.AdminActionBase.error(AdminActionBase.java:107) eu.sqooss.service.admin.actions.AddProject.execute(AddProject.java:176) eu.sqooss.impl.service.admin.AdminServiceImpl.execute(AdminServiceImpl.java:111) eu.sqooss.impl.service.webadmin.ProjectsView.addProject(ProjectsView.java:172) eu.sqooss.impl.service.webadmin.ProjectsView.render(ProjectsView.java:145) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) java.lang.reflect.Method.invoke(Unknown Source) org.apache.velocity.util.introspection.UberspectImpl$VelMethodImpl.doInvoke(UberspectImpl.java:389) org.apache.velocity.util.introspection.UberspectImpl$VelMethodImpl.invoke(UberspectImpl.java:378) org.apache.velocity.runtime.parser.node.ASTMethod.execute(ASTMethod.java:270) org.apache.velocity.runtime.parser.node.ASTReference.execute(ASTReference.java:252) org.apache.velocity.runtime.parser.node.ASTReference.render(ASTReference.java:339) org.apache.velocity.runtime.parser.node.SimpleNode.render(SimpleNode.java:336) org.apache.velocity.Template.merge(Template.java:328) org.apache.velocity.Template.merge(Template.java:235) eu.sqooss.impl.service.webadmin.AdminServlet.sendPage(AdminServlet.java:295) eu.sqooss.impl.service.webadmin.AdminServlet.doGet(AdminServlet.java:196) eu.sqooss.impl.service.webadmin.AdminServlet.doPost(AdminServlet.java:233) javax.servlet.http.HttpServlet.service(HttpServlet.java:727) javax.servlet.http.HttpServlet.service(HttpServlet.java:820) org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) org.mortbay.jetty.servlet.OsgiServletHolder.handle(OsgiServletHolder.java:101) org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363) org.mortbay.jetty.servlet.OsgiServletHandler.handle(OsgiServletHandler.java:117) 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) org.mortbay.jetty.Server.handle(Server.java:324) org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879) org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:741) org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:213) org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

I tried to track down the error by remote-debugging the method eu.sqooss.service.admin.actions.AddProject.execute.

The problem was at line 151 of this method, here is the stack trace: java.util.NoSuchElementException at java.util.ArrayList$Itr.next(Unknown Source) at eu.sqooss.plugins.tds.svn.SVNAccessorImpl.resolveRevision(SVNAccessorImpl.java:320) at eu.sqooss.plugins.tds.svn.SVNAccessorImpl.getHeadRevision(SVNAccessorImpl.java:356) at eu.sqooss.service.admin.actions.AddProject.execute(AddProject.java:151) at eu.sqooss.impl.service.admin.AdminServiceImpl.execute(AdminServiceImpl.java:111) at eu.sqooss.impl.service.webadmin.ProjectsView.addProject(ProjectsView.java:172) at eu.sqooss.impl.service.webadmin.ProjectsView.render(ProjectsView.java:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.velocity.util.introspection.UberspectImpl$VelMethodImpl.doInvoke(UberspectImpl.java:389) at org.apache.velocity.util.introspection.UberspectImpl$VelMethodImpl.invoke(UberspectImpl.java:378) at org.apache.velocity.runtime.parser.node.ASTMethod.execute(ASTMethod.java:270) at org.apache.velocity.runtime.parser.node.ASTReference.execute(ASTReference.java:252) at org.apache.velocity.runtime.parser.node.ASTReference.render(ASTReference.java:339) at org.apache.velocity.runtime.parser.node.SimpleNode.render(SimpleNode.java:336) at org.apache.velocity.Template.merge(Template.java:328) at org.apache.velocity.Template.merge(Template.java:235) at eu.sqooss.impl.service.webadmin.AdminServlet.sendPage(AdminServlet.java:295) at eu.sqooss.impl.service.webadmin.AdminServlet.doGet(AdminServlet.java:196) at eu.sqooss.impl.service.webadmin.AdminServlet.doPost(AdminServlet.java:233) at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) at 
org.mortbay.jetty.servlet.OsgiServletHolder.handle(OsgiServletHolder.java:101) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363) at org.mortbay.jetty.servlet.OsgiServletHandler.handle(OsgiServletHandler.java:117) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:741) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:213) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

Looking at the stack trace, we see that the problem originates in eu.sqooss.plugins.tds.svn.SVNAccessorImpl.resolveRevision, line 320.

Looking at the implementation, we see that the call log = getSVNLog("", svnrev.getSVNRevision(), -1) (line 319) returns an empty list. So the first bug here is an uncaught exception: we call iterator().next() on an empty list.
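The empty-list failure can be reproduced in isolation. Below is a minimal sketch (the method names are illustrative, not Alitheia's) showing that iterator().next() on an empty list throws NoSuchElementException, and the obvious guard that avoids it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

public class EmptyLogDemo {
    // Hypothetical stand-in for reading the first entry of the list
    // returned by getSVNLog(): mirrors the unguarded call at line 320.
    static long firstRevisionOrFail(List<Long> log) {
        return log.iterator().next(); // throws NoSuchElementException if the list is empty
    }

    // Guarded variant: check for emptiness before dereferencing the iterator.
    static long firstRevisionOrDefault(List<Long> log, long fallback) {
        return log.isEmpty() ? fallback : log.iterator().next();
    }

    public static void main(String[] args) {
        List<Long> empty = new ArrayList<>();
        try {
            firstRevisionOrFail(empty);
            System.out.println("no exception");
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException");
        }
        System.out.println(firstRevisionOrDefault(empty, -1L)); // prints -1
    }
}
```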

Having some experience with SVN and SVNKit, and looking at the implementation of List getSVNLog(String repoPath, long revstart, long revend) in SVNAccessorImpl, it appears that org.tmatesoft.svn.core.io.SVNRepository.log() is called with startRev equal to the latest revision of the repository and endRev equal to -1. In this case SVNKit's SVNRepository.log() sets endRev = startRev, so you would expect to get the latest SVNLogEntry. However, in our case the call returns no log entry. Why? The problem is that the ASF hosts one central SVN repository containing many projects, so the latest repository revision does not necessarily correspond to a change made to the path /commons/proper/io; that is why no log entries are returned. SVNRepository.getLatestRevision() returns the latest revision of the whole repository, and because someone committed to another project within the ASF repository, every node in the repository tree reports that global revision number. In other words, there is a limitation in the implementation of SVNAccessorImpl: if a repository hosts many projects, we cannot add any one of them to Alitheia. I believe this is a serious limitation, given that the ASF repository is a very valuable collection of OSS projects, especially Java-based ones (look at Apache Commons).
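The mismatch between the global head revision and the latest revision touching a given path can be illustrated without touching SVN at all. The sketch below is a toy model (the Commit class and both methods are hypothetical, not Alitheia's or SVNKit's API) showing why the two numbers diverge in a multi-project repository:

```java
import java.util.Arrays;
import java.util.List;

public class MetaRepoDemo {
    // Toy model of a commit in a multi-project repository: a global
    // revision number plus the paths it touched.
    static final class Commit {
        final long revision;
        final List<String> changedPaths;
        Commit(long revision, List<String> changedPaths) {
            this.revision = revision;
            this.changedPaths = changedPaths;
        }
    }

    // Analogue of getLatestRevision(): the global head of the repository.
    static long headRevision(List<Commit> history) {
        return history.get(history.size() - 1).revision;
    }

    // What resolving a project's head should do: the newest revision that
    // touched anything under the project's path prefix.
    static long headRevisionForPath(List<Commit> history, String prefix) {
        for (int i = history.size() - 1; i >= 0; i--) {
            for (String p : history.get(i).changedPaths) {
                if (p.startsWith(prefix)) return history.get(i).revision;
            }
        }
        return -1; // no commit ever touched this path
    }

    public static void main(String[] args) {
        List<Commit> history = Arrays.asList(
                new Commit(100, List.of("/commons/proper/io/README.txt")),
                new Commit(101, List.of("/httpd/trunk/server.c"))); // another project
        System.out.println(headRevision(history));                              // 101
        System.out.println(headRevisionForPath(history, "/commons/proper/io")); // 100
    }
}
```

Asking for the log of revision 101 scoped to /commons/proper/io yields nothing, which is exactly the empty list that triggers the NoSuchElementException above.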

I think there is a workaround. Say we have a repository with multiple projects and we want to get the last commit of a certain project (suppose the ASF repository). First, we create the repository object for that project (commons-io):

    DAVRepositoryFactory.setup();
    String url = "http://svn.apache.org/repos/asf/commons/proper/io";
    SVNURL svnurl = SVNURL.parseURIDecoded(url);
    SVNRepository repository = SVNRepositoryFactory.create(svnurl);
    ISVNAuthenticationManager authenticationManager =
            SVNWCUtil.createDefaultAuthenticationManager();
    repository.setAuthenticationManager(authenticationManager);

To get the latest revision of the repository we have:

    long lastRev = repository.getLatestRevision();

The lastRev, however, might be a revision that does not affect the project, so in your implementation it cannot be resolved (see SVNAccessorImpl.resolveRevision()). To get (or, let us say, resolve) the last revision that affected any of the files/folders under /commons/proper/io, we can fetch all the logs:

Collection<SVNLogEntry> logs = repository.log(new String[]{""}, null, 0, lastRev, true, true);

and the last revision is the last item in the logs collection. A better implementation would be:

repository.log(new String[]{""}, lastRev, 0, true, true, 1, new ISVNLogEntryHandler() {
        @Override
        public void handleLogEntry(SVNLogEntry svnLogEntry) throws SVNException {
            System.out.println("Last rev (handler):" + svnLogEntry.getRevision());
        }
    });

This last implementation avoids downloading all the log entries: it traverses in descending order (note the parameters lastRev, 0), so the last revision affecting the path is the first entry passed to the handler. With this code you can easily resolve the latest revision in repositories such as the one we are working with. I think the only method that must be changed is:

private long getHeadSVNRevision() throws InvalidRepositoryException {
    long endRevision = -1;
    if (svnRepository == null) {
        connectToRepository();
    }
    try {
        long lastRev = svnRepository.getLatestRevision();
        final Rev revision = new Rev(-1);
        svnRepository.log(new String[]{""}, lastRev, 0, true, true, 1, new ISVNLogEntryHandler() {
            @Override
            public void handleLogEntry(SVNLogEntry svnLogEntry) throws SVNException {
                revision.rev = svnLogEntry.getRevision();
            }
        });
        endRevision = revision.rev;
    } catch (SVNException e) {
        logger.warn("Could not get latest revision of " + url
                + e.getMessage());
        throw new InvalidRepositoryException(url, e.getMessage());
    }

    return endRevision;
}

// Simple mutable holder so the anonymous handler can write back the revision.
private static class Rev {
    long rev;
    Rev(long r) { rev = r; }
}

In my case it works; however, I can't say for sure whether it affects other components of Alitheia, because I haven't managed to fully add a project (it can't find a BTS accessor and a Mail accessor, so it throws exceptions), so I can't run Update All and try some metrics (and in the best scenario I don't have any test cases to run against). The procedure for adding a project is very unclear.

EDIT: The new implementation of getHeadSVNRevision() behaves the same as the previous one when there is one project per repository (it returns the HEAD of the whole repository); however, when there are multiple projects per repository, it returns the last revision that affected any of the files under the path that the scm.source URL points to.

dspinellis commented 10 years ago

Thank you very much for the debugging and the proposal. Here is how I recommend handling it:

eligu commented 10 years ago

It is very important for us to be able to add a project from the Project Management tab. I tried Gnome-VFS and JConvert and installed them, but I can't trigger any metrics plugin: when I run Update All, there seems to be no updating of the existing projects' assets (I checked the files manually on disk and no new file is created). I set some breakpoints in the metrics plugins, but none of them was hit.

In order to test anything, we (@teohaik, @eligu) will need someone to show us how to load a working project that will trigger the metrics. We would expect at least (@mkechagia, @gousiosg, @bkarak) to send us instructions for adding a project so we can have data to work against. @dspinellis

gousiosg commented 10 years ago

As a side comment, Alitheia Core currently expects one (usually local, but possibly remote) repository per project. You can use svnsync on a specific path in the Apache SVN repository to extract the history of just that path. This is how we processed several sub-projects from the Apache and KDE projects.

We chose this option, as opposed to handling meta-repositories, because it requires less configuration (no need to manually map paths to projects), is more general (works on both single- and multi-project repositories), makes processing faster (it is not trivial to process 1M commits in a deep directory structure), and makes subsequent querying for per-project stats easier (again, no need to restrict queries to specific paths).

eligu commented 10 years ago

@gousiosg Yes, I realized that looking at the code. However, there are some implications (drawbacks):

  1. If we have one repository per project and a resource other than a source file (say readme.txt) is committed, then every metric associated with a ProjectFile activator will trigger and be recalculated. That is an unnecessary calculation, because a metric that works only on source files does not need to be activated when a .txt file is committed (correct me if I am wrong). We can avoid this by testing the changed paths (MODIFICATION, ADDITION, DELETION) and triggering the metric only when there is a need for it.
  2. Alitheia is supposed to be a project monitoring tool; how is that possible if you have to call svnsync for each project on the command line? That means you have to manually download project assets with svnsync and then call update in Alitheia (or whatever your implementation does). You do not need to process 1.5M log entries from Apache: in my case, working with the commons-io project, I managed to fetch only the entries under /commons/proper/io (there are about 1K), so processing them is very fast, and you only need to do it the first time you add a project. On subsequent updates, all you have to do is fetch the logs that are not already in the project assets: if the last local log is 1500 and the repository is at 1600, you can walk from 1600 down to 1500 in descending order and collect the logs (if any) that affect the paths under /commons/proper/io, which is much faster. However, looking at the code I see that you are leveraging SVNKit to handle a local repository and keep an SVN structure locally, which might simplify things from a developer's perspective but limits Alitheia's capabilities.
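The path-based check proposed in point 1 can be sketched in a few lines. The method and extension list below are illustrative, not Alitheia's actual activator API; the idea is simply to inspect the changed paths before firing a source-only metric:

```java
import java.util.List;

public class ActivatorFilterDemo {
    // Hypothetical pre-activation check: fire a source-code metric only if
    // at least one changed path in the commit has a source-file extension.
    static boolean shouldTrigger(List<String> changedPaths, List<String> sourceExtensions) {
        for (String path : changedPaths) {
            for (String ext : sourceExtensions) {
                if (path.endsWith(ext)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> srcExts = List.of(".java");
        // A commit touching only documentation would not activate the metric...
        System.out.println(shouldTrigger(List.of("/trunk/readme.txt"), srcExts));         // false
        // ...while a commit touching a source file would.
        System.out.println(shouldTrigger(List.of("/trunk/src/FileUtils.java"), srcExts)); // true
    }
}
```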

Replies to your response:

"is more general (works on both one and multiple project repos)" - actually, this will not work with multi-project repositories, because you cannot resolve a revision for a specific project if the head revision has nothing to do with the project you are analyzing. Unless you do manual preprocessing of the repository (svnsync), it will throw an exception (at least the one I describe in the first comment).

"makes processing faster" - actually, that is what I am trying to explain: when you call SVNRepository.log() and the URL is /commons/proper/io, all you get are logs that affect the paths under io, i.e., the logs of the specified project, so you do not need to fetch all the logs of the whole repository. Have you ever tried to svnsync an ASF Commons project? I did, and after more than 6 hours it still had not finished, so I suspended it.

gousiosg commented 10 years ago

@eligu short replies

  1. Metrics are only calculated on files that have changed (i.e., an entry was added to the ProjectFile table), so unnecessary recalculations cannot happen. I do not see why metrics should only be calculated on source files; you can certainly have metrics on binary or documentation files.
  2. I was trying to explain why we made this design choice. Of course, if you make a different design choice (as you are implying we should), you can optimize things in different ways. The problem is that converting Alitheia Core to handle multi-project repositories would take a lot of time, and I do not know whether it would pay off (research-wise). Moreover, Alitheia Core can be triggered to recalculate metrics via HTTP, so you can write a simple shell script that calls svnsync and triggers metric recalculation every time there are new revisions.
eligu commented 10 years ago

@gousiosg

  1. Can you help me understand this: if I have a metric that calculates the complexity of a source file (say .java) and a .txt file is added/removed/modified in the next commit, will this metric be calculated or not? If yes, then this is what I am calling an unnecessary calculation. Probably it will just be activated, and the metric code left to decide which changes were made to the project and whether they affect its previous calculation; however, I don't know that for sure.
  2. Yes, I know there is a lot of code that would have to change; however, I thought the ASF is a very valuable source of OSS projects to play with, so if Alitheia had the capability of extracting a project's assets from a multi-project repository like that, without manual intervention, that would be great.
  3. How can I trigger metrics calculation in Alitheia? I have imported Gnome-VFS and JConvert and clicked Update All, but when I inspect the data on disk nothing is created, so it seems there is no update. When I click Synchronize on each metric, nothing is triggered (I checked, with the debugger and println statements). Suppose I have to create a metric plugin and test it against a well-known project: I can't debug my code if no changes are made to the remote repository, because there will be no update and no metric trigger (correct me if I am wrong)! It would be a time saver if you showed me how to trigger metrics calculation, via HTTP or otherwise, independently of whether there is an update or not.
gousiosg commented 10 years ago

@eligu

  1. It is up to the metric plug-in to decide whether it can work on both .txt and .java files. The metric will be triggered, but this does not necessarily mean much extra effort if the decision is made early in the run method.
  2. But you can already process ASF projects without changing anything, using the svnsync method described above. In any case, it is much faster to mirror repositories locally and analyze the mirror than to do the analysis against the online repositories, especially when developing new metrics/updaters. If you are bold enough, a pull request would be welcome.
  3. You can delete the contents of the *Measurement tables (using SQL) and click Synchronize again. Note that you need to install the metrics first.
teohaik commented 10 years ago

As far as bullet 3 is concerned, the "Synchronize" button currently does nothing. No measurement table in the DB is updated with any value, so there are no contents to delete. This is for the projects JConvert and Gnome-VFS. For some mysterious reason, no metric seems to be calculated: thousands of jobs in the queue, but no results!

eligu commented 10 years ago

@gousiosg

  1. Thank you for that info.
  2. Actually, I didn't mean doing online analysis by calling log/diff/contents etc. each time. I had an implementation for a project of mine which uses a custom directory layout (not SVN) and dealt only with the logs whose changed paths include the one the URL points to; however, I can't make a pull request because I don't know much about the internals of the project.
  3. I will try to look at the metrics tables in the DB, but I am afraid that @teohaik is right here; I haven't noticed any table that stores metrics (I saw some tables, but they were all empty).
gousiosg commented 10 years ago

@teohaik @eligu

I just cloned AC, built and installed it, and installed two plug-ins (Project size metrics, Project test case metrics). Then I imported the JConvert project from sqo-oss.org (using the "project.properties file location" input box), selected the project, and clicked "Run all updaters". It worked almost perfectly (212 jobs run, 1 failed). Then I clicked the "Synchronize" button for each metric. That worked perfectly (1091 jobs run, 0 failures). This is with the stock configuration (H2 database and 4 threads).

I am not sure why you are having trouble with it. Please check the files runner/*.log and report bugs accordingly.

eligu commented 10 years ago

@gousiosg @teohaik

I am trying Alitheia with the default settings (H2 DB) and it seems to be working. I have loaded Gnome-VFS and clicked Update All; it has finished 4900 jobs with some 80 failures (the update hasn't finished yet). If it works with H2, then I think the problem is MySQL. To make this clear: I think you should update the MySQL connector dependency to the one that ships with the latest version of MySQL, because the current one probably doesn't play well with the new server and with Hibernate. You should also check which MySQL versions the current version of Alitheia works with, and update the main page accordingly (in case another user wants to try it with MySQL). The update has been running for more than two hours and is still going (probably metrics are triggered by default, even though I haven't installed them; they are just registered).

gousiosg commented 10 years ago

Before jumping to conclusions regarding the MySQL driver, could you please let me know what is wrong with it? Do you get any exceptions? On what version of MySQL? I am pretty sure that AC works on both MySQL 5.1 and 5.5 with the current driver.

In any case, this issue is now resolved, therefore I am closing it. If you find anything noteworthy about MySQL please open a new one.

eligu commented 10 years ago

@gousiosg Yes, you are right; actually I don't get any exception. Using MySQL I can't update the project: it doesn't show any Last Revision or Email. On the other hand, when using H2 I can see it updating (however, I am still waiting for the update to finish; in the Jobs tab I have been getting ModuleResolver:Gnome-VFS:0.0% for about 2 hours now).

It is very strange that with MySQL nothing updates (having created the user alitheia with the alitheia password, and changed the concurrency to READ_COMMITED), while with the default settings using H2 it does update (at least it seems to).