Support for aggregate task

Kwestor commented 9 years ago

Add support for report aggregation from multiple projects - it's already done in scalac-scoverage and sbt-scoverage and should be ported to gradle-scoverage.

I'm already working on it and when legal team gives me green light on PR I'll submit one.

maiflai commented 9 years ago

Hi - I raised https://github.com/scoverage/scalac-scoverage-plugin/issues/81 to aid the sharing of code between the actual build plugins (I think there is duplication at the moment). I thought this would include the aggregation between multiple modules.

Is your PR for the gradle plugin, or does it cover multiple projects?

gslowikowski commented 9 years ago

This feature doesn't work at all. I told @sksamuel about it, but with no response.

Kwestor commented 9 years ago

My PR includes one file which calls CoverageAggregator.aggregate from scalac-scoverage.

@gslowikowski I think I have tested aggregation for multi-module sbt application and it worked.

gslowikowski commented 9 years ago

I can aggregare scoverage data (from xml files), but when I creage ScoverageHtmlWriter I have to pass sourceDirectory parameter. It's used to calculate relative paths from absolute paths written in scoverage data. Lets say you have two modules foo and bar. So you have two paths to sources: foo/src/main/scala and bar/src/main/scala. What path will you pass to ScoverageHtmlWriter as sourceDirectory parameter value?

gslowikowski commented 9 years ago

... I mean when generating aggregated html report.

Kwestor commented 9 years ago

Yes, I have exactly this problem now with gradle, it merges two absolute paths and fails with java.io.FileNotFoundException.

gslowikowski commented 9 years ago

I see three related issues, I don't know if it's the best place to discuss it all, but anyway:

plugin should accept many source rootes for single project/module - this is common case
plugin should work when source files structure does not correspond to package names - very common case
project root in multimodule project should generate aggregated reports without knowing anything about module source roots.

I don't know, if package structure can be used instead of source files structure. If yes - problems solved (no need to calculate relative paths). In no - all these problems must be solved anyway, I don't know if it will be more complicated or not.

gslowikowski commented 9 years ago

I think that if 1. and 2. will be solved, solving 3. will be easy, but starting from 3. does not make sense IMO. I don't know, how @sksamuel tested aggregate reports when he implemented it. Perhaps aggregate check and xml reports work, but html report does not work for sure.

Kwestor commented 9 years ago

I think I managed to "trick" scoverage to work with aggregate passing parent directory (of whole project) as a source dir if you run an aggregate task. I have to check if everything still works for standard projects, but it seems it's ok.

gslowikowski commented 9 years ago

It works at first sight, but such trick does not make sense. @sksamuel can you comment this?

maiflai commented 9 years ago

I think that's the way it's intended to be used - have you checked that you can follow all the links through to the source code?

gslowikowski commented 9 years ago

The links look good. I don't get it. Paths to all source files should have prefixes foo/src/main/scala and bar/src/main/scala (I have foo and bar modules). If these prefixes do not break the logic, that's weird to me. I must debug it.

maiflai commented 9 years ago

I think it works because it replaces the common element of all source paths - the root project directory.

I think it might not work if you have multiple levels of modules to merge together, but I've not tested that yet.

gslowikowski commented 9 years ago

I have checked it. The files are in foo/src/main/scala/path/to/MyClass.scala.html and bar/src/main/scala/path/to/MyOtherClass.scala.html files and the links from overview and package reports are <a href=foo/src/main/scala/path/to/MyClass.scala.html"> and <a href=bar/src/main/scala/path/to/MyOtherClass.scala.html"> so they are work, but such locations are not proper IMO.

If they were proper, why we pass the path to sources root for relative paths calculations in single module case? We could pass project base directory instead. With this simple trick projects with multiple source roots would work. But, I must say it again, paths starting from src/main/scala or modulename/src/main/scala are not proper.

Main first, most important question, is do we need to use source files structure in HTML reports, or could we use class package names? I don't know if package names is acceptable at all, but if it is, it would be better. There would be no need to save source paths in scoverage XML files, neither absolute nor relative, no need to calculate relative paths from absolute ones. Multimodule projects (or single module with multiple source roots) could contain the same source file relative paths in different source roots. Aggregated reports would not need to know, where the source roots of every child module are. There would be no need to make such tricks as the above one.

What do you think about it?

maiflai commented 9 years ago

I think that it's reasonable to use the source file structure - the statements that we are instrumenting are written in that structure and displaying our results to the user in that same structure helps them to see exactly what coverage is missing.

I think that in a web, the physical location of the documents are not important - it is the relations that matter. Relative URLs are legal, and I expect the consumer of the HTML report is intended to be a human using a web browser; the machines should consume the scoverage or cobertura reports.

I agree it might be better to treat all files as relative to the root directory. In the past I've struggled with .NET symbols where they are linked to absolute files; this then required post-processing since we wanted to share the symbol files produced by our build server, and every developer had a different work area.

gslowikowski commented 9 years ago

I disagree.

@sksamuel didn't implement it this way in SBT plugin: https://github.com/scoverage/sbt-scoverage/blob/master/src/main/scala/scoverage/ScoverageSbtPlugin.scala#L77 https://github.com/scoverage/sbt-scoverage/blob/master/src/main/scala/scoverage/ScoverageSbtPlugin.scala#L173 https://github.com/scoverage/sbt-scoverage/blob/master/src/main/scala/scoverage/ScoverageSbtPlugin.scala#L202 Unfortunately this SBT plugin implemetation does not work :(

Generally speaking, if you have to make such tricks, something is not properly designed or implemented. I would prefer to fix/redesing scalac plugin logic to avoid tricks in SBT/Maven/Gradle plugins.

Additionally I investigated if it's possible to check coverage when tests are in different module than code. There are at least two related issues: https://github.com/scoverage/scalac-scoverage-plugin/issues/79 https://github.com/scoverage/scoverage-maven-plugin/issues/5

This could sometimes (if main and test modules are not in the same multimodule project) require publishing coverage data files (there is an option in Cobertura Maven plugin http://mojo.codehaus.org/cobertura-maven-plugin/instrument-mojo.html#attach, so there must be a use case for this). I would like to avoid publishing files with my full paths inside them.

maiflai commented 9 years ago

With regard to https://github.com/scoverage/scoverage-maven-plugin/issues/5; isn't this a request for scoverage to measure Java code? I don't think that's relevant here.

With regard to https://github.com/scoverage/scalac-scoverage-plugin/issues/79, what exactly is the issue please?

My understanding of the code suggests that coverage is measured whenever the statement is executed; this is because the data directory is determined at compile-time. Therefore the measurements of code in projectA will be recorded whenever it is executed, whether by projectA-tests or by projectB-tests.

However, the conversion of those measurements into scoverage.xml or other reporting formats typically occurs only in projectA, and aggregation takes place from the created scoverage.xml files.

If you want to report on projectA from projectA-tests then you would change the data directory in projectA-tests (which I think can be easily done from the Gradle plugin).

Personally though, I would like to support isolating the measurements such that my integration tests don't pollute the unit test coverage, and this presents a different problem.

gslowikowski commented 9 years ago

https://github.com/scoverage/scoverage-maven-plugin/issues/5 is about testing and calculating coverage of package OSGI boundles in test environment mocking real OSGI container. In this case you need to build jar file with instrumented classes and test it / measure coverage in other module.

Could you show, how would you configure Gradle plugin for multimodule project with tests in separate modules? Perhaps it would be better to comment in https://github.com/scoverage/scalac-scoverage-plugin/issues/79 issue.

maiflai commented 9 years ago

https://github.com/scoverage/scoverage-maven-plugin/issues/5 - jar'ing the instrumented classes is trivial, running them equally so, but if you want to isolate coverage to a particular test run then you need to manually manage the data directory.

re: the data directory, the exact gradle implementation will depend on whether you have a single gradle file, and just how many projects you have.

project(':a').extensions.scoverage.dataDir = project(':b').extensions.scoverage.dataDir

or in a more generic form:

configure(subprojects.findAll { it.name.endsWith('-tests') }) {
   scoverage {
    dataDir = project(":${it.name.minus('-tests')}").extensions.scoverage.dataDir
  }
}

maiflai commented 9 years ago

I agree that the instrumented classes are not transportable between machines, but I question how useful that feature is?

gslowikowski commented 9 years ago

Read this http://mojo.codehaus.org/cobertura-maven-plugin/instrumentingDeploymentArtifact.html and https://github.com/scoverage/scoverage-maven-plugin/issues/5. This makes sense to me.

maiflai commented 9 years ago

Looks interesting - I think there are a few key things from that?

To support multi-module:

we need to be able to uniquely identify statements (I think this is currently restricted because statements are indexed from 1 in each instrumentation session, so multiple instrumented jars cannot be distinguished)

To support running previously instrumented code on a different machine:

we need to publish all the instrumented things together (i.e. the measured statements file should be published with the instrumented classes).
we need to be able to link back to the source files (cobertura achieve this with relative paths and the convention of publishing a -sources.jar). We could include the source files in the instrumented jar.
we need to specify where the measurements should be written to at runtime

maiflai commented 9 years ago

I've added an experimental task in v.1.0.8 which you might add to your root project? Note that the root project must also have the scoverage plugin applied and configured.

task aggregateScoverage(type: org.scoverage.ScoverageAggregate)

--edit, fixed typo aggregage -> aggregate

It does not declare any dependencies at the moment, but the implementation requires that all the subproject reportScoverage tasks have run first.

I have found this to work for me:

gradle clean reportScoverage aggregateScoverage

Please could you let me know if this works for you in its current form?

Kwestor commented 9 years ago

Thanks, I'll check that.

Kwestor commented 9 years ago

Excuse my lack of gradle knowledge - should I place line you provided in my parent's gradle configuration file to enable this task there?

EDIT: ok, I should. I did not work for me before cause of typo: "aggregageScoverage" :smile:

maiflai commented 9 years ago

:-)

As a side note, I've found that enabling parallel execution without defining task dependencies will cause this new task to fail. I need to investigate how to add a dependency on all reportScoverage tasks in the build.

Kwestor commented 9 years ago

I think you could probably set input of aggregate as a collection of outputs of reports from all sub-projects.

maiflai commented 9 years ago

Thanks, I've updated above with aggregateScoverage

I think that for a simple aggregation of all subprojects you can do the following:

allprojects {
  repositories {
    mavenCentral()
  }

  apply plugin: 'scala'
  apply plugin: 'scoverage'

  dependencies {
    compile 'org.scala-lang:scala-library:2.11.4'
    scoverage 'org.scoverage:scalac-scoverage-plugin_2.11:1.0.4',
    'org.scoverage:scalac-scoverage-runtime_2.11:1.0.4'
  }

}

// declare the aggregate task at the root level
task aggregateScoverage(type: org.scoverage.ScoverageAggregate)

// since the reportScoverage task has been configured across subprojects, we can require that aggregateScoverage depends on all of them
getTasksByName('reportScoverage', true).each {
  aggregateScoverage.dependsOn(it)
}

I wonder how best to achieve a sensible default behaviour here - I think it's generally appropriate to have a single aggregate task at the root level, but perhaps others with multiple levels of subproject will want to aggregate coverage lower down?

Kwestor commented 9 years ago

We have structure like this:

\root
  \meta1
    \project1
      \subproject1
      \subproject1-tests
      ...
    \project2
      \subproject1
      \subproject1-tests
      ...
  \meta2
  ...

And right now I work on aggregation on project level, just above leaves.

scoverage / gradle-scoverage

Support for aggregate task #33