nedbat / coveragepy

The code coverage tool for Python
https://coverage.readthedocs.io
Apache License 2.0
2.96k stars 428 forks source link

Incomplete file path in XML report #578

Open nedbat opened 7 years ago

nedbat commented 7 years ago

Originally reported by Anonymous


Hi there, we are relying on the XML coverage report generated in coverage.xml for our analysis.

In previous versions of coverage, when we run the command such as:

$ py.test --cov directory_one --cov directory_two --cov-report xml tests

We would get a coverage.xml file with sources listed as:

<sources>
    <source>/path/to/directory_one</source>
    <source>/path/to/directory_two</source>
</sources>

And each file would be reported in a line such as this:

<class branch-rate="0" complexity="0" filename="directory_one/__init__.py" line-rate="1" name="__init__.py">

Now in coverage 4.4 the first level of the directory path is missing and we get only:

<class branch-rate="0" complexity="0" filename="__init__.py" line-rate="1" name="__init__.py">

Is there any way to get the previous behaviour?


nedbat commented 6 years ago

Original comment by Suriya Subramanian (Bitbucket: suriya, GitHub: suriya)


Here's a public git repository demonstrating this bug: https://bitbucket.org/suriya/coverage-xml-bug

You can view the result of a run here: https://bitbucket.org/suriya/coverage-xml-bug/addon/pipelines/home#!/results/1

nedbat commented 6 years ago

@suriya Can you provide me with a reproducible case?

nedbat commented 6 years ago

Original comment by Suriya Subramanian (Bitbucket: suriya, GitHub: suriya)


I am using coverage.py version 4.3.4. I see that only one file with a particular base name at a particular depth is reported. For example, with three files a/__init__.py, b/__init__.py, and a/c/__init__.py, we get only two files in the output a/__init__.py and a/c/__init__.py. This is especially problematic with a Django project and we get a report for only one models.py file.

nedbat commented 7 years ago

Original comment by Alexandre Conrad (Bitbucket: aconrad, GitHub: aconrad)


Also, pycobertura is affected by this change too.

There should be a way to have a relative path to the source files when starting from the root of the package and we lost this information. Currently the <source> tag only provides absolute paths to the source on the machine it was generated on which isn't very useful if the coverage report gets interpreted on a different machine where the absolute path is unlikely to exist.

For example, if a machine receives a coverage report that was generated for package "foo" at commit "abc123" with a filename __init__.py (which was originally foo/__init__.py) then we should be able to git-clone the source of package foo@abc123 and reconcile the paths somehow. But with the change there is no way to do this anymore.

We're missing the information about the directory foo.

nedbat commented 7 years ago

Original comment by Alex Gaynor (Bitbucket: alex_gaynor, GitHub: Unknown)


I'm not particularly convinced this change is correct, but looking at: https://bitbucket.org/ned/coveragepy/src/63dfe482a849e9cb0b82214242315a806a6a3d53/coverage/xmlreport.py?at=default&fileviewer=file-view-default#xmlreport.py-143:151

Our challenge is that we have two paths in self.source_paths: /path/to/cryptography and /path/to/cryptographt/tests. Since it's a set, depending on your version of Python, ordering may not be stable, if /path/to/cryptography/tests comes first, rel_name ends up as test_x509.py, if /path/to/cryptography comes first you get rel_name = "tests/test_x509.py".

The fix, which I'm not particularly confident in, is to change line 146 to be: for source_path in sorted(self.source_paths, key=len):.

This will ensure we always get /path/to/cryptography/ first here. Tested locally and it works, but I have no sense of whether this is conceptually correct :-)

nedbat commented 7 years ago

Original comment by Alex Gaynor (Bitbucket: alex_gaynor, GitHub: Unknown)


This is hitting us on pyca/cryptography. Can be reproduced with:

$ git clone https://github.com/pyca/cryptography
$ cd cryptography
$ tox -e py34 -- tests/test_x509.py
$ coverage xml -i

As you can see in the tox output's coverage report, there paths in our test directory have a test/ prefix, but after running coverage xml, in coverage.xml, files from the test/ directory do not. An easy file to see this with is test_x509.py.

nedbat commented 7 years ago

Original comment by Dirk Thomas (Bitbucket: dirk-thomas, GitHub: dirk-thomas)


In order to avoid the ambiguity (while still keep the information of the sources basepath and the filename within that basepath separate) could the class tag get a new attribute source to clearly identify the base path the referenced file was being found? Obviously that would require downstream tools to be updated too but I don't see a good solution which works in all cases based on the information present in the current xml.

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


FYI, code2coverage has put in a fix and things are working for me now. Thanks!

nedbat commented 7 years ago

Original comment by Bence Nagy (Bitbucket: underyx, GitHub: underyx)


FWIW the SonarQube Python Plugin's coverage importer is also utterly confused by these changes.

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


I've confirmed that files in different modules with the same name (i.e. init.py) never did show in my coverage output and I just failed to notice that I had no data one those files. Getting that fixed is obviously important, but less so than making sure my XML processing tools are able to parse the output properly.

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


Yes, I see that though I never noticed the problem before. I'll have to think a bit to understand why. So fixing that problem would probably be important, but if it still existed and the code went back I would think all was well ;-)

It's the second problem that I felt was the real issue. I'm waiting to see what the coverage2clover person says...

nedbat commented 7 years ago

Just so we're clear: my point is about the old behavior in coverage.py 4.3.4: it didn't include all of the files in the XML report if the files had same names in different source trees. So just putting back the old behavior won't be enough. Maybe we agree on this, I'm not sure?

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


Yes, coverage1.xml shows the single init.py file problem, and the coverage2.xml files shows just the filespec problem tools will have knowing where foo.py or bar.py are. I have the coverage2clover maintainer looking at these too.

nedbat commented 7 years ago

Yes, thanks. When I run this code with coverage 4.3.4, two/__init__.py is in the coverage1.xml file, but one/__init__.py is not. Even if I add an executable line to each of those files, only one of them appears in the results. There's more to be fixed here.

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


sorry, pull again

nedbat commented 7 years ago

Also, the __init__.py files are still empty?

nedbat commented 7 years ago

Thanks, but it looks like .coveragerc_noinitfiles is missing.

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


Please check the repo again for a second variation of this bug.

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


No, I think that's a side effect of not having any actual code to test in my example. You could easily just add a do-nothing function in each to get file output for both. You'll see the same problem. I think I'll go add it to the repo now...

nedbat commented 7 years ago

@dmulter thanks for that. I'm trying to understand the old behavior you want back. When I run your code under coverage==4.3.4, only two/__init__.py is in the XML output. one is missing completely. So the old code had some problems with this multiple-source scenario also. Is that true of your real code also?

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


I also created an issue for this in the coverage2clover repo - https://github.com/tumb1er/coverage2clover/issues/10

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


https://github.com/dmulter/coverage2clover_bug

nedbat commented 7 years ago

I want to make tools work, but I want all of them to work. I'm not going to simply revert the change. I'll make an improvement that will help everyone. It would be a big help to me if someone could provide a reproducible case of a multi-source result. Even better would be to advocate with them to make use of all of the information in the coverage.xml file.

nedbat commented 7 years ago

Original comment by Benjamin Morgan (Bitbucket: bj_morgan, GitHub: Unknown)


This change from 4.3.4 → 4.4 has also broken compatibility with the codeclimate test reporter. See https://github.com/codeclimate/python-test-reporter/issues/40

nedbat commented 7 years ago

@dmulter @postrational Can you provide me with a reproducible example, even if it's a link to an online repo with instructions of how to run the tests? I want to get this to work again for you. It's clear to me that combining the source path and the filename should be a real file path, so I think the fix was the correct one, but clearly I am not done since your use cases are broken.

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


No idea on how source paths is used, but the coverage2clover tool seems to just do an open() on the filename= value, so it must go back to the way it was previously for it to work again. Note that my coverage run produces multiple source paths as I use a shared repo for common code.

#!xml
    <sources>
        <source>/Users/xxx/Documents/xxx/app-common/apis</source>
        <source>/Users/xxx/Documents/xxx/app-foo</source>
        <source>/Users/xxx/Documents/xxx/app-foo/apis</source>
    </sources>

and

#!xml

    <class branch-rate="0" complexity="0" filename="apis/__init__.py" line-rate="1" name="__init__.py">
nedbat commented 7 years ago

Yes, I see the problem, sorry. What is the correct behavior here? This was to fix https://bitbucket.org/ned/coveragepy/issues/526/generated-xml-invalid-paths-for-cobertura. How are the source paths and the filenames supposed to be combined?

nedbat commented 7 years ago

Original comment by David Multer (Bitbucket: dmulter, GitHub: dmulter)


I'm seeing the same thing. Not only is the source file reference ambiguous, it has also broken my use of coverage2clover in my Bamboo builds as it can't find the source file.

nedbat commented 7 years ago

Original comment by Michal Karzynski (Bitbucket: postrational, GitHub: postrational)


If we have 2 files like this:

Then they would both get a like like this:

<class branch-rate="0" complexity="0" filename="__init__.py" line-rate="1" name="__init__.py">

How can I tell which file this refers to?

nedbat commented 7 years ago

This was a bug that was fixed. The filename attribute should be combinable with a source entry to find the file. Can you tell me more about why you need the directory name in both places?

jacobwegner commented 6 years ago

@nedbat I ran across this issue and believe I've found a useful workaround.

https://bitbucket.org/suriya/coverage-xml-bug/pull-requests/1/demonstrate-a-fix-to-coverage-xml-bug/diff

Changing from specifying the modules in [sources] to directory paths in [include] ensures that the path to each file is output in coverage.xml:

using [source]:

<class branch-rate="0" complexity="0" filename="hello.py" line-rate="1" name="hello.py">
<class branch-rate="0" complexity="0" filename="baz/hello.py" line-rate="1" name="hello.py">

using [include]:

<class branch-rate="0" complexity="0" filename="bar/hello.py" line-rate="1" name="hello.py">
<class branch-rate="0" complexity="0" filename="bar/baz/hello.py" line-rate="1" name="hello.py">
<class branch-rate="0" complexity="0" filename="foo/hello.py" line-rate="1" name="hello.py">

I'm not sure how this might need to be worded in the project's documentation, but will try and take a stab at it if you think that'd be useful.

suriya commented 6 years ago

@jacobwegner Thank you for the workaround.

blueyed commented 5 years ago

Just for reference, this was changed in 80cf759 (to fix https://github.com/nedbat/coveragepy/issues/526).

What does the spec / XML format say in this regard? I could not find this quickly, but came across an example, where it contains the command being used even?! https://raw.githubusercontent.com/jenkinsci/cobertura-plugin/master/src/test/resources/hudson/plugins/cobertura/coverage-with-data.xml

After all it is impossible to find the correct file if there are multiple sources, and it causes problems all around. From #526 it seems like there was an issue with Jenkins Cobertura plugin, maybe only in combination with nose initially?

(I know that the workaround is to use a single source only, but it is rather unfortunate running into this still)

drazisil commented 5 years ago

I spent the weekend going crazy with pytest-cov and Codecov.

I can confirm that Codecov walks the filename= tags and without a path in them there is not way for it to match the coverage to a file.

FYI, the generated coverage.xml file says it uses https://raw.githubusercontent.com/cobertura/web/master/htdocs/xml/coverage-04.dtd, if that helps any, @blueyed (It's very bare bones)

aconrad commented 2 years ago

I'd like to bump this thread again since I'm still seeing the issue with coverage v6.

I posted about it in this other project https://github.com/aconrad/pycobertura/issues/133#issuecomment-1036466783 but I will copy-paste here for convenience:

What I am noticing is that in coverage v4 the class element has a "filename" attribute as such:

<class name="cli.py" filename="pycobertura/cli.py" ... >

In coverage v6, we now have the directory that is missing from the filename:

<class name="cli.py" filename="cli.py" ... >

And causes Pycobertura to fail with a FileNotFound:

File "pycobertura/filesystem.py", line 73, in open
    raise self.FileNotFound(filename)
FileNotFound: aconrad-pycobertura-193cfa7/__init__.py

The file should be aconrad-pycobertura-193cfa7/pycobertura/__init__.py (don't worry about the prefix directory, that's when we use pycobertura with a zip file and the --prefix flag).

I'm surprised this has never been fixed. I wonder how code coverage parsers do it. Do they compose the filename by taking the last segment of the path inside the <sources> element? In the report generated by coverage v6 we have source <source>/Users/alexandre/Development/pycobertura-gro1m/pycobertura</source> – but given that there could be multiple sources, it could be ambiguous.

Thoughts?