jendrikseipp / vulture

Find dead Python code
MIT License
3.42k stars 150 forks source link

Overriden methods of C/C++ extensions #8

Closed jendrikseipp closed 6 years ago

jendrikseipp commented 9 years ago

Originally reported by: Florian Bruhin (Bitbucket: The-Compiler, GitHub: The-Compiler)


When subclassing a method of a class defined in a C/C++ extension, such as this example using the PyQt GUI framework:

from PyQt5.QtCore import QSize
from PyQt5.QtWidgets import QLineEdit, QApplication

class LineEdit(QLineEdit):

    def sizeHint(self):
        return QSize(128, 128)

app = QApplication([])
le = LineEdit()
le.show()
app.exec_()

vulture doesn't detect that sizeHint is called via C++:

foo.py:7: Unused function 'sizeHint'

Maybe if a class inherits from a non-Python class, and an existing method is overridden, it should always be considered used?


jendrikseipp commented 8 years ago

Original comment by Jendrik Seipp (Bitbucket: jendrikseipp, GitHub: jendrikseipp):


No worries, we're not in a hurry :)

jendrikseipp commented 8 years ago

Original comment by Florian Bruhin (Bitbucket: The-Compiler, GitHub: The-Compiler):


Sorry for not coming back to this earlier - it seems a sip version with my patch to have a virtual attribute in the XML was released in the meantime, so I intend to pick this up again at some point, but I have a bit much on my plate right now.

jendrikseipp commented 9 years ago

Original comment by Jendrik Seipp (Bitbucket: jendrikseipp, GitHub: jendrikseipp):


I'd still prefer not to have the generated file in the repository. It is better to generate a version for a specific qt release. Also, anybody, who wants to make sure that they're using the latest version, will use the generator anyway.

jendrikseipp commented 9 years ago

Original comment by Florian Bruhin (Bitbucket: The-Compiler, GitHub: The-Compiler):


The file only would need updating when there's a new Qt release, which is about once all 6 months.

If we only include the generator with vulture, every downstream project would need to generate/commit/update their own whitelist file - IMHO it's clearly preferable to have this in a central location?

jendrikseipp commented 9 years ago

Original comment by Jendrik Seipp (Bitbucket: jendrikseipp, GitHub: jendrikseipp):


Nice work! Wouldn't it be better if we could specify the qt version on the command line? A function might be dead code in one version, but not in a different one. Can we find the list of FEATURES automatically?

I think it's best not to add generated files to repositories. Having only the generator is better, since then we don't have to update the generated file.

It would be great if you could prepare a pull request! I would propose to add the file under whitelists/pyqt/generator.py.

jendrikseipp commented 9 years ago

Original comment by Florian Bruhin (Bitbucket: The-Compiler, GitHub: The-Compiler):


This is all surprisingly easy \o/

This is the script I have currently:

import os                                                                                              [11/5194]
import argparse
import itertools
import subprocess
import tempfile

from lxml import etree

TIMELINE = ['Qt_5_0_0', 'Qt_5_0_1', 'Qt_5_0_2', 'Qt_5_1_0', 'Qt_5_1_1',
            'Qt_5_2_0', 'Qt_5_2_1', 'Qt_5_3_0', 'Qt_5_3_1', 'Qt_5_3_2',
            'Qt_5_4_0', 'Qt_5_4_1', 'Qt_5_4_2', 'Qt_5_5_0']
PLATFORMS = ['WS_X11', 'WS_WIN', 'WS_MACX']
FEATURES =['PyQt_Accessibility', 'PyQt_SessionManager', 'PyQt_SSL',
           'PyQt_qreal_double', 'Py_v3', 'PyQt_PrintDialog', 'PyQt_Printer',
           'PyQt_PrintPreviewWidget', 'PyQt_PrintPreviewDialog',
           'PyQt_RawFont', 'PyQt_OpenGL', 'PyQt_Desktop_OpenGL',
           'PyQt_NotBootstrapped']

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--sip', help='The sip binary to use', default='sip',
                        nargs=1)
    return parser.parse_args()

def run_sip(module, outdir, sip_executable):
    for exclusive_tags in itertools.product(TIMELINE, PLATFORMS):
        filename = '{}-{}.xml'.format(module, '-'.join(exclusive_tags))
        outfile = os.path.join(outdir, filename)
        cmdline = [sip_executable, '-m', outfile]
        for tag in list(exclusive_tags) + FEATURES:
            cmdline += ['-t', tag]
        cmdline.append(
            os.path.join(module, '{}mod.sip'.format(module)))
        print('  {} -> {}'.format(', '.join(exclusive_tags), outfile))
        subprocess.call(cmdline)

def parse_xmls(xmldir):
    for basename in os.listdir(xmldir):
        xmlfile = os.path.join(xmldir, basename)
        with open(xmlfile, 'r') as f:
            tree = etree.parse(f)
        yield from tree.xpath('/Module/Class/Function[@virtual="1"]/@name')

def get_modules():
    for filename in os.listdir('.'):
        filepath = os.path.abspath(filename)
        if os.path.isdir(filepath):
            yield filename

def main():
    args = parse_args()
    with open('whitelist.py', 'w') as outfile:
        for module in get_modules():
            with tempfile.TemporaryDirectory() as tmpdir:
                outfile.write('# {}\n'.format(module))
                name_set = set()
                print("Running sip for {}...".format(module))
                run_sip(module, tmpdir, args.sip[0])
                print("Parsing and merging XML files...")
                print()
                for name in parse_xmls(tmpdir):
                    name_set.add(name)
                for name in sorted(name_set):
                    outfile.write('{}.{}\n'.format(module, name))
                outfile.write('\n')
                outfile.flush()

if __name__ == '__main__':
    main()

After running for 1.5 hours (because it runs sip for all combinations of 14 Qt versions and 3 platforms) it spits out a file with around 2800 whitelist entries.

I'll still have to actually test it and clean everything up - after that, is it okay if I open a PR adding the whitelist and my script to the repo?

jendrikseipp commented 9 years ago

Original comment by Florian Bruhin (Bitbucket: The-Compiler, GitHub: The-Compiler):


I couldn't resist to take a quick look, but it looks easier than I thought!

sip has some (beta) XML export feature - I now submitted a patch so that XML includes a virtual="(1|0)" attribute for Function tag.

So the only piece left is a small python script which parses that XML and outputs all functions from there :)

jendrikseipp commented 9 years ago

Original comment by Jendrik Seipp (Bitbucket: jendrikseipp, GitHub: jendrikseipp):


Sounds interesting, please keep me posted.

jendrikseipp commented 9 years ago

Original comment by Florian Bruhin (Bitbucket: The-Compiler, GitHub: The-Compiler):


Overridden methods not being executed is a good point - with a whitelist, we could probably at least restrict that to virtual methods.

PyQt bindings code is generated from sip files, which is a special format - I wonder how hard it'd be to write a Python parser for those (possibly using the existing flex/bison grammar), and then generate a whitelist from that.

I'll take a look at it, but it might take me some weeks until I get time to do so.

jendrikseipp commented 9 years ago

Original comment by Jendrik Seipp (Bitbucket: jendrikseipp, GitHub: jendrikseipp):


Even if a C++ method is overriden, it might never get executed. Also, I don't think we can find out whether a method overrides a C++ method without executing and inspecting the code.

jendrikseipp commented 9 years ago

Original comment by Florian Bruhin (Bitbucket: The-Compiler, GitHub: The-Compiler):


PyQt5 has hundreds of classes, each with dozens of methods - and with every Qt release there will be new ones, so maintaining this will be a big effort, and the same issue will pop up for anyone who uses a C extension exposing classes...

Do you think there's an issue with the heuristic I presented above? I think it's safe to assume the programmer knows what they were doing if they're overriding an existing method of a C method. I also think something like that could be detected easily (trying if inspect.getargspec() works on the class, for example).

jendrikseipp commented 9 years ago

Original comment by Jendrik Seipp (Bitbucket: jendrikseipp, GitHub: jendrikseipp):


I think the best solution to this would be to create a PyQt whitelist module for vulture. Do you want to start such an effort? I would recommend adding a file whitelists/pyqt.py and making the usages explicit in there.

Maybe we can detect which whitelist files to include automatically later (e.g. include whitelist/pyqt.py if at least one pyqt module is imported).

RJ722 commented 6 years ago

I conducted a little experiment with the qutebrowser's run_vulture.py:

I changed the run function to disable filter_func (which previously whitelisted all the objects whose name was written in CamelCase) and added a vulture_whitelist (obtained by running the above script for PyQt-5.10.1). Here is how the function looked like:

def run(files):
    """Run vulture over the given files."""
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as whitelist_file:
        for line in whitelist_generator():
            whitelist_file.write(line + '\n')

        whitelist_file.close()

        vult = vulture.Vulture(verbose=False)
        vult.scavenge(files + [whitelist_file.name, 'vulture_whitelist.py'])

        os.remove(whitelist_file.name)

    vulture_attrs = ['unused_funcs', 'unused_props', 'unused_vars', 'unused_attrs']

    items = []
    for attr in vulture_attrs:
        sub_items = getattr(vult, attr)
        items.extend(sub_items)

    return report(items)

After this, when I run vulture -e tox, the output was:

vulture installed: attrs==17.4.0,colorama==0.3.9,cssutils==1.0.2,Jinja2==2.10,MarkupSafe==1.0,Pygments==2.2.0,pyPEG2==2.15.2,PyQt5==5.10.1,PyYAML==3.12,qutebrowser==1.2.1,sip==4.19.8,vulture==0.26
vulture runtests: PYTHONHASHSEED='3339987915'
vulture runtests: commands[0] | /Users/rahuljha/Documents/qutebrowser/.tox/vulture/bin/python scripts/dev/run_vulture.py
qutebrowser/browser/network/pac.py:102: unused function 'dnsResolve' (60% confidence)
qutebrowser/browser/network/pac.py:120: unused function 'myIpAddress' (60% confidence)
qutebrowser/browser/webengine/interceptor.py:42: unused function 'interceptRequest' (60% confidence)
qutebrowser/browser/webengine/webenginequtescheme.py:38: unused function 'requestStarted' (60% confidence)
qutebrowser/resources.py:5471: unused function 'qCleanupResources' (60% confidence)
scripts/dev/run_vulture.py:129: unused function 'filter_func' (60% confidence)
ERROR: InvocationError: '/Users/rahuljha/Documents/qutebrowser/.tox/vulture/bin/python scripts/dev/run_vulture.py'
______________________________________________ summary ______________________________________________
ERROR:   vulture: commands failed

I couldn't find dnsResolve, myIpAddress, qCleanupResources in any of the sip files, so I guess those are just false positives.

But, a simple grep shows that interceptRequest and requestStarted indeed are virtual functions which our script isn't catching.

sip/QtWebEngineCore/qwebengineurlrequestinterceptor.sip:33:    virtual void interceptRequest(QWebEngineUrlRequestInfo &info) = 0;
sip/QtWebEngineCore/qwebengineurlschemehandler.sip:34:    virtual void requestStarted(QWebEngineUrlRequestJob *) = 0;

Isn't that an anomaly in the script?

jendrikseipp commented 6 years ago

I think you mean tox -e vulture?

Maybe the two virtual functions are not explicitly marked as virtual (I don't know how sip files work)?

@The-Compiler, can you help us out here?

The-Compiler commented 6 years ago

I agree about the false-positives. dnsResolve and myIpAddress are exposed to JavaScript via the @_js_slot decorator and vulture can't catch that. Dunno about qCleanupResources, that's some internal PyQt-thing probably, which is called from C++ but without being actually something from Qt.

I agree about interceptRequest and requestStarted - those should be virtual. I'm not sure whether it's a bug in the script, or a bug in sip somewhere - you can probably tell by reviewing the XML file which gets output by sip.

RJ722 commented 6 years ago

I think you mean tox -e vulture?

So sorry, my bad! :-p

I agree about interceptRequest and requestStarted - those should be virtual. I'm not sure whether it's a bug in the script, or a bug in sip somewhere - you can probably tell by reviewing the XML file which gets output by sip.

I am looking into it.

jendrikseipp commented 6 years ago

What's the status here? Can we link to your repo, Rahul and close this issue?

RJ722 commented 6 years ago

Hey Jendrik,

Ongoing action occurring at https://github.com/RJ722/vulture-whitelist-generators/tree/generate. I've an ongoing PR which is just one final review from @The-Compiler away from alpha release 1. 😛

jendrikseipp commented 6 years ago

Now that we have a repository for whitelist generators at https://github.com/RJ722/vulture-whitelist-generators, I think we can close this issue.