troessner / reek

Code smell detector for Ruby
https://github.com/troessner/reek
MIT License
4.02k stars 279 forks source link

Reek needs a lot of memory #57

Closed graaff closed 14 years ago

graaff commented 14 years ago

I'm currently trying out reek as part of metric_fu. metric_fu runs reek on all the files in my Rails project in one go, e.g. "reek app/models/a.rb app/models/b.rb ...". This results in a huge ruby process (3.2Gb in my case). While I can appreciate the answer to just buy more RAM, I am wondering if it makes sense for the process to be this large and whether there are opportunities to diminish the memory requirements.

kevinrutherford commented 14 years ago

Hi, First of all, thanks for trying Reek, and for reporting your problem.

My original concept for Reek was that it might "someday" be able to divine stuff across source file boundaries (eg. looking up and down an inheritance hierarchy). But that never came to be, and since the early days I've been focussing much more on code quality as a localised property.

All of which means -- you're right, Reek doesn't need to keep information about a source file after it's done. I'll fix this as soon as I have the YAML reporting stable.

graaff commented 14 years ago

Great. BTW, I was wrong to report 3.2Gb, that's just the resident set size. The actual process size is 5.5Gb. Still running and obviously swapping like crazy :-)

Code LOC: 10715

kevinrutherford commented 14 years ago

I'd like to see for myself -- can I get hold of a clone of your project folder?

graaff commented 14 years ago

Nope, sorry, not an open source project.

graaff commented 14 years ago

I can understand why you want to look at it. Just ran reek on http://github.com/fdv/typo and that took ~450Mb for Code LOC: 6547.

Running the same thing on our own project quickly shoots to a large size again. Any ideas on tracking this down further?

graaff commented 14 years ago

Culprit found. Our application_helper.rb is causing these issues. It's a 49Kb 1139 line file with 73 methods in it. Given that the remainder of the code is actually larger I suspect something in that file is causing the problems.

kevinrutherford commented 14 years ago

I know Reek runs fine on other large codebases, and 10KLOC isn't huge. So it does sound like more than just large code causing this. For example, Reek runs just fine on optparse.rb from the standard Ruby distribution, which is a bigger file than yours. So does

reek application_helper.rb

on its own show the problem? How long does it take to complete?

My first thought would be to take that file and slowly cut chunks out of it until the problem goes away, then attempt to create a small file that shows the same symptoms. If you can't send me the source, can you do some binary chop to get to something smaller that has the bug?

graaff commented 14 years ago

Cutting chunks was also my idea, so I played around a bit with it, but something strange is going on here. Cutting some methods made for a dramatic reduction in memory size (half), but running reek on that code by its own (in the same module) runs fast.

I've played around with it a bit more, and found that adding a single small method would double total running time and memory, but that is not related to the method itself, since that runs quickly when tested by its own.

kevinrutherford commented 14 years ago

Weird. And all this when passing just one file to Reek?

I wonder if it's related to the smells found in the various chunks? For example, did Reek report any smells in the small method?

graaff commented 14 years ago

No, no smells, and in fact running reek just on that method gives 0 warnings.

kevinrutherford commented 14 years ago

Ok, I'm stumped. Is there any way you could send me a copy of the offending file? Say with the names obfuscated? (And thanks for your patience with trying all this btw.)

graaff commented 14 years ago

Just upgraded to 1.2.7.1 and the problem is still present. I'll see if I can get something that I can send to you.

kevinrutherford commented 14 years ago

I know there are some circular references in Reek's abstract syntax trees -- maybe they hang around too long because they can't be garbage collected. That's something I'm gradually refactoring away in the next few releases. It would be great to get a repeatable concrete example though.

danmayer commented 14 years ago

OK I think I can help. At least with a example. We just found a project on Caliper that is taking 25+ minutes. Turns out it is using up all the memory and using swap space while running reek. The machine has 1.7 GB of memory.

Luckily it is an open source project: http://github.com/edavis10/redmine

If you are trying to track a particular file causing the issue, This might help, we worked on the project up until this commit: e5d300af0a4d6703b043c05d4178353e7e252bf2

As seen on the history page for the project:

http://getcaliper.com/caliper/history?repo=git://github.com/edavis10/redmine.git

Here is a dump of the running process as it starts stalling. Let me know if any more info would help.

1005 3976 5.8 92.7 2378060 1659256 ? D 21:35 1:36 /usr/bin/ruby1.8 /metric_fu_gems/bin/reek app/controllers/attachments_controller.rb app/controllers/my_controller.rb app/controllers/enumerations_controller.rb app/controllers/custom_fields_controller.rb app/controllers/settings_controller.rb app/controllers/documents_controller.rb app/controllers/members_controller.rb app/controllers/search_controller.rb app/controllers/issue_categories_controller.rb app/controllers/issue_relations_controller.rb app/controllers/journals_controller.rb app/controllers/issue_statuses_controller.rb app/controllers/repositories_controller.rb app/controllers/wikis_controller.rb app/controllers/application_controller.rb app/controllers/trackers_controller.rb app/controllers/sys_controller.rb app/controllers/versions_controller.rb app/controllers/projects_controller.rb app/controllers/mail_handler_controller.rb app/controllers/users_controller.rb app/controllers/workflows_controller.rb app/controllers/timelog_controller.rb app/controllers/messages_controller.rb app/controllers/news_controller.rb app/controllers/wiki_controller.rb app/controllers/issues_controller.rb app/controllers/watchers_controller.rb app/controllers/welcome_controller.rb app/controllers/auth_sources_controller.rb app/controllers/account_controller.rb app/controllers/boards_controller.rb app/controllers/admin_controller.rb app/controllers/queries_controller.rb app/controllers/groups_controller.rb app/controllers/roles_controller.rb app/controllers/reports_controller.rb app/models/enabled_module.rb app/models/wiki_content_observer.rb app/models/project.rb app/models/news_observer.rb app/models/custom_field.rb app/models/repository/mercurial.rb app/models/repository/git.rb app/models/repository/subversion.rb app/models/repository/filesystem.rb app/models/repository/cvs.rb app/models/repository/darcs.rb app/models/repository/bazaar.rb app/models/attachment.rb app/models/document_category.rb app/models/tracker.rb app/models/group.rb app/models/query.rb app/models/changeset.rb app/models/time_entry_custom_field.rb app/models/principal.rb app/models/issue_observer.rb app/models/user_preference.rb app/models/group_custom_field.rb app/models/issue_status.rb app/models/board.rb app/models/document.rb app/models/auth_source_ldap.rb app/models/user.rb app/models/document_observer.rb app/models/auth_source.rb app/models/issue_priority.rb app/models/comment.rb app/models/message.rb app/models/wiki_page.rb app/models/journal_detail.rb app/models/change.rb app/models/issue_category.rb app/models/member.rb app/models/news.rb app/models/version.rb app/models/member_role.rb app/models/wiki.rb app/models/document_category_custom_field.rb app/models/issue_custom_field.rb app/models/message_observer.rb app/models/journal.rb app/models/time_entry_activity.rb app/models/mailer.rb app/models/issue_relation.rb app/models/wiki_redirect.rb app/models/repository.rb app/models/user_custom_field.rb app/models/mail_handler.rb app/models/time_entry_activity_custom_field.rb app/models/watcher.rb app/models/enumeration.rb app/models/workflow.rb app/models/wiki_content.rb app/models/version_custom_field.rb app/models/token.rb app/models/time_entry.rb app/models/issue_priority_custom_field.rb app/models/role.rb app/models/project_custom_field.rb app/models/setting.rb app/models/custom_value.rb app/models/journal_observer.rb app/models/issue.rb app/helpers/timelog_helper.rb app/helpers/messages_helper.rb app/helpers/users_helper.rb app/helpers/account_helper.rb app/helpers/workflows_helper.rb app/helpers/application_helper.rb app/helpers/versions_helper.rb app/helpers/members_helper.rb app/helpers/roles_helper.rb app/helpers/auth_sources_helper.rb app/helpers/admin_helper.rb app/helpers/custom_fields_helper.rb app/helpers/repositories_helper.rb app/helpers/welcome_helper.rb app/helpers/mail_handler_helper.rb app/helpers/queries_helper.rb app/helpers/journals_helper.rb app/helpers/enumerations_helper.rb app/helpers/projects_helper.rb app/helpers/issue_categories_helper.rb app/helpers/reports_helper.rb app/helpers/wiki_helper.rb app/helpers/groups_helper.rb app/h

danmayer commented 14 years ago

Got some feedback from Edavis10 since he runs reek locally, here is his info... We were running Reek 1.2.7.1

edavis10: @devver reek 1.1.3 on Linux (x86) edavis10: @devver ok I might try that. Reek just ran fine here and didn't hit 100MB.

kevinrutherford commented 14 years ago

Thanks Dan -- I think Reek just needs to ditch each file's syntax tree after it's done analysing. Version 1.2.7.1 is much closer to being able to support that, so I may be able to get it in the next release. After my vacation next week :)

danmayer commented 14 years ago

Cool good to know, have a great vacation. If you want any help debugging or testing the fix let me know.

kevinrutherford commented 14 years ago

Ok, I just got this on my laptop:

$ git clone git://github.com/edavis10/redmine.git
...
$ cd redmine/
$ ls -F
app/     db/   extra/  lib/  public/   README.rdoc  test/  vendor/
config/  doc/  files/  log/  Rakefile  script/      tmp/
$ reek -v
reek 1.2.7.1
$ time reek app
Error: failed to allocate memory

real    3m2.885s
user    2m38.066s
sys 0m16.481s
$ 

My wife will kill me, but it's something to do while I'm on holiday :)

kevinrutherford commented 14 years ago

In fact, it turns out that...

$ reek redmine/app/helpers/application_helper.rb 
Error: failed to allocate memory
danmayer commented 14 years ago

Nice glad you isolated it... Don't work to hard on your vacation. If you find that specific files with problems cause it to error out, perhaps it can short circuit in some way and just skip the file and log a warning. Or if what makes it freak out is weird code in the file make a smell out of it.

graaff commented 14 years ago

Interesting, in both the projects that I have trouble with the culprit is also the application_helper.rb file. In each case the remainder of the project is checked quickly.

edavis10 commented 14 years ago

This looks like a problem with the 1.2.7.1 version. I just upgraded and get the same error. 1.1.3 works fine.

http://gist.github.com/319599

kevinrutherford commented 14 years ago

This is a bug in the DataClump smell detector. Specifically, the algorithm that searches for common parameter sets constructs a power-set of the candidate methods in a class or module; and if there are more than a few methods having 2+ parameters that can get pretty large.

Working on a fix right now...

kevinrutherford commented 14 years ago

Fixed in release 1.2.7.2

danmayer commented 14 years ago

Awesome deployed the fix on Caliper.

graaff commented 14 years ago

I've just upgraded to 1.2.7.2 and can confirm that it also fixes things for me. Thanks!

danmayer commented 14 years ago

Unfortunately I think there is still and issue here... I had a user notify us of a private repo that was failing all the time. I tracked it down to reek stalling out over a single file and memory getting huge. Adding a reek configuration to disable dataclumping for that file fixed the issue. (at the containing folder, allowing dataclump to work on the rest of the project)


DataClump: enabled: false

Unfortunately, since this was a private repo, I can't share the code or the repository. I can say it was a file with multiple classes in it, and the file was 1500 lines of code long. So a pretty massive beast. Kevin, I know that doesn't give you much to go on, but if you have any ideas, I would love to hear them.

Another thought I had was perhaps a timeout around detection, so if it can't detect on a file it outputs an error log, and skips that file.

If you are really interested in knowing more, I could contact the user and see if they could work with us to get some open code that repos the problem.

To my knowledge this is the only project I have found still having an issue.

danmayer commented 14 years ago

OK found a couple open source projects that trigger this issue.

http://github.com/therabidbanana/orange

clone the project and then run reek against this file: reek lib/orange-core/resources/model_resource.rb

This runs forever (or a really long time, maybe it finishes sometime). Now create a site.reek and add

DataClump: enabled: false

run reek against the file again and it completes in about 1 second.

We found this repos on:

http://github.com/mmangino/facebooker http://github.com/Shopify/active_merchant

I don't have the bug down to a single file in either of those projects though, so they aren't as useful for debugging. We are currently disabling DataClump site wide, but I would be happy to help debug this issue. Perhaps we could pair over Skype and look into it together sometime when you have some time.

I hope this helps give better info about the issue.

Thanks, Dan

kevinrutherford commented 14 years ago

Thanks Dan. The problem is that the DataClump detector creates a powerset of the methods in a class/module, and for 20 methods that's a million sets of methods. My first fix above was to prune out many of the possibilities first, thereby reducing the likelihood that the powerset would be large. The next step will be to replace the powerset by an iterator that runs over the same set, but pruning as it goes, without ever creating the powerset itself. Gonna try that on the train up to the Scottish Ruby Conf tomorrow, pairing with Brian Marick...

kevinrutherford commented 14 years ago

An alternative config trick would be to put the following it your site.reek file:

--- 
DataClump: 
  enabled: true
  max_copies: 2
  min_clump_size: 3

This fixes the problem for the above orange repo and yet still reports the biggest of its DataClumps. (Setting min_clump_size to 3 forces more aggressive pruning before attempting to build the powerset, which is therefore a couple of orders of magnitude smaller.)

danmayer commented 14 years ago

cool that isn't a bad suggestion. That might be a better site wide change than disabling this smell, we could just limit the clump size. Thanks for the suggestion.

kevinrutherford commented 14 years ago

Fixed (again) in release 1.2.7.3