Closed graaff closed 14 years ago
Hi, First of all, thanks for trying Reek, and for reporting your problem.
My original concept for Reek was that it might "someday" be able to divine stuff across source file boundaries (eg. looking up and down an inheritance hierarchy). But that never came to be, and since the early days I've been focussing much more on code quality as a localised property.
All of which means -- you're right, Reek doesn't need to keep information about a source file after it's done. I'll fix this as soon as I have the YAML reporting stable.
Great. BTW, I was wrong to report 3.2Gb, that's just the resident set size. The actual process size is 5.5Gb. Still running and obviously swapping like crazy :-)
Code LOC: 10715
I'd like to see for myself -- can I get hold of a clone of your project folder?
Nope, sorry, not an open source project.
I can understand why you want to look at it. Just ran reek on http://github.com/fdv/typo and that took ~450Mb for Code LOC: 6547.
Running the same thing on our own project quickly shoots to a large size again. Any ideas on tracking this down further?
Culprit found. Our application_helper.rb is causing these issues. It's a 49Kb 1139 line file with 73 methods in it. Given that the remainder of the code is actually larger I suspect something in that file is causing the problems.
I know Reek runs fine on other large codebases, and 10KLOC isn't huge. So it does sound like more than just large code causing this. For example, Reek runs just fine on optparse.rb from the standard Ruby distribution, which is a bigger file than yours. So does
reek application_helper.rb
on its own show the problem? How long does it take to complete?
My first thought would be to take that file and slowly cut chunks out of it until the problem goes away, then attempt to create a small file that shows the same symptoms. If you can't send me the source, can you do some binary chop to get to something smaller that has the bug?
Cutting chunks was also my idea, so I played around a bit with it, but something strange is going on here. Cutting some methods made for a dramatic reduction in memory size (half), but running reek on that code by its own (in the same module) runs fast.
I've played around with it a bit more, and found that adding a single small method would double total running time and memory, but that is not related to the method itself, since that runs quickly when tested by its own.
Weird. And all this when passing just one file to Reek?
I wonder if it's related to the smells found in the various chunks? For example, did Reek report any smells in the small method?
No, no smells, and in fact running reek just on that method gives 0 warnings.
Ok, I'm stumped. Is there any way you could send me a copy of the offending file? Say with the names obfuscated? (And thanks for your patience with trying all this btw.)
Just upgraded to 1.2.7.1 and the problem is still present. I'll see if I can get something that I can send to you.
I know there are some circular references in Reek's abstract syntax trees -- maybe they hang around too long because they can't be garbage collected. That's something I'm gradually refactoring away in the next few releases. It would be great to get a repeatable concrete example though.
OK I think I can help. At least with a example. We just found a project on Caliper that is taking 25+ minutes. Turns out it is using up all the memory and using swap space while running reek. The machine has 1.7 GB of memory.
Luckily it is an open source project: http://github.com/edavis10/redmine
If you are trying to track a particular file causing the issue, This might help, we worked on the project up until this commit: e5d300af0a4d6703b043c05d4178353e7e252bf2
As seen on the history page for the project:
http://getcaliper.com/caliper/history?repo=git://github.com/edavis10/redmine.git
Here is a dump of the running process as it starts stalling. Let me know if any more info would help.
1005 3976 5.8 92.7 2378060 1659256 ? D 21:35 1:36 /usr/bin/ruby1.8 /metric_fu_gems/bin/reek app/controllers/attachments_controller.rb app/controllers/my_controller.rb app/controllers/enumerations_controller.rb app/controllers/custom_fields_controller.rb app/controllers/settings_controller.rb app/controllers/documents_controller.rb app/controllers/members_controller.rb app/controllers/search_controller.rb app/controllers/issue_categories_controller.rb app/controllers/issue_relations_controller.rb app/controllers/journals_controller.rb app/controllers/issue_statuses_controller.rb app/controllers/repositories_controller.rb app/controllers/wikis_controller.rb app/controllers/application_controller.rb app/controllers/trackers_controller.rb app/controllers/sys_controller.rb app/controllers/versions_controller.rb app/controllers/projects_controller.rb app/controllers/mail_handler_controller.rb app/controllers/users_controller.rb app/controllers/workflows_controller.rb app/controllers/timelog_controller.rb app/controllers/messages_controller.rb app/controllers/news_controller.rb app/controllers/wiki_controller.rb app/controllers/issues_controller.rb app/controllers/watchers_controller.rb app/controllers/welcome_controller.rb app/controllers/auth_sources_controller.rb app/controllers/account_controller.rb app/controllers/boards_controller.rb app/controllers/admin_controller.rb app/controllers/queries_controller.rb app/controllers/groups_controller.rb app/controllers/roles_controller.rb app/controllers/reports_controller.rb app/models/enabled_module.rb app/models/wiki_content_observer.rb app/models/project.rb app/models/news_observer.rb app/models/custom_field.rb app/models/repository/mercurial.rb app/models/repository/git.rb app/models/repository/subversion.rb app/models/repository/filesystem.rb app/models/repository/cvs.rb app/models/repository/darcs.rb app/models/repository/bazaar.rb app/models/attachment.rb app/models/document_category.rb app/models/tracker.rb app/models/group.rb app/models/query.rb app/models/changeset.rb app/models/time_entry_custom_field.rb app/models/principal.rb app/models/issue_observer.rb app/models/user_preference.rb app/models/group_custom_field.rb app/models/issue_status.rb app/models/board.rb app/models/document.rb app/models/auth_source_ldap.rb app/models/user.rb app/models/document_observer.rb app/models/auth_source.rb app/models/issue_priority.rb app/models/comment.rb app/models/message.rb app/models/wiki_page.rb app/models/journal_detail.rb app/models/change.rb app/models/issue_category.rb app/models/member.rb app/models/news.rb app/models/version.rb app/models/member_role.rb app/models/wiki.rb app/models/document_category_custom_field.rb app/models/issue_custom_field.rb app/models/message_observer.rb app/models/journal.rb app/models/time_entry_activity.rb app/models/mailer.rb app/models/issue_relation.rb app/models/wiki_redirect.rb app/models/repository.rb app/models/user_custom_field.rb app/models/mail_handler.rb app/models/time_entry_activity_custom_field.rb app/models/watcher.rb app/models/enumeration.rb app/models/workflow.rb app/models/wiki_content.rb app/models/version_custom_field.rb app/models/token.rb app/models/time_entry.rb app/models/issue_priority_custom_field.rb app/models/role.rb app/models/project_custom_field.rb app/models/setting.rb app/models/custom_value.rb app/models/journal_observer.rb app/models/issue.rb app/helpers/timelog_helper.rb app/helpers/messages_helper.rb app/helpers/users_helper.rb app/helpers/account_helper.rb app/helpers/workflows_helper.rb app/helpers/application_helper.rb app/helpers/versions_helper.rb app/helpers/members_helper.rb app/helpers/roles_helper.rb app/helpers/auth_sources_helper.rb app/helpers/admin_helper.rb app/helpers/custom_fields_helper.rb app/helpers/repositories_helper.rb app/helpers/welcome_helper.rb app/helpers/mail_handler_helper.rb app/helpers/queries_helper.rb app/helpers/journals_helper.rb app/helpers/enumerations_helper.rb app/helpers/projects_helper.rb app/helpers/issue_categories_helper.rb app/helpers/reports_helper.rb app/helpers/wiki_helper.rb app/helpers/groups_helper.rb app/h
Got some feedback from Edavis10 since he runs reek locally, here is his info... We were running Reek 1.2.7.1
edavis10: @devver reek 1.1.3 on Linux (x86) edavis10: @devver ok I might try that. Reek just ran fine here and didn't hit 100MB.
Thanks Dan -- I think Reek just needs to ditch each file's syntax tree after it's done analysing. Version 1.2.7.1 is much closer to being able to support that, so I may be able to get it in the next release. After my vacation next week :)
Cool good to know, have a great vacation. If you want any help debugging or testing the fix let me know.
Ok, I just got this on my laptop:
$ git clone git://github.com/edavis10/redmine.git
...
$ cd redmine/
$ ls -F
app/ db/ extra/ lib/ public/ README.rdoc test/ vendor/
config/ doc/ files/ log/ Rakefile script/ tmp/
$ reek -v
reek 1.2.7.1
$ time reek app
Error: failed to allocate memory
real 3m2.885s
user 2m38.066s
sys 0m16.481s
$
My wife will kill me, but it's something to do while I'm on holiday :)
In fact, it turns out that...
$ reek redmine/app/helpers/application_helper.rb
Error: failed to allocate memory
Nice glad you isolated it... Don't work to hard on your vacation. If you find that specific files with problems cause it to error out, perhaps it can short circuit in some way and just skip the file and log a warning. Or if what makes it freak out is weird code in the file make a smell out of it.
Interesting, in both the projects that I have trouble with the culprit is also the application_helper.rb file. In each case the remainder of the project is checked quickly.
This looks like a problem with the 1.2.7.1 version. I just upgraded and get the same error. 1.1.3 works fine.
This is a bug in the DataClump smell detector. Specifically, the algorithm that searches for common parameter sets constructs a power-set of the candidate methods in a class or module; and if there are more than a few methods having 2+ parameters that can get pretty large.
Working on a fix right now...
Fixed in release 1.2.7.2
Awesome deployed the fix on Caliper.
I've just upgraded to 1.2.7.2 and can confirm that it also fixes things for me. Thanks!
Unfortunately I think there is still and issue here... I had a user notify us of a private repo that was failing all the time. I tracked it down to reek stalling out over a single file and memory getting huge. Adding a reek configuration to disable dataclumping for that file fixed the issue. (at the containing folder, allowing dataclump to work on the rest of the project)
DataClump: enabled: false
Unfortunately, since this was a private repo, I can't share the code or the repository. I can say it was a file with multiple classes in it, and the file was 1500 lines of code long. So a pretty massive beast. Kevin, I know that doesn't give you much to go on, but if you have any ideas, I would love to hear them.
Another thought I had was perhaps a timeout around detection, so if it can't detect on a file it outputs an error log, and skips that file.
If you are really interested in knowing more, I could contact the user and see if they could work with us to get some open code that repos the problem.
To my knowledge this is the only project I have found still having an issue.
OK found a couple open source projects that trigger this issue.
http://github.com/therabidbanana/orange
clone the project and then run reek against this file: reek lib/orange-core/resources/model_resource.rb
DataClump: enabled: false
run reek against the file again and it completes in about 1 second.
We found this repos on:
http://github.com/mmangino/facebooker http://github.com/Shopify/active_merchant
I don't have the bug down to a single file in either of those projects though, so they aren't as useful for debugging. We are currently disabling DataClump site wide, but I would be happy to help debug this issue. Perhaps we could pair over Skype and look into it together sometime when you have some time.
I hope this helps give better info about the issue.
Thanks, Dan
Thanks Dan. The problem is that the DataClump detector creates a powerset of the methods in a class/module, and for 20 methods that's a million sets of methods. My first fix above was to prune out many of the possibilities first, thereby reducing the likelihood that the powerset would be large. The next step will be to replace the powerset by an iterator that runs over the same set, but pruning as it goes, without ever creating the powerset itself. Gonna try that on the train up to the Scottish Ruby Conf tomorrow, pairing with Brian Marick...
An alternative config trick would be to put the following it your site.reek file:
---
DataClump:
enabled: true
max_copies: 2
min_clump_size: 3
This fixes the problem for the above orange repo and yet still reports the biggest of its DataClumps. (Setting min_clump_size
to 3 forces more aggressive pruning before attempting to build the powerset, which is therefore a couple of orders of magnitude smaller.)
cool that isn't a bad suggestion. That might be a better site wide change than disabling this smell, we could just limit the clump size. Thanks for the suggestion.
Fixed (again) in release 1.2.7.3
I'm currently trying out reek as part of metric_fu. metric_fu runs reek on all the files in my Rails project in one go, e.g. "reek app/models/a.rb app/models/b.rb ...". This results in a huge ruby process (3.2Gb in my case). While I can appreciate the answer to just buy more RAM, I am wondering if it makes sense for the process to be this large and whether there are opportunities to diminish the memory requirements.