misshie / bioruby-ucsc-api

Ruby UCSC API: An API for the UCSC Genome Database
MIT License

NoMemoryError: failed to allocate memory #5

Closed spheregenomics closed 10 years ago

spheregenomics commented 10 years ago

I have been using this gem for some time with no problems. Suddenly it cannot allocate memory. I tried upgrading to 0.6.1 but it did not make a difference.

Rails 4.0.2 ruby 1.9.3p327 (2012-11-10 revision 37606) [i686-linux]

1.9.1 :003 > seqfile = Ucsc::File::Twobit.open("/home/assay/apps/assay/shared/bin/hg19/hg19.2bit")
NoMemoryError: failed to allocate memory
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/bio-ucsc-api-0.6.1/lib/bio-ucsc/file/twobit.rb:37:in `read'
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/bio-ucsc-api-0.6.1/lib/bio-ucsc/file/twobit.rb:37:in `block in load'
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/bio-ucsc-api-0.6.1/lib/bio-ucsc/file/twobit.rb:37:in `open'
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/bio-ucsc-api-0.6.1/lib/bio-ucsc/file/twobit.rb:37:in `load'
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/bio-ucsc-api-0.6.1/lib/bio-ucsc/file/twobit.rb:59:in `open'
    from (irb):3
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/railties-4.0.2/lib/rails/commands/console.rb:90:in `start'
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/railties-4.0.2/lib/rails/commands/console.rb:9:in `start'
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/railties-4.0.2/lib/rails/commands.rb:62:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'

spheregenomics commented 10 years ago

Question opened on Stack Overflow http://stackoverflow.com/questions/22647826/ruby-memory-issues-with-kernel-open

misshie commented 10 years ago

Could you please try "jruby -J-Xmx3g your_script.rb" to give the Java virtual machine a 3 GB heap? The default heap size may not be enough to load a 2bit file (see also the Implementation section of README.md).

spheregenomics commented 10 years ago

Thank you for your comment. I am running MRI Ruby, not JRuby. I can try to install JRuby, but this is a production machine and I don't want to make too many changes. It worked fine for the past 9 months...

misshie commented 10 years ago

I am sorry for my misunderstanding. I saw the discussion on Stack Overflow. I don't know why I used Kernel.open, as in the following code:

two_bit = nil
Kernel.open(filename, 'rb') {|f| two_bit = f.read}

I should have written it like this:

two_bit = File.read(filename)

Could you try the following code in irb? irb > File.open('/home/assay/apps/assay/shared/bin/hg19/hg19.2bit'); nil ("; nil" is necessary to avoid printing the whole 2bit file)

If this works, I will update the gem.
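For readers comparing the forms discussed in this comment, here is a minimal sketch. It assumes only Ruby's standard library; `read_binary_three_ways` is a hypothetical helper written for illustration and is not part of bio-ucsc-api. It suggests that Kernel.open with a block, File.open with a block, and File.binread all return the same bytes, and all of them load the entire file into a single String, so switching between them would not be expected to change the memory footprint.

```ruby
require 'tempfile'

# Hypothetical helper for illustration only; not part of bio-ucsc-api.
def read_binary_three_ways(path)
  via_kernel_open = nil
  # The form quoted in the comment above.
  Kernel.open(path, 'rb') { |f| via_kernel_open = f.read }
  # File.open returns the value of its block, so no outer variable is needed.
  via_file_open = ::File.open(path, 'rb') { |f| f.read }
  # One-call equivalent that always reads in binary mode.
  via_binread = ::File.binread(path)
  [via_kernel_open, via_file_open, via_binread]
end

# Small stand-in file; a real .2bit file would be far larger.
sample = Tempfile.new('sample')
sample.binmode
sample.write("\x1A\x41\x27\x43\x00\xFF".b)
sample.close

results = read_binary_three_ways(sample.path)
puts results.uniq.size  # 1, because all three reads return identical bytes
```

One difference that does matter is safety: Kernel.open treats a path beginning with "|" as a command to run, which the File methods never do.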

spheregenomics commented 10 years ago

I tried the command out in rails console... is that acceptable?

Loading production environment (Rails 4.0.2)
1.9.3p327 :001 > include Bio
 => Object
1.9.3p327 :002 > File.open('/home/assay/apps/assay/shared/bin/hg19/hg19.2bit'); nil
 => nil

misshie commented 10 years ago

Acceptable! Kernel.open probably has trouble in some situations. I will update the Ruby-UCSC-API gem soon.

misshie commented 10 years ago

In the previous version, I used Kernel.open because of a name collision (Bio::Ucsc::File and File at the top level). I did not know that the expression "::File.open" refers explicitly to the top-level constant. Now, the final code is:

two_bit = nil
::File.open(filename, 'rb') {|f| two_bit = f.read}

I used a block so that the file is closed immediately.

[CORRECTION] You do not have to put "; nil" in irb. This problem was fixed in version 0.5.0.


I released v0.6.2 on RubyGems. Hopefully, it will fix all the problems.
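The name collision described above can be reproduced with a small sketch. The module below is a stand-in that only mirrors the gem's namespace nesting; the two method names are hypothetical and exist purely for illustration.

```ruby
module Bio
  module Ucsc
    module File
      # Stand-in for the gem's namespace; hypothetical, for illustration only.
      def self.core_file_visible?
        # A bare `File` resolves lexically to Bio::Ucsc::File, which has no `read`.
        File.respond_to?(:read)
      end

      def self.top_level_file_visible?
        # The leading `::` starts the lookup at the top level, i.e. Ruby's core File class.
        ::File.respond_to?(:read)
      end
    end
  end
end

puts Bio::Ucsc::File.core_file_visible?       # false
puts Bio::Ucsc::File.top_level_file_visible?  # true
```

Inside the nested module, an unqualified `File` never reaches the core class, which is why either `Kernel.open` or `::File.open` is needed there.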

spheregenomics commented 10 years ago

Thank you for your efforts. I tried testing again in the Rails console and received a new error. There is 6 GB of RAM free on the server.

Loading production environment (Rails 4.0.2)
1.9.3p327 :001 > include Bio
 => Object
1.9.3p327 :002 > two_bit = nil
 => nil
1.9.3p327 :003 > ::File.open('/home/assay/apps/assay/shared/bin/hg19/hg19.2bit', 'rb') {|f| two_bit = f.read}
NoMemoryError: failed to allocate memory
    from (irb):3:in `read'
    from (irb):3:in `block in irb_binding'
    from (irb):3:in `open'
    from (irb):3
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/railties-4.0.2/lib/rails/commands/console.rb:90:in `start'
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/railties-4.0.2/lib/rails/commands/console.rb:9:in `start'
    from /home/assay/apps/assay/shared/bundle/ruby/1.9.1/gems/railties-4.0.2/lib/rails/commands.rb:62:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'

misshie commented 10 years ago

Thank you for your new comment.

So far, I do not have a good solution. On my Linux box with 8 GB of RAM, the 'spec/file/twobit.rb' spec passes with ruby-1.9.3-p125 and ruby-2.1.0. This spec uses UCSC's hg18.2bit and hg19.2bit.

I am watching the discussion at Stack Overflow and trying to find other web resources related to this issue.
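Since the failure occurs while reading the whole file into one String, one possible direction is to read only the parts that are needed. The sketch below is not how the gem works; `twobit_index` is a hypothetical helper. It assumes the published UCSC 2bit layout: a 16-byte header of four 32-bit words (signature 0x1A412743, version, sequence count, reserved), followed by an index of (name length byte, name, 32-bit offset) entries. It reads just the header and index.

```ruby
require 'tempfile'

TWOBIT_SIGNATURE = 0x1A412743

# Hypothetical helper, not part of bio-ucsc-api: returns { name => offset }
# after reading only the header and index of a .2bit file.
def twobit_index(path)
  ::File.open(path, 'rb') do |f|
    raw = f.read(16).to_s
    # Try little-endian first, then big-endian, based on the signature word.
    formats = [['V', 'V4'], ['N', 'N4']]
    word, header = formats.find { |sig, _| raw.unpack(sig).first == TWOBIT_SIGNATURE }
    raise ArgumentError, 'not a 2bit file' unless word

    _signature, _version, sequence_count, _reserved = raw.unpack(header)
    index = {}
    sequence_count.times do
      name_size = f.read(1).unpack('C').first
      name = f.read(name_size)
      index[name] = f.read(4).unpack(word).first
    end
    index
  end
end

# Tiny synthetic file with two index entries and no sequence data.
sample = Tempfile.new(['tiny', '.2bit'])
sample.binmode
sample.write([TWOBIT_SIGNATURE, 0, 2, 0].pack('V4'))
sample.write([4].pack('C') + 'chr1' + [100].pack('V'))
sample.write([4].pack('C') + 'chr2' + [200].pack('V'))
sample.close

p twobit_index(sample.path)
```

With the offsets in hand, a caller could `seek` to a single sequence record and read only that region, keeping each allocation far smaller than the file.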

spheregenomics commented 10 years ago

Hi misshie, I suspect there is some other problem in play here. I note that the code works fine on my laptop, and that prior to a Rails 4 upgrade it worked fine for 9 months on the Ubuntu 12 production server. I think this gem is an excellent tool and will try to continue to work with it. One option is to build a clone of the server to determine what is causing this memory error.

I will first try updating ruby and then try a new server build. I will report my findings back to this page and to Stack Overflow. I did note the gem is reliant on the mysql gem, which apparently has been the cause of memory problems. I will also experiment with mysql2 and ruby-mysql. Regards, Sean
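One thing that may be worth ruling out: the original report lists the platform as [i686-linux], which is a 32-bit build. A 32-bit process has a limited virtual address space regardless of how much RAM is free, so a single large allocation could fail inside a process that has already loaded a full Rails stack. This is a guess, not a confirmed diagnosis. The sketch below uses a hypothetical helper, `address_space_report`, to print the interpreter's pointer width next to the size of the file being loaded.

```ruby
require 'rbconfig'

# Hypothetical diagnostic helper; the name and hash keys are illustrative only.
def address_space_report(path)
  # Packing a nil pointer yields one machine pointer: 4 bytes on a 32-bit
  # build of Ruby, 8 bytes on a 64-bit build.
  pointer_bits = [nil].pack('p').bytesize * 8
  {
    host_cpu: RbConfig::CONFIG['host_cpu'],
    pointer_bits: pointer_bits,
    file_bytes: ::File.size(path)
  }
end

# Any existing file works for a dry run; on the server this would be the
# path to hg19.2bit.
p address_space_report(RbConfig.ruby)
```

If `pointer_bits` comes back as 32 and `file_bytes` is a large fraction of a gigabyte, moving to a 64-bit Ruby, or running the read in a small separate process, would be the natural things to try.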

spheregenomics commented 10 years ago

I have managed to work around this issue by wrapping the bio-ucsc-api library in an external Ruby program and running it via an Open4 call. It works fine now. When run from within a Rails stack, it dies.

Workaround code below:

require 'bio-ucsc'
include Bio
# ARGV[0] is the full path to the 2bit file
# ARGV[1] is the region to extract, built by the caller like this:
# subsequence = "#{chrom}:#{batch_detail.chrom_start - batch_detail.forward_offset}-#{batch_detail.chrom_end + batch_detail.reverse_offset}"

seqfile = Ucsc::File::Twobit.open(ARGV[0])
extracted_seq = seqfile.subseq(ARGV[1])
puts extracted_seq

This code is called within the Rails app by this:

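require 'shellwords'

def self.ucscapi(subsequence)
  filepath = "#{Rails.root}/bin/hg19/hg19.2bit"
  scriptdir = "#{Rails.root}/bin/ucscapi"
  # Escape the arguments so the shell treats each one as a single word.
  cmd = "ruby ucscapi.rb #{Shellwords.escape(filepath)} #{Shellwords.escape(subsequence)}"
  sequence = ''
  # The child shell changes directory itself, so the Rails process keeps its own working directory.
  Open4::popen4("sh") do |pid, stdin, stdout, stderr|
    stdin.puts "cd #{Shellwords.escape(scriptdir)}"
    stdin.puts cmd
    stdin.close
    sequence = stdout.read.strip
  end
  sequence
end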
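The same subprocess idea can also be sketched with Open3 from Ruby's standard library instead of the open4 gem. This is an alternative under stated assumptions, not the code used in the thread: `run_helper` is a hypothetical name, and the example runs a one-line Ruby script rather than ucscapi.rb so that it is self-contained.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical variant of the wrapper above; names are illustrative.
def run_helper(script_args, chdir: Dir.pwd)
  # Passing the command as separate arguments bypasses the shell, so a value
  # such as "chr1:100-200" is never interpreted as shell syntax.
  output, status = Open3.capture2(RbConfig.ruby, *script_args, chdir: chdir)
  raise "helper exited with status #{status.exitstatus}" unless status.success?

  output.strip
end

puts run_helper(['-e', 'puts ARGV.first.upcase', 'chr1:100-200'])  # CHR1:100-200
```

In the Rails app, the argument list would instead be something like `['ucscapi.rb', filepath, subsequence]` with `chdir:` set to the script directory. The `chdir:` option applies only to the child process.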