ualbertalib / HydraNorth

This repo is deprecated. Succeeded by https://github.com/ualbertalib/jupiter. This codebase was an IR built on Samvera/Sufia.

Heap Memory Analyze #1236

Closed weiweishi closed 8 years ago

weiweishi commented 8 years ago

This issue is for analyzing heap memory usage and tracking down where the memory leak might be. We will be implementing twice-weekly restarts on ERA to accommodate the September traffic hike.

mbarnett commented 8 years ago

I'd like to verify that we're using all of the tools we have at hand to analyze our memory usage in Production. In particular, that we have the GC logging enabled as detailed here: http://stackoverflow.com/questions/15307998/is-it-common-practice-to-turn-on-gc-logging-in-production-java-server. We should be able to gather this information with no performance loss in Production.

Important JVM flags we will want to ensure are being used in Production are: -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -Xloggc:

Henry has indicated that he's seen some of these log files in Production, but they are empty. We should diagnose the issue preventing this important information from being logged ASAP.
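One way to start diagnosing an empty GC log is to confirm that the flags actually reached the running JVM. The sketch below is only an illustration: the function names `has_jvm_flag` and `check_gc_flags` are hypothetical helpers, not part of any playbook in this repo, and they simply look for each flag named above in a command-line string.

```shell
#!/bin/sh
# Hypothetical helpers (not from the HydraNorth playbooks).

# Print "yes" if the flag (or flag prefix) $2 appears as an argument
# in the command-line string $1, otherwise print "no".
has_jvm_flag() {
    case " $1 " in
        *" $2"*) echo "yes" ;;
        *)       echo "no" ;;
    esac
}

# Report, one line per flag, whether each GC logging flag named in this
# thread is present in the command-line string $1.
check_gc_flags() {
    cmdline="$1"
    for flag in -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
                -XX:+PrintTenuringDistribution -Xloggc:; do
        echo "$flag $(has_jvm_flag "$cmdline" "$flag")"
    done
}
```

It could be fed the live process arguments, for example `check_gc_flags "$(ps -o args= -p "$TOMCAT_PID")"`, where `TOMCAT_PID` is an assumed variable holding the Tomcat process ID. Some `ps` implementations truncate long argument lists, so a "no" is worth double-checking against the startup script.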

pgwillia commented 8 years ago

Looks like these are set in @henryzhang87's Ansible playbook:

./hydra/roles/fedora/templates/setenv.sh.j2:JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:HeapDumpPath=/usr/share/tomcat7/logs/fedoar_heap.hprof -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/tomcat7/gc.log"

The log is being populated! https://logger1.library.ualberta.ca/logs/fedora/mycombe/gc.log
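Once the log is populated, a quick first look is to count how many minor and full collections it records. This is a rough sketch, assuming the pre-Java 9 HotSpot log format that these flags produce, where collection lines contain `[GC` or `[Full GC`. The function name `summarize_gc_log` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count minor and full collections in a HotSpot GC log.
# $1: path to the GC log (for example the file given to -Xloggc:).
# Assumes the pre-Java 9 line format, e.g. "[GC [PSYoungGen: ..." and
# "[Full GC [PSYoungGen: ...".
summarize_gc_log() {
    log="$1"
    # grep -c exits non-zero when the count is 0, so ignore its status.
    minor=$(grep -cE '\[GC[ (]' "$log" || true)
    full=$(grep -cE '\[Full GC' "$log" || true)
    echo "minor=$minor full=$full"
}
```

A rising share of full collections that reclaim little memory would be consistent with a leak, though the heap dump analysis is still needed to find what is being retained.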

mbarnett commented 8 years ago

huzzah 🎉

pgwillia commented 8 years ago

A couple of reports created by the Eclipse Memory Analyzer:

heapdump-5G-1472476259732_Leak_Suspects.zip heapdump-5G-1472476259732_Top_Components.zip

pgwillia commented 8 years ago

@mbarnett found this. It is likely responsible for the empty instances of java.util.LinkedHashMap and has been fixed for Fedora 4.6.

pgwillia commented 8 years ago

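Just to summarize what's been observed so far:

  • One servlet is taking up 97% of used memory -- makes sense, this is Fedora
  • Many empty collections (java.util.LinkedHashMap) hanging around is a known problem and was fixed in Fedora 4.6

Ways to work around:

  • restart tomcat on a regular basis (i.e. weekly)
  • upgrade to Fedora 4.6

Things to still investigate:

  • Duplicate Strings - 49,993 × org.fcrepo.http.commons.domain.SinglePrefer (104 bytes)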

henryzhang87 commented 8 years ago

Based on the above, and to avoid the buildup of heap memory usage, mycombe is now scheduled to reboot twice a week, at 5:00 AM on Monday and Friday. It used to reboot once a week, on Wednesday. This should reduce the likelihood of Nagios sending heap usage warnings to the on-call device over the weekend.
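For reference, a 5:00 AM Monday and Friday schedule corresponds to the cron expression `0 5 * * 1,5`. The sketch below writes such an entry in cron.d format. The function name `write_reboot_schedule`, the output path, and the reboot command are all assumptions, since the thread does not say how the reboot is actually triggered.

```shell
#!/bin/sh
# Hypothetical helper: write a cron.d-style entry that reboots the host
# at 05:00 every Monday (1) and Friday (5).
# $1: output file, e.g. a file under /etc/cron.d (assumed location).
# The reboot command below is an assumption, not taken from the thread.
write_reboot_schedule() {
    cat > "$1" <<'EOF'
# minute hour day-of-month month day-of-week user command
0 5 * * 1,5 root /sbin/shutdown -r now
EOF
}
```

Calling `write_reboot_schedule ./mycombe-reboot` writes the entry to a local file for review before anything is installed.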

On Wed, Aug 31, 2016 at 10:14 AM, pgwillia notifications@github.com wrote:

Just to summarize what's been observed so far:

  • One servlet is taking up 97% of used memory -- makes sense, this is Fedora
  • Many empty collections (java.util.LinkedHashMap) hanging around is a known problem and was fixed in Fedora 4.6

Ways to work around:

  • restart tomcat on a regular basis (i.e. weekly)
  • upgrade to Fedora 4.6

Things to still investigate:

  • Duplicate Strings - 49,993 × org.fcrepo.http.commons.domain.SinglePrefer (104 bytes)


Henry Zhang Sr. Sys. admin and Storage Specialist

pgwillia commented 8 years ago

The memory leak fix is directly reflected in changes to org.fcrepo.http.commons.domain.SinglePrefer.

I think we can close this.