matrix-profile-foundation / tsmp

R Functions implementing UCR Matrix Profile Algorithm
https://matrix-profile-foundation.github.io/tsmp

Discovering motifs in 1M points blows up 24G memory on linux #38

Closed seninp closed 5 years ago

seninp commented 5 years ago

Describe the bug

Following up on the SCRIMP publication, I'm trying to look at structural features in an energy consumption series. I subset the 8.xM points to 1M and ran STOMP with two threads, but the task gets killed. What am I doing wrong?

To Reproduce

Fresh Ubuntu 16.04 install, running the following code:

dd <- data.table::fread("~/projects/data/CLEAN_House5.csv")
mp <- tsmp::tsmp(dd$Appliance1[1:1000000], window_size = 1000, exclusion_zone = 1 / 4, verbose = 3, n_workers = 2)
mp <- tsmp::find_motif(mp)
mp

It runs for a few hours and is then killed by the OS:

$ Rscript try1.R
Warming up parallel with 2 cores.
STOMP [==>----------------------------]  11% at 9.6 it/s, elapsed:  3h, eta:  1dKilled

Environment

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> Sys.getenv()
CLUTTER_BACKEND         x11
CLUTTER_IM_MODULE       
COLORTERM               gnome-terminal
COLUMNS                 80
DBUS_SESSION_BUS_ADDRESS
                        unix:abstract=/tmp/dbus-SwPNfZI5gn
DEFAULTS_PATH           /usr/share/gconf/xubuntu.default.path
DESKTOP_SESSION         xubuntu
DISPLAY                 :0.0
EDITOR                  vi
GDM_LANG                en_US
GDMSESSION              xubuntu
GLADE_CATALOG_PATH      :
GLADE_MODULE_PATH       :
GLADE_PIXMAP_PATH       :
GNOME_KEYRING_CONTROL   
GNOME_KEYRING_PID       
GPG_AGENT_INFO          /home/psenin/.gnupg/S.gpg-agent:0:1
GTK_IM_MODULE           
GTK_OVERLAY_SCROLLING   0
HOME                    /home/psenin
IBUS_DISABLE_SNOOPER    1
IM_CONFIG_PHASE         1
INSTANCE                
JOB                     dbus
LANG                    en_US.UTF-8
LANGUAGE                en_US
LD_LIBRARY_PATH         /opt/R/3.5.2/lib/R/lib:/usr/local/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server
LESSCLOSE               /usr/bin/lesspipe %s %s
LESSOPEN                | /usr/bin/lesspipe %s
LINES                   24
LN_S                    ln -s
LOGNAME                 psenin
LS_COLORS               rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
MAKE                    make
MANDATORY_PATH          /usr/share/gconf/xubuntu.mandatory.path
ORBIT_SOCKETDIR         /tmp/orbit-psenin
PAGER                   /usr/bin/less
PATH                    /home/psenin/bin:/home/psenin/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
PWD                     /home/psenin
QT_ACCESSIBILITY        1
QT_IM_MODULE            
QT_LINUX_ACCESSIBILITY_ALWAYS_ON
                        1
QT_STYLE_OVERRIDE       gtk
QT4_IM_MODULE           
R_ARCH                  
R_BROWSER               /usr/bin/firefox
R_BZIPCMD               /bin/bzip2
R_DOC_DIR               /opt/R/3.5.2/lib/R/doc
R_GZIPCMD               /bin/gzip
R_HOME                  /opt/R/3.5.2/lib/R
R_INCLUDE_DIR           /opt/R/3.5.2/lib/R/include
R_LIBS_SITE             
R_LIBS_USER             ~/R/x86_64-pc-linux-gnu-library/3.5
R_PAPERSIZE             letter
R_PDFVIEWER             /usr/bin/xdg-open
R_PLATFORM              x86_64-pc-linux-gnu
R_PRINTCMD              lpr
R_RD4PDF                times,inconsolata,hyper
R_SESSION_TMPDIR        /tmp/RtmpK2METe
R_SHARE_DIR             /opt/R/3.5.2/lib/R/share
R_SYSTEM_ABI            linux,gcc,gxx,gfortran,?
R_TEXI2DVICMD           texi2dvi
R_UNZIPCMD              /usr/bin/unzip
R_ZIPCMD                /usr/bin/zip
SED                     /bin/sed
SESSION                 xubuntu
SESSION_MANAGER         local/psenin-ts:@/tmp/.ICE-unix/8049,unix/psenin-ts:/tmp/.ICE-unix/8049
SESSIONTYPE             
SHELL                   /bin/bash
SHLVL                   1
SSH_AUTH_SOCK           /run/user/1000/keyring/ssh
TAR                     /bin/tar
TERM                    xterm
TERMINATOR_UUID         urn:uuid:83cc2f0f-1cac-4320-865a-1ee406128b10
UPSTART_EVENTS          started xsession
UPSTART_INSTANCE        
UPSTART_JOB             startxfce4
UPSTART_SESSION         unix:abstract=/com/ubuntu/upstart-session/1000/7731
USER                    psenin
WINDOWID                6291460
XAUTHORITY              /home/psenin/.Xauthority
XDG_CONFIG_DIRS         /etc/xdg/xdg-xubuntu:/usr/share/upstart/xdg:/etc/xdg:/etc/xdg
XDG_CURRENT_DESKTOP     XFCE
XDG_DATA_DIRS           /usr/share/xubuntu:/usr/share/xfce4:/usr/local/share:/usr/share:/var/lib/snapd/desktop:/usr/share
XDG_GREETER_DATA_DIR    /var/lib/lightdm-data/psenin
XDG_MENU_PREFIX         xfce-
XDG_RUNTIME_DIR         /run/user/1000
XDG_SEAT                seat0
XDG_SEAT_PATH           /org/freedesktop/DisplayManager/Seat0
XDG_SESSION_DESKTOP     xubuntu
XDG_SESSION_ID          c2
XDG_SESSION_PATH        /org/freedesktop/DisplayManager/Session0
XDG_SESSION_TYPE        x11
XDG_VTNR                7
XMODIFIERS

franzbischoff commented 5 years ago

I'm testing this and my run is at 6%.

Can you try calling the garbage collector inside the STOMP code?

franzbischoff commented 5 years ago

I found a quick fix, and I'm working on tuning this for the next release.

Open stomp-par.R and, at around line 116, replace:

per_work <- max(10, min(250, ceiling(num_queries / 100)))

with:

min_per_work <- 200
max_per_work <- 10000
plateaux_n_works <- 400
per_work <- max(min_per_work, min(max_per_work, ceiling(num_queries / plateaux_n_works)))

Then, at around line 144, add a call to gc() right after:

pb$tick(per_work)

so that it reads:

pb$tick(per_work)
gc()
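
To see what the chunk-size change means for the input in this issue, here is a small base-R sketch that evaluates both formulas. It assumes that num_queries is the number of subsequences, length(data) - window_size + 1; the helper names per_work_old and per_work_new are made up for illustration and are not part of tsmp.

```r
# Hypothetical helpers wrapping the two formulas quoted above.
per_work_old <- function(num_queries) {
  max(10, min(250, ceiling(num_queries / 100)))
}

per_work_new <- function(num_queries,
                         min_per_work = 200,
                         max_per_work = 10000,
                         plateaux_n_works = 400) {
  max(min_per_work, min(max_per_work, ceiling(num_queries / plateaux_n_works)))
}

# Assumption: one query per subsequence of the 1M-point series.
num_queries <- 1000000 - 1000 + 1

old_size <- per_work_old(num_queries)
new_size <- per_work_new(num_queries)

# Number of work chunks needed to cover all queries in each case.
old_chunks <- ceiling(num_queries / old_size)
new_chunks <- ceiling(num_queries / new_size)

cat("old:", old_size, "queries per chunk,", old_chunks, "chunks\n")
cat("new:", new_size, "queries per chunk,", new_chunks, "chunks\n")
```

Under that assumption, the chunk size grows from 250 to 2498 queries and the number of chunks drops from 3997 to 400.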

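[image] https://user-images.githubusercontent.com/984592/53842373-2e3ef400-3f97-11e9-8111-7d7e557fa49e.png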

seninp commented 5 years ago

Thank you for the quick response! I will try it tomorrow, once I'm back at the office, and will post an update. Thanks!

franzbischoff commented 5 years ago

On the first trial (original code), it blew up my 32GB of memory at 11%. After the changes, I'm at 53%, with memory usage oscillating between 5 and 7GB.
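
For anyone who wants to watch memory from inside R while experimenting with the gc() call, a rough base-R sketch follows. It is not the tsmp code: process_chunk is a hypothetical stand-in for the per-chunk work, used_mb is an illustrative helper, and the sizes are arbitrary.

```r
# Hypothetical stand-in for one chunk of work: allocates a temporary
# matrix and returns only a small summary of it.
process_chunk <- function(n_rows, n_cols = 1000) {
  tmp <- matrix(runif(n_rows * n_cols), nrow = n_rows)
  colMeans(tmp)
}

# Memory currently used by R objects, in Mb (Ncells + Vcells rows of gc()).
used_mb <- function() sum(gc()[, 2])

invisible(gc(reset = TRUE))  # reset the "max used" statistics
before <- used_mb()

results <- vector("list", 5)
for (i in seq_along(results)) {
  results[[i]] <- process_chunk(n_rows = 200)
  gc()  # explicit collection after each chunk, as in the suggested fix
}

after <- used_mb()
cat(sprintf("used before: %.1f Mb, after: %.1f Mb\n", before, after))
```

Note that gc() only reports memory managed by the R process it is called in. With several workers running as separate R processes, each has its own heap, so a system monitor is still needed to see the total.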

franzbischoff commented 5 years ago

> res <- tsmp(CLEAN_House5$Appliance1[1:1000000], window_size = 1000, exclusion_zone = 1 / 4, verbose = 3, n_workers = 4)
Warming up parallel with 4 cores.
STOMP [================================] 100% at 22 it/s, elapsed: 13h, eta:  0s
Finished in 12.84 hours
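
As a sanity check on the progress-bar figures, the reported rates can be turned into run-time estimates. This sketch assumes that one "it" in the progress bar corresponds to one query and that there are length(data) - window_size + 1 queries; neither is confirmed in this thread, and the displayed rates are rounded.

```r
# Assumption: one progress-bar iteration per subsequence (query).
num_queries <- 1000000 - 1000 + 1

# Rates as printed by the two runs in this thread (it/s).
rate_2_workers <- 9.6  # original run, killed at 11%
rate_4_workers <- 22   # patched run, completed

# Estimated total run time in hours at a constant rate.
est_hours <- function(n, rate) n / rate / 3600

hours_2 <- est_hours(num_queries, rate_2_workers)
hours_4 <- est_hours(num_queries, rate_4_workers)

cat(sprintf("2 workers: about %.1f hours\n", hours_2))
cat(sprintf("4 workers: about %.1f hours\n", hours_4))
```

Under these assumptions the estimates are about 28.9 hours for the original run, in line with the "eta: 1d" shown after 3 hours, and about 12.6 hours for the completed run, close to the reported 12.84 hours.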

franzbischoff commented 5 years ago

Here is the result, to save you time :-) data.zip