knowledgesystems / pipelines-scrum

Repository for tracking uncategorizable issues related to backend pipelines work

oncokb-annotator is no longer clogging the disk #97

Closed sheridancbio closed 4 years ago

sheridancbio commented 4 years ago

The data disk on pipelines was nearly full today. One factor clogging the disk is 360GB of data sitting in the oncokb-annotator output directory /data2/portal-cron/cbio-portal-data/dmp/dmp-2020/msk_solid_heme/oncokb

On top of that, the rsync processes that copy this data onto ramen, to make it available for download by project managers, stalled yesterday (2020_03_05) and also the day before (2020_03_04). Both stalled processes are still active.

Something seems seriously wrong here, and it will jeopardize normal processing if it fills the disk.
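
A guard of the following kind could flag this condition before it affects the nightly processing. This is only a minimal sketch: the function names and the 10% threshold are hypothetical and are not part of the existing pipelines scripts. It relies solely on the standard-library `shutil.disk_usage` call.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def disk_is_nearly_full(path, min_free_fraction=0.10):
    """Return True when less than `min_free_fraction` of the disk is free.

    The 10% default is an illustrative assumption, not a value from the pipeline.
    """
    return free_fraction(path) < min_free_fraction
```

A cron wrapper could call `disk_is_nearly_full("/data2")` before starting the annotation step and skip the run, or send an alert, when it returns True.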

sheridancbio commented 4 years ago

Currently, there is a process running MafAnnotator.py which refers to -p /data/portal-cron/cbio-portal-data/dmp/msk_solid_heme/oncokb/data_mutations_extended.oncokb.previous.txt ... so it seems to be reading the previous oncokb-annotated file. Perhaps I should not delete that file, but I wonder whether this process will complete. It has been running for 2 hours now.

sheridancbio commented 4 years ago

pipelines:/data2/portal-cron/cbio-portal-data/dmp/dmp-2020/msk_solid_heme/oncokb> ls -l
-rw-r--r-- 1 cbioportal_importer schultz 155571853206 Mar 6 12:30 data_mutations_extended.oncokb.previous.txt
-rw-r--r-- 1 cbioportal_importer schultz 107283827000 Mar 6 14:42 data_mutations_extended.oncokb.txt
-rw-r--r-- 1 cbioportal_importer schultz 154998663548 Mar 5 16:24 data_mutations_extended_somatic.oncokb.txt

When complete, it looks like this process (with 3 files) will have a 145GB * 3 = 435GB footprint on the disk. This is probably too much for our production area, which we load data from for the nightly imports of impact.
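
The estimate can be checked against the byte counts in the listing. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the 145GB figure is in binary units (GiB), since 155571853206 bytes is about 144.9 GiB. It also assumes the file still being written will end up about as large as the biggest existing file.

```python
import os

GIB = 1024 ** 3  # bytes per GiB (binary gigabyte)


def directory_footprint_bytes(path):
    """Sum the sizes of regular files directly inside `path` (non-recursive)."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


# Byte counts copied from the ls -l listing in this comment.
listed_sizes = [155571853206, 107283827000, 154998663548]

# Footprint at the time of the listing (one file is still growing).
current_gib = to_gib(sum(listed_sizes))

# Projected footprint, assuming all three files reach the largest size seen.
projected_gib = to_gib(3 * max(listed_sizes))
```

Here `projected_gib` comes out just under 435, in line with the estimate above. `directory_footprint_bytes` could be pointed at the oncokb output directory to take the same measurement without reading `ls` output by hand.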

n1zea144 commented 4 years ago

Consider not using precache since it takes a lot of additional space and we need to do a fresh annotation when OncoKB gets updated.

sheridancbio commented 4 years ago

Precache is no longer in use. Disk footprint is under 1 GB.

Before re-enabling precache we should check whether the same failures from the past are still present.

Closing this card as complete.