Closed sheridancbio closed 4 years ago
Currently, there is a process running MafAnnotator.py with -p /data/portal-cron/cbio-portal-data/dmp/msk_solid_heme/oncokb/data_mutations_extended.oncokb.previous.txt ... so it seems to be reading the previous OncoKB-annotated file. Perhaps I should not delete that file, but I wonder whether this process will complete. It has been running for 2 hours now.
pipelines:/data2/portal-cron/cbio-portal-data/dmp/dmp-2020/msk_solid_heme/oncokb> ls -l
-rw-r--r-- 1 cbioportal_importer schultz 155571853206 Mar 6 12:30 data_mutations_extended.oncokb.previous.txt
-rw-r--r-- 1 cbioportal_importer schultz 107283827000 Mar 6 14:42 data_mutations_extended.oncokb.txt
-rw-r--r-- 1 cbioportal_importer schultz 154998663548 Mar 5 16:24 data_mutations_extended_somatic.oncokb.txt
When complete, it looks like this process (with 3 files) will have a 145GB * 3 = 435GB footprint on the disk. This is probably too much for our production area, from which we load data for the nightly imports of impact.
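The footprint estimate above can be checked with a short script. This is a minimal sketch, not part of the pipeline: the byte counts are copied from the ls -l listing, `directory_footprint_bytes` is a hypothetical helper name, and the projection assumes the file still being written ends up about as large as the largest completed one.

```python
import os

GIB = 1024 ** 3  # bytes per GiB


def directory_footprint_bytes(path):
    """Sum the sizes of regular files directly inside `path` (non-recursive)."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


# Sizes in bytes, copied from the ls -l listing above.
listed_sizes = {
    "data_mutations_extended.oncokb.previous.txt": 155571853206,
    "data_mutations_extended.oncokb.txt": 107283827000,
    "data_mutations_extended_somatic.oncokb.txt": 154998663548,
}

# Space used at the time of the listing.
current_gib = sum(listed_sizes.values()) / GIB

# Assumption: the in-progress file grows to roughly the size of the largest file.
projected_gib = 3 * max(listed_sizes.values()) / GIB

print(f"current: {current_gib:.1f} GiB, projected: {projected_gib:.1f} GiB")
```

With the listed sizes this gives roughly 389 GiB in use at the time of the listing and roughly 435 GiB projected, which matches the 145GB * 3 figure if "GB" is read as GiB.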
Consider not using precache since it takes a lot of additional space and we need to do a fresh annotation when OncoKB gets updated.
Precache is no longer in use. Disk footprint is under 1 GB.
Before re-enabling precache we should check whether the same failures from the past are still present.
Closing this card as complete.
The data disk on pipelines was nearly full today. One factor clogging the disk is 360GB of data sitting in the oncokb-annotator output directory /data2/portal-cron/cbio-portal-data/dmp/dmp-2020/msk_solid_heme/oncokb
On top of that, the rsync processes attempting to copy this data onto ramen, to make it available for download by project managers, stalled yesterday (2020_03_05) and also the day before (2020_03_04) ... both stalled processes are still active.
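One possible way to keep a stalled transfer from lingering for days is to give rsync an I/O timeout and also enforce a wall-clock deadline from the calling script. The sketch below is only an illustration: `build_rsync_command` and `run_with_deadline` are hypothetical helper names, the timeout values are placeholders, and the actual source and destination used by the cron job are not known here.

```python
import subprocess


def build_rsync_command(source, destination, io_timeout_seconds=600):
    """Build an rsync command line with an I/O timeout.

    -a is archive mode; --timeout makes rsync exit if no data is
    transferred for the given number of seconds.
    """
    return ["rsync", "-a", f"--timeout={io_timeout_seconds}", source, destination]


def run_with_deadline(command, deadline_seconds):
    """Run `command`; return its exit code, or None if the deadline passed.

    subprocess.run kills the child process when the timeout expires,
    so a stalled transfer does not stay active indefinitely.
    """
    try:
        return subprocess.run(command, timeout=deadline_seconds).returncode
    except subprocess.TimeoutExpired:
        return None
```

A caller could treat a `None` result as a stalled transfer and report it instead of leaving the process running.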
Something seems seriously wrong here, and it will jeopardize normal processing if it fills the disk.
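A guard along the following lines could refuse to start a large annotation run when the disk lacks room for it. This is a sketch under assumptions: `has_headroom` is a hypothetical helper, and the 10% reserve is an arbitrary placeholder rather than a value taken from the pipeline.

```python
import shutil


def has_headroom(path, required_bytes, reserve_fraction=0.10):
    """Return True if writing `required_bytes` under `path` would still
    leave at least `reserve_fraction` of the filesystem free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    reserve = int(usage.total * reserve_fraction)
    return usage.free - required_bytes >= reserve
```

For example, the annotation step could be skipped, with a warning, when `has_headroom` returns False for the expected output size on the output directory.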