uec / Issue.Tracker

Automatically exported from code.google.com/p/usc-epigenome-center

hpcc job scheduler problem #778

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Actually, the earlier issue may be related to a problem at HPCC. Many analyses
were not completed but are not running either; again, possibly a result of the
switch?

Original issue reported on code.google.com by cmnico...@gmail.com on 21 Jul 2014 at 4:46

GoogleCodeExporter commented 8 years ago
I see flowcell C4Y0JACXX has been submitted, and it looks OK to me.

It is currently occupying all 32 eight-core Intel nodes, so some of the
ready-to-run jobs are queued waiting for resources. Given the type of analysis,
this should move fairly fast. The flowcell was submitted at around 9:15 am, so I
would expect things to start wrapping up in another 4 hours or so, depending on
the library.

Original comment by zack...@gmail.com on 21 Jul 2014 at 6:35

GoogleCodeExporter commented 8 years ago
Sorry, I should have been more specific. It was the previous flowcell that
had problems: C4UDPACXX. Some analyses finished, some did not, but there
are no jobs for this flowcell in the queue, so they presumably need to be
resubmitted. Also, emails were sent out saying the analyses were complete when
they were not. But again, this probably relates to the transfer, so maybe
there is nothing to worry about at the moment.

Original comment by cmnico...@gmail.com on 21 Jul 2014 at 6:46

GoogleCodeExporter commented 8 years ago
The emails were sent because the C4UDPACXX_qcmetrics.csv files were detected in
the sample directories.
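A minimal sketch of the kind of check described above, assuming (hypothetically) that each sample has its own subdirectory under a results root and that the notifier only looks for `<flowcell>_qcmetrics.csv` in each one. The layout and the function name `samples_with_qcmetrics` are illustrative, not the pipeline's actual code. It shows why such a check can report incomplete analyses: it tests only for the file's presence, not whether the jobs finished.

```python
from pathlib import Path


def samples_with_qcmetrics(results_root: str, flowcell: str) -> list[str]:
    """Return names of sample directories that contain <flowcell>_qcmetrics.csv.

    Hypothetical layout: results_root/<sample>/<flowcell>_qcmetrics.csv.
    Only presence is checked, so a file left by a failed or partial run
    would still count as "complete" here.
    """
    target = f"{flowcell}_qcmetrics.csv"
    root = Path(results_root)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and (d / target).is_file()
    )
```

A stricter check would also confirm that no jobs for the flowcell are still queued or failed before sending a notification.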

Original comment by natalia....@gmail.com on 21 Jul 2014 at 6:55

GoogleCodeExporter commented 8 years ago
When I went through the logs, I saw messages like the following:
/var/spool/torque/mom_priv/jobs/9023541.hpc-pbs.hpcc.usc.edu.SC: line 19: /home/uec-00/shared/production/software/perl_utils_usc/wrap_tophat2.pl: No such file or directory
/var/spool/torque/mom_priv/jobs/9023545.hpc-pbs.hpcc.usc.edu.SC: line 18: /home/uec-00/shared/production/software/perl_utils_usc/wrap_picard.pl: No such file or directory

So it looks like the filesystem disappeared. Since the jobs submitted today
seem to be working fine, hopefully this means it was fixed and is working again.

But we cannot trust anything from that flowcell; it will need to be
resubmitted. We can delete the results directory and any files from the
previous attempt. Let me know if you want me to clean it out.
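As a rough sketch, assuming the job output has been gathered into one text stream with each message on a single line, the job IDs and missing script paths in messages like those above could be extracted with a regular expression to list which jobs from the flowcell were affected. The pattern is based only on the two log lines quoted here, and the name `find_missing_file_errors` is hypothetical.

```python
import re

# Matches shell errors of the form seen in the Torque job logs:
# /var/spool/torque/mom_priv/jobs/<job_id>.SC: line <n>: <path>: No such file or directory
MISSING_FILE_RE = re.compile(
    r"/var/spool/torque/mom_priv/jobs/(?P<job_id>\S+)\.SC: line (?P<line>\d+): "
    r"(?P<path>\S+): No such file or directory"
)


def find_missing_file_errors(log_text: str) -> list[tuple[str, int, str]]:
    """Return (job_id, script_line, missing_path) for each matching error line."""
    return [
        (m.group("job_id"), int(m.group("line")), m.group("path"))
        for m in MISSING_FILE_RE.finditer(log_text)
    ]
```

Collecting the distinct missing paths (here, scripts under `perl_utils_usc`) points to a shared-filesystem problem rather than an error in any single job.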

Original comment by zack...@gmail.com on 21 Jul 2014 at 7:03

GoogleCodeExporter commented 8 years ago
I can clean and resubmit. Thanks!

Original comment by cmnico...@gmail.com on 21 Jul 2014 at 7:38

GoogleCodeExporter commented 8 years ago

Original comment by zack...@gmail.com on 22 Jul 2014 at 10:37