uwcms / FinalStateAnalysis

An analysis framework for the Compact Muon Solenoid

I/O Errors on Patuples #49

Closed jjswan33 closed 11 years ago

jjswan33 commented 12 years ago

This is a cross posting. I don't think the issue is with the patuples themselves, as they were fine when I last ran on them.

While these errors are not too frequent, they cause problems when running on data, as I need all of the jobs to complete.

All files are on Login05.

Error Type 1:

121019 01:19:31 001 Xrd: CheckErrorStatus: Server [s15n08.hep.wisc.edu:31094] declared: Unable to read /store/user/tapas/2012-09-18-8TeV-53X-PatTuple\ /data_TauPlusX_Run2012B_13Jul2012_v1/patTuple_cfg-025E2B54-5FD3-E111-ACA4-00266CF33100.root; Operation not permitted(error code: 3005)
19-Oct-2012 01:19:31 CDT Closed file root://cmsxrootd.hep.wisc.edu//store/user/tapas/2012-09-18-8TeV-53X-PatTuple/data_TauPlusX_Run2012B_13Jul2012_v\ 1/patTuple_cfg-025E2B54-5FD3-E111-ACA4-00266CF33100.root
----- Begin Fatal Exception 19-Oct-2012 01:19:32 CDT-----------------------
An exception of category 'FileReadError' occurred while
[0] Constructing the EventProcessor
[1] Constructing input source of type PoolSource
[2] Reading branch EventAuxiliary
Exception Message:
Unable to cache 134217728 byte file segment at 1207959552: got only 0 bytes back
----- End Fatal Exception -------------------------------------------------

/scratch/swanson/2012B-SUB/SUB-patTuple_cfg-025E2B54-5FD3-E111-ACA4-00266CF33100

Error Type 2:

121019 01:37:36 16838 Xrd: ClientSock::RecvRaw: Disconnection detected reading 8 bytes from socket 0 (server[xrootd.unl.edu:1094]). Revents=32
121019 01:37:36 16838 Xrd: XrdClientMessage::ReadRaw: Failed to read header (8 bytes).
----- Begin Fatal Exception 19-Oct-2012 01:37:36 CDT-----------------------
An exception of category 'FallbackFileOpenError' occurred while
[0] Constructing the EventProcessor
[1] Constructing input source of type PoolSource
[2] Calling RootInputFileSequence::initFile()
Exception Message:
Input file root://cmsxrootd.hep.wisc.edu//store/user/tapas/2012-09-18-8TeV-53X-PatTuple/data_TauPlusX_Run2012B_13Jul2012_v1/patTuple_cfg-6627E7B2-2ED\ 0-E111-A83F-0030487D5E67.root was not found, could not be opened, or is corrupted.
Fallback Input file root://xrootd.unl.edu//store/user/tapas/2012-09-18-8TeV-53X-PatTuple/data_TauPlusX_Run2012B_13Jul2012_v1/patTuple_cfg-6627E7B2-2E\ D0-E111-A83F-0030487D5E67.root?source=glow also was not found, could not be opened, or is corrupted.
----- End Fatal Exception -------------------------------------------------

/scratch/swanson/2012B-SUB/SUB-patTuple_cfg-6627E7B2-2ED0-E111-A83F-0030487D5E67
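
When many jobs fail, it can help to tally the failures by exception category before deciding what to resubmit. The following is a minimal sketch, not part of FinalStateAnalysis: it assumes the stderr text of each job contains the "Begin Fatal Exception" / "End Fatal Exception" block in the form shown above, and the function names `parse_fatal_exceptions` and `count_categories` are made up for illustration.

```python
import re
from collections import Counter

# Matches one fatal-exception block as it appears in the logs above.
# Assumption: the block always contains "An exception of category '...'"
# and an "Exception Message:" section before the closing banner.
FATAL_RE = re.compile(
    r"-+ Begin Fatal Exception.*?"
    r"An exception of category '(?P<category>[^']+)' occurred"
    r".*?Exception Message:\s*(?P<message>.*?)\s*-+ End Fatal Exception",
    re.DOTALL,
)


def parse_fatal_exceptions(log_text):
    """Return a list of (category, message) pairs found in one job's stderr."""
    results = []
    for match in FATAL_RE.finditer(log_text):
        # Collapse internal whitespace so wrapped messages become one line.
        message = " ".join(match.group("message").split())
        results.append((match.group("category"), message))
    return results


def count_categories(log_texts):
    """Count how often each exception category occurs across many logs."""
    return Counter(
        category
        for text in log_texts
        for category, _ in parse_fatal_exceptions(text)
    )
```

Feeding it the contents of every .err file in a submit directory gives a quick count of, for example, 'FileReadError' versus 'FallbackFileOpenError' failures.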

ekfriis commented 12 years ago

Hi Josh,

It looks like there is a line break in the file names (from your copy-paste). Is this possible?

Evan

jjswan33 commented 12 years ago

That is just an artifact of copying and pasting from the .err file out of emacs.
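
For anyone who needs to recover the real file names from text pasted this way, here is a small sketch. It assumes the only damage is a backslash followed by whitespace (the emacs continuation marker plus the line break or space it became when pasted); `unwrap_pasted_text` is a hypothetical helper name.

```python
import re

# Assumption: a backslash followed by whitespace is an emacs wrap marker,
# not part of the real path. Text that legitimately uses "\ " to escape a
# space would be altered too, so apply this only to pasted log excerpts.
WRAP_RE = re.compile(r"\\\s+")


def unwrap_pasted_text(text):
    """Remove wrap markers so paths split across lines are joined again."""
    return WRAP_RE.sub("", text)
```

Applied to the fragment `Run2012B_13Jul2012_v\ 1/patTuple_cfg`, this gives back `Run2012B_13Jul2012_v1/patTuple_cfg`.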

ekfriis commented 12 years ago

The first one [1] I can at least open interactively in root. The second one [2] seems genuinely corrupt.

Have you tried to rescue the failed jobs w/ DAG? Error type 1 might be fixed on resubmission. Not sure about error type 2. Tapas, can you see if you can find [2] in the PAT tuplization log file?

[1] root://cmsxrootd.hep.wisc.edu//store/user/tapas/2012-09-18-8TeV-53X-PatTuple/data_TauPlusX_Run2012B_13Jul2012_v1/patTuple_cfg-025E2B54-5FD3-E111-ACA4-00266CF33100.root

[2] root://cmsxrootd.hep.wisc.edu//store/user/tapas/2012-09-18-8TeV-53X-PatTuple/data_TauPlusX_Run2012B_13Jul2012_v1/patTuple_cfg-6627E7B2-2ED0-E111-A83F-0030487D5E67.root

tsarangi commented 12 years ago

cat /scratch/tapas/2012-09-18-8TeV-53X-PatTuple/data_TauPlusX_Run2012B_13Jul2012_v1/submit_1/patTuple_cfg-6627E7B2-2ED0-E111-A83F-0030487D5E67/patTuple_cfg-6627E7B2-2ED0-E111-A83F-0030487D5E67.out

This is the log file. This may be an artifact of the skip-existing-output option. This option doesn't do anything if there is an existing broken file in the output. It caused similar errors for Maria earlier.

What I can do is delete this output file and resubmit the failed jobs. First I need to figure out which local repo I used. I changed to a unique repo per request very recently, and for the submissions above, the local repo was being updated regularly.

-Tapas
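
Since an existing but broken output file is left alone on resubmission, a cheap pre-check for such files may be useful before resubmitting. The sketch below is an assumption-laden illustration, not what the submission scripts do: it relies on the ROOT file header layout as documented for TFile (the 4-byte identifier "root", then a big-endian version, fBEGIN and fEND, with fEND stored as 8 bytes when the version is 1000000 or more), it needs the file to be readable as a local path, and `looks_truncated` is a made-up name. Passing this check does not prove a file is good; opening it in ROOT remains the real test.

```python
import os
import struct


def looks_truncated(path):
    """Cheap sanity check on a local ROOT file; True means 'suspicious'.

    Assumption: header layout is b"root", fVersion (4 bytes), fBEGIN
    (4 bytes), then fEND (4 bytes, or 8 bytes for large-file versions),
    all big-endian. fEND should not point beyond the end of the file.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        header = handle.read(20)

    # Too short to hold a header, or missing the "root" identifier.
    if len(header) < 20 or header[:4] != b"root":
        return True

    (version,) = struct.unpack(">i", header[4:8])
    if version >= 1000000:
        # Large-file format: fEND is a 64-bit offset.
        (end,) = struct.unpack(">q", header[12:20])
    else:
        (end,) = struct.unpack(">i", header[12:16])

    # A file shorter than its recorded end was cut off while being written.
    return size < end
```

Files flagged this way are candidates for deletion before resubmitting the corresponding jobs.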

ekfriis commented 12 years ago

For posterity

https://help.hep.wisc.edu/issue8682

ekfriis commented 11 years ago

Solution is to resubmit failed jobs. Closing this.