Closed: ofrei closed this issue 3 years ago
Issue resolved; the incident is logged in https://docs.google.com/forms/d/e/1FAIpQLSfyQtSd3intuKkb5O4hmmPq5UzX6EhuCk95ovNfHULc7DIBKg/viewform
@Sandeek It seems we have the same issue on p697-appn-norment01. Could you double-check and fix it as before? If it happens again, is it possible to investigate further? I think Sabry had some insights: either this was related to a lack of space in the /tmp folder, or to something else...
I have the same issue on p697-appn-norment01.
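For anyone who wants to rule the /tmp hypothesis in or out next time, here is a minimal sketch that reports how much of /tmp is still free. The helper name `free_fraction` is made up for illustration, and nothing in this thread confirms that /tmp was actually the cause.

```python
import shutil


def free_fraction(path: str = "/tmp") -> float:
    """Return the fraction of the filesystem holding `path` that is free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat them as having no free space.
        return 0.0
    return usage.free / usage.total


if __name__ == "__main__":
    print(f"/tmp free: {free_fraction('/tmp'):.1%}")
```

Running this on p697-appn-norment01 at the moment the mount hangs would show whether /tmp is close to full at that time.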
@idaElken as a workaround you could use p697-submit or p697-submit2 machines - they work fine for me as of now
@ofrei Thanks - I'll try that!
Hi all,
I will check with Bart to find out the reason.
Best
Sandeep Karthikeyan, Data Engineer, CoE NORMENT, K.G. Jebsen Centre for Psychosis Research; Institute of Clinical Medicine, University of Oslo; Division of Mental Health and Addiction, Oslo University Hospital; www.med.uio.no/norment/english/; Office: Ullevål Hospital, Building 48; Tel: +47 41390032
From: idaElken notifications@github.com Sent: 02 February 2021 15:53 To: norment/tsd_issues Cc: Sandeep Karthikeyan; Mention Subject: Re: [norment/tsd_issues] /cluster/projects/p697 mount hangs on p697-appn-norment01 (#64)
@ofrei Thanks - I'll try that!
Hi all,
The cluster is remounted and available again. Unfortunately, I was not able to figure out the reason for the issue :(
Best
This issue is happening for me again.
This is really strange; I am working with TSD on this.
Best
From: E-Claire notifications@github.com Sent: 08 February 2021 11:19 To: norment/tsd_issues Cc: Sandeep Karthikeyan; Mention Subject: Re: [norment/tsd_issues] /cluster/projects/p697 mount hangs on p697-appn-norment01 (#64)
This issue is happening for me again.
Dear Claire,
Can you specify your p697 username, and since when you have been facing this issue?
My username is p697-elizabethc. I had been working on the appn node and then it suddenly just stopped working. So I tried re-connecting and could access TSD but not the cluster, which is when I replied here.
I just tried logging into appn and accessing the cluster now (from the appn node) and I am able to, so it is super weird that the issue seems really intermittent.
Hi Claire,
There was a three-minute outage on the p697 cluster mount, hence the problem. Right now, it is not hanging.
Best
From: E-Claire notifications@github.com Sent: 08 February 2021 12:28 To: norment/tsd_issues Cc: Sandeep Karthikeyan; Mention Subject: Re: [norment/tsd_issues] /cluster/projects/p697 mount hangs on p697-appn-norment01 (#64)
I just tried logging into appn and accessing the cluster now and I am able to - so super weird that the issue seems really intermittent
Okay, thanks for looking into this, Sandeep!!
In case this happens again, is there a certain amount of time that you would recommend waiting before reporting, to see if the outage will just fix itself?
This issue has been persisting for some time now, and TSD doesn't have much of a clue about why it occurs so frequently. You can let me know if and when it occurs.
Okay, thanks
Unsure whether this is related, but I now cannot get past login on either p697-appn-norment01.tsd.usit.no or p697-submit.tsd.usit.no.
After apparently logging in successfully, it hangs:
Any advice welcome :-)
It also hangs when trying to access p697-appn-norment01.tsd.usit.no from VMware and PuTTY.
But apparently this is a known issue (sorry for posting): https://www.uio.no/english/services/it/research/sensitive-data/log/nfs-hangs-on-submit-hosts.html
/Ida
Hi Ida,
Can you try now?
Best
From: Ida Sønderby notifications@github.com Sent: 09 February 2021 09:52 To: norment/tsd_issues Cc: Sandeep Karthikeyan; Mention Subject: Re: [norment/tsd_issues] /cluster/projects/p697 mount hangs on p697-appn-norment01 (#64)
It also hangs when trying to access p697-appn-norment01.tsd.usit.no from VMware and PuTTY.
But apparently this is a known issue (sorry for posting): https://www.uio.no/english/services/it/research/sensitive-data/log/nfs-hangs-on-submit-hosts.html
/Ida
This is resolved (works for me, also reported in operation log). @Sandeek please add to https://docs.google.com/forms/d/e/1FAIpQLSfyQtSd3intuKkb5O4hmmPq5UzX6EhuCk95ovNfHULc7DIBKg/viewform and close this ticket
I have this same problem again with the p697-appn hanging when I try and access the cluster. However, I am able to access the cluster through p697-submit.
Same for me - p697-appn got slower and slower throughout the morning, until it crashed. Now using p697-submit.
It seems that the /cluster/projects/p697 mount hangs on p697-appn-norment01. I can use p697-submit to access the data on the cluster. I can ssh to p697-appn-norment01, and I can access tsd/p697/data/durable/. However, I cannot access /cluster/projects/p697 from p697-appn-norment01.
https://rt.uio.no/SelfService/Display.html?id=4249055
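Since the hang is intermittent, it may help to test the mount with a timeout before reporting, rather than letting a shell get stuck on it. Below is a minimal sketch, assuming a Linux host with the standard `stat` command available. The helper name `path_responds` and the 5-second default are illustrative choices, not something TSD provides.

```python
import subprocess


def path_responds(path: str, timeout_s: float = 5.0) -> bool:
    """Return True if `stat` on `path` succeeds within `timeout_s` seconds."""
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # wait() raises TimeoutExpired without blocking forever on a stuck child.
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked on a hung mount may ignore the
        # kill until the mount recovers, so we deliberately do not wait for it.
        proc.kill()
        return False


if __name__ == "__main__":
    for p in ("/cluster/projects/p697", "/tmp"):
        print(p, "OK" if path_responds(p) else "not responding or missing")
```

The check runs `stat` in a child process because a call into a hung mount can block indefinitely; a `False` result means either that the path did not answer in time or that it does not exist on that machine.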