hydroshare / hydroshare-jupyterhub

The HydroShare Jupyterhub Notebook Server is an environment designed to provide added value to existing HydroShare resources via interactive computational notebooks.

Received error BadZipFile('File is not a zip file',) on both NCSA and USU-Beta #37

Closed ChristinaB closed 7 years ago

ChristinaB commented 7 years ago

This error occurs on the Welcome Page. For example, go to this resource Watershed Dynamics Model: Climate

Click on Open With (select either server)

  1. Establish a Secure Connection operates as expected
  2. Query HydroShare Resource Content gives the error

Downloading ..Received error BadZipFile('File is not a zip file',) when unzipping BagIt archive to /home/jovyan/work/notebooks/data.

AND it says
Download Completed Successfully

Which is not the case.
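
The mismatch described above (an unzip error followed by a success message) could be avoided by checking the archive before reporting status. A minimal sketch, assuming the downloaded bag is saved to a local path; `unpack_bag` is a hypothetical helper name and is not part of the HydroShare notebook utilities:

```python
import zipfile


def unpack_bag(archive_path: str, dest_dir: str) -> bool:
    """Unzip a downloaded archive; return True only if extraction succeeded.

    Hypothetical helper for illustration, not the actual Welcome page code.
    """
    # is_zipfile returns False for missing files and for non-zip content,
    # such as an error page saved in place of the archive.
    if not zipfile.is_zipfile(archive_path):
        print(f"Download failed: {archive_path} is not a zip file")
        return False
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
    # The success message is printed only after extraction has finished.
    print("Download Completed Successfully")
    return True
```

With this shape, the success message and the error message cannot both appear for the same download.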

Help! Development on our workflows is halted until this bug is fixed. Does this have to do with #1868 (https://github.com/hydroshare/hydroshare/pull/1868)?

ChristinaB commented 7 years ago

We think that this is because the ipynb is 1.5 MB. Also, Download All Content as Zipped BagIt Archive is broken on the HydroShare resource if it is large. What counts as large? Is this the reason?

ChristinaB commented 7 years ago

@pkdash @aphelionz This was all working on Friday Feb 17. So something that was done since then may have broken our current use case. Thanks!

ChristinaB commented 7 years ago

Also, the 4b7f19774f5442e8b17c38d2f72df49b files (in this example) have disappeared from the servers, apparently removed by some step of the operation executed from content = hs.getResourceFromHydroShare(os.environ['HS_RES_ID'])

ChristinaB commented 7 years ago

Hey @aphelionz @Castronova @pkdash This particular issue is not occurring when we open and download data from an example Notebook that Tony has linked to the Welcome Page.
We can also 'Open With' this resource Landslide Example from the resource page.

Why can't we download and run data from this Resource? Watershed Dynamics Model: Climate

pkdash commented 7 years ago

@hyi Any idea what might be causing resource bag creation to never finish for this resource, Watershed Dynamics Model: Climate?

hyi commented 7 years ago

@ChristinaB @pkdash @mjstealey @rayi113 I have done some investigation on this and here is what I found out:

ChristinaB commented 7 years ago

Hong - thanks so much for looking into this.
Today I made a new resource, uploaded all the same files, and then ran the Welcome page code.
Here is what happened the three times I repeated the same code block: content = hs.getResourceFromHydroShare(os.environ['HS_RES_ID'])

  1. Execute content = hs.getResourceFromHydroShare(os.environ['HS_RES_ID']) Downloading ..Received error BadZipFile('File is not a zip file',) when unzipping BagIt archive to /home/jovyan/work/notebooks/data.
  2. Repeat Execute of content = hs.getResourceFromHydroShare(os.environ['HS_RES_ID']) I got the same error.
  3. Repeat content = hs.getResourceFromHydroShare(os.environ['HS_RES_ID']) Downloaded correctly!

I was able to open the Notebook and execute the code.

On the resource
Watershed Dynamics Model: Climate I did try repeating this code execution until it worked, because I have had this issue before, but it is not an ideal solution (especially for people who are logical enough to believe the error message)

This is the new resource where the launch worked (eventually) Watershed Dynamics Model: Climate Data Pypeline

hyi commented 7 years ago

@ChristinaB @pkdash This looks like an async bag download issue. On the HydroShare website, when you download a bag, a notification at the top of the page asks you to wait until the bag is created, which takes some time for a resource with large files. I thought similar task-status polling logic was also implemented at the REST API level, but your code for calling the REST API to download the resource bag may not incorporate that polling. @pkdash Can you check whether @ChristinaB's code for calling the REST API to download a bag asynchronously is correct?
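
The polling idea described above can be sketched as follows. This is an illustration under stated assumptions, not the hs_restclient implementation: `fetch` is a hypothetical callable that performs one download request and returns the response body, and a response is treated as ready once it parses as a zip archive.

```python
import io
import time
import zipfile
from typing import Callable


def fetch_bag_when_ready(
    fetch: Callable[[], bytes],
    max_attempts: int = 10,
    delay_seconds: float = 5.0,
) -> bytes:
    """Call fetch() until the payload is a valid zip archive.

    `fetch` is a hypothetical stand-in for a single bag download request.
    """
    for attempt in range(1, max_attempts + 1):
        payload = fetch()
        # is_zipfile accepts a file-like object, so no temp file is needed.
        if zipfile.is_zipfile(io.BytesIO(payload)):
            return payload
        if attempt < max_attempts:
            print(f"Attempt {attempt}: bag not ready, retrying in {delay_seconds}s")
            time.sleep(delay_seconds)
    raise TimeoutError(f"Bag was not ready after {max_attempts} attempts")
```

This automates the manual workaround of re-running the cell until the download succeeds, and it fails with an explicit timeout instead of a BadZipFile message when the bag never becomes available.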

pkdash commented 7 years ago

@ChristinaB What version of hs_restclient are you using?

ChristinaB commented 7 years ago

I don't know. How can I find out? When you run the sample resource, do you get the same error?? Thanks!!

pkdash commented 7 years ago

To find the version, you can do this: >>> hs_restclient.__version__

I think @Castronova has updated to use the latest version of hs_restclient recently. Your earlier code may work now.
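
For a notebook cell that should not fail when the package is missing, the version check can also be done through package metadata. A small sketch, assuming Python 3.8 or newer (for `importlib.metadata`); `installed_version` is a hypothetical helper name:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when hs_restclient is not installed.
    print(installed_version("hs_restclient"))
```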

aphelionz commented 7 years ago

@ChristinaB @pkdash working on verifying that now. I need a login though... @Castronova

ChristinaB commented 7 years ago

Update: This issue is still occurring as of March 8 testing. Is it helpful to provide computer and network information from the systems we are using? I don't remember this being an issue before. @jphuong do you know when we started noticing this? Three weeks ago? Thanks!! C

ChristinaB commented 7 years ago

Update: This is still an issue. I have tested it with big resources and small resources.

Most recently I tried it with a Model Instance resource type instead of Generic. The current error I get from the NCSA server is still: Downloading ..Received error BadZipFile('File is not a zip file',) when unzipping BagIt archive to /home/jovyan/work/notebooks/data.

Test it out on this public resource: https://www.hydroshare.org/resource/b363b853561147ffa0d7c7372dd18191/

I can get the data to download if I execute multiple times.

Do we want to demo this next week and have new users work with it and have this be the first thing they see??

Imagine this dialogue: "Welcome to HydroShare JupyterHub. Go to the Resource, Open With the NCSA server. Click through the Welcome page. If you get the error - Downloading ..Received error BadZipFile('File is not a zip file',) when unzipping BagIt archive to /home/jovyan/work/notebooks/data. - just ignore it and keep repeating the execution of that cell. We don't know what it means. We don't know why it happens. We just keep executing the cell until it works."

I need help fixing this, changing the message for this error, or finding some other workaround, because I am not comfortable presenting the Welcome Page steps while they frequently give this error.

March 22-23 is the workshop. We are trying to test the demos and getting more and more frustrated and uncertain about this issue.

ChristinaB commented 7 years ago

When I test the same Welcome Page download from USU-beta server, I get this error message. If I retry the cell multiple times, I also can eventually get it to work.
image

ChristinaB commented 7 years ago

Thanks for your help! Let me know if there is some other test or angle I can try in order to avoid this issue for new users.

jphuong commented 7 years ago

Mark - I was wondering if you have any idea about this issue? We anecdotally think that all the recent updates (since last month) are the cause of this problem. I am looking for a solution to demo HydroShare JupyterHub next week, but we are currently stalled on development of the demos because we can not use the function that starts every work session:

# get resource content. Returns a dictionary of filenames and their paths

content = hs.getResourceFromHydroShare(os.environ['HS_RES_ID'])

Thanks for any help, insight, or work around solutions you might suggest.
Christina

pkdash commented 7 years ago

@jphuong

content = hs.getResourceFromHydroShare(os.environ['HS_RES_ID'])

getResourceFromHydroShare() is not a function/method in hs_restclient. The function that retrieves a resource is getResource().

jphuong commented 7 years ago

@pkdash Please inspect the hydroshare.py file within /hydroshare-jupyterhub/notebooks/utilities/ folder and the Welcome.ipynb file in the /notebooks folder.

The getResourceFromHydroShare() function is defined within the Hydroshare() class, imported from hydroshare.py file in the utilities folder. When I tried import hs_restclient on the NCSA and USU-beta servers, they responded that getResource() is not a recognized name:

image

@pkdash @aphelionz @Castronova Where does the most up-to-date list of operational hs_restclient functions/methods exist? The functions and documentation described at 'http://hs-restclient.readthedocs.io/en/latest/' are way outdated. More importantly, how can that be imported into a Jupyterhub ipython notebook session?
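
Independently of the published documentation, the methods available on whatever client object is installed can be listed from inside a notebook session by introspection. A minimal sketch; `FakeClient` is a stand-in defined here for illustration only and does not reproduce the real hs_restclient API:

```python
import inspect


def public_methods(obj) -> list:
    """Return the sorted names of callable attributes that do not start with '_'."""
    return sorted(
        name
        for name, _member in inspect.getmembers(obj, callable)
        if not name.startswith("_")
    )


class FakeClient:
    """Stand-in class for illustration only; not the real client."""

    def getResource(self, pid):
        return pid

    def _internal(self):
        return None
```

Calling `public_methods(hs)` on the object created in the Welcome notebook would show whether a name such as getResource is a method of that object rather than a module-level function.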

pkdash commented 7 years ago

@jphuong It seems the getResourceFromHydroShare() is internally calling the getResource() function of the hs_restclient. What version of hs_restclient are you using?

The documentation at 'http://hs-restclient.readthedocs.io/en/latest/' should be mostly up-to-date. It is probably missing documentation for a couple of functions. We will fix that in the next release.

Castronova commented 7 years ago

@jphuong This should be addressed in the latest code running on jupyter.usu.edu. It has not been released on NCSA yet.

Can you please check to see if the problem has been fixed on jupyter.usu.edu?

ChristinaB commented 7 years ago

@Castronova
I tried the USU server to open this resource: https://www.hydroshare.org/resource/0f4efd1cedb64a5a9fa90cf1f248e22f/ and got this error.

image

ChristinaB commented 7 years ago

That resource opened with NCSA gives the same (currently expected) BadZipFile error. To run it, I have uploaded Jim's workaround, relabeled Welcome_Landlab.ipynb, so that we can launch it this week. Is that the plan I should continue with? Or do we think Welcome may be operational by tomorrow? Thanks!

ChristinaB commented 7 years ago

This resource Welcome page works for both USU and NCSA. https://www.hydroshare.org/resource/884205b54f5c4d9e9c073150db2d649e/ I don't know what is wrong with the individual resources that cause this variability. Or does it vary in time based on server load?

Castronova commented 7 years ago

I'm getting this error intermittently, i.e. doesn't work with

content = hs.getResourceFromHydroShare('4b7f19774f5442e8b17c38d2f72df49b')

but works with

content = hs.getResourceFromHydroShare('884205b54f5c4d9e9c073150db2d649e')

pkdash commented 7 years ago

@Castronova can you try the following again (this is the one that didn't work):

content = hs.getResourceFromHydroShare('4b7f19774f5442e8b17c38d2f72df49b')

pkdash commented 7 years ago

@Castronova Also are you using version 1.2.6 of hs_restclient?

ChristinaB commented 7 years ago

We are now getting a new error related to this issue:

Code from Welcome Page content = hs.getResourceFromHydroShare(os.environ['HS_RES_ID'])

Gives this error: image

ChristinaB commented 7 years ago

It seems like some kind of corruption occurs with resources, related to either size, multiple openings and closings, or some actual file corruption.

When I make a new generic resource and upload a Notebook, I can execute the

content = hs.getResourceFromHydroShare(os.environ['HS_RES_ID'])

multiple times and get the Downloading ..Received error BadZipFile('File is not a zip file',) when unzipping BagIt archive to /home/jovyan/work/notebooks/data.

But eventually it downloads. (after 10-20 attempts, yes, I feel like I'm going crazy).

pkdash commented 7 years ago

I am still waiting to know what version of hs_restclient is being used.

Castronova commented 7 years ago

@ChristinaB @pkdash The JupyterHub implementation that's hosted at NCSA is using an out-of-date hs_restclient which is probably causing the issue above. We are in the process of deploying a fix, but are currently having trouble with the beta testing server. I will roll out the fix as soon as possible.

ChristinaB commented 7 years ago

@ddcamiu - do you mind updating the out-of-date hs_restclient? This issue is six weeks old, and we have not been able to consistently use the Welcome page for two months. I would like to close this issue as soon as possible. It will be resolved when users can Open With/NCSA server/execute the Welcome Page, and download the resource files from HydroShare to the JupyterHub server with no errors.

If we CAN NOT resolve this in the next few days I need to cancel the use of HydroShare in CSDMS workshops in May. Please let me know if I should do this so that I can show professional courtesy to our new HydroShare users. I will check with users on how many days they have left to work with us/wait for this fix.

ChristinaB commented 7 years ago

@ddcamiu The latest REST client on PyPI is from January. @aphelionz has updated the REST API services, but is working on the REST client and promised it by the end of this week. If the three of us could coordinate on the updating next week, we may be able to close this issue? Thanks!

Castronova commented 7 years ago

@ChristinaB I have the hs_restclient updated in the development version of the code, but our testing server is currently being rebuilt. We will need to first deploy the fix to the development server before we can roll it out to NCSA. Unfortunately this will not include @aphelionz changes yet since this would require further updates to the jupyterhub notebooks and would delay release.

I understand your frustration, but this has not been an easy bug to fix. I am working to resolve this in the next few days.

Castronova commented 7 years ago

In the meantime, it would be nice if someone could test that the master branch of the hs_restclient fixes this issue since that is what will be installed on the jupyterhub server.

ChristinaB commented 7 years ago

Who is the best someone to test hs_restclient?

It sounds like this is a high-risk fix - meaning that we really don't know how many days it will take. That's fine, I understand.

Is it possible to set up the server to launch a Welcome_Public page that works for any public resource using the URL instead of the hs_restclient? Until this is stable? @jphuong and I developed it for the workshop two weeks ago.

ChristinaB commented 7 years ago

Here is a link to my proposed temporary workaround: Welcome Public

Castronova commented 7 years ago

@ChristinaB unless you are installing a newer version of hs_restclient somewhere, this notebook will still use the older version of hs_restclient that was installed on the server.

Does this resolve your error?

Castronova commented 7 years ago

Never mind...it looks like you're using wget instead of hs_restclient

ChristinaB commented 7 years ago

Yes - using wget avoids the hs_restclient issues related to downloading. Can some version of this approach be the backup Welcome.ipynb when hs_restclient throws errors?

Castronova commented 7 years ago

I will try to implement this method as a backup so that if hs_restclient fails we can use wget
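
One way such a backup could look is sketched below, under assumptions: `primary` is a hypothetical callable that wraps the REST client download and returns the path of the saved archive, `fallback_url` is whatever direct download URL the resource exposes (not specified in this thread), and `urllib` from the standard library stands in for wget.

```python
import shutil
import urllib.request
import zipfile
from typing import Callable


def download_url(url: str, dest_path: str) -> str:
    """Stream a URL to a local file, similar in effect to `wget -O dest_path url`."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


def get_bag(primary: Callable[[], str], fallback_url: str, dest_path: str) -> str:
    """Try the primary downloader first; fall back to a plain URL download.

    `primary` is a hypothetical wrapper around the REST client call.
    """
    try:
        path = primary()
        if zipfile.is_zipfile(path):
            return path
        print("Primary download did not return a zip file; trying URL download")
    except Exception as err:  # broad on purpose: any client failure triggers the fallback
        print(f"Primary download failed ({err!r}); trying URL download")
    path = download_url(fallback_url, dest_path)
    if not zipfile.is_zipfile(path):
        raise zipfile.BadZipFile(f"{path} is not a zip file")
    return path
```

The fallback result is validated as well, so a failed backup download surfaces as an error rather than as a silent success.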

ChristinaB commented 7 years ago

Thank you!! I'll prioritize testing it whenever you are ready.