matthiasdv commented 5 years ago

Behaviour

As context, we're storing and reading papermill log data on Azure blob storage. Take the following snippet as an example;

import papermill as pm
import scrapbook as sb
import azure.common as ac
import json

papermill_log_output_query_string = "?sig=some-unique-secret-token"

nb_path = "abs://mystorage.blob.core.windows.net/my-actual-notebook.ipynb"

try:
    nb = sb.read_notebook(nb_path + papermill_log_output_query_string)
except ac.AzureMissingResourceHttpError as err:
    logging.info(err)

This raises the following error;

ValueError: Requires an '.ipynb' file extension. Provided path:...

Expected behaviour

Azure allows users to provide GET parameters along with the file URL, to enable authorization. In this case an access token.
The file should pass, and not raise an extention error

Codecov Report

Merging #49 into master will decrease coverage by 0.4%. The diff coverage is 71.42%.

@@            Coverage Diff             @@
##           master      #49      +/-   ##
==========================================
- Coverage   96.14%   95.73%   -0.41%     
==========================================
  Files          10       10              
  Lines         441      446       +5     
==========================================
+ Hits          424      427       +3     
- Misses         17       19       +2

matthiasdv commented 5 years ago

Update: I've changed the ext check to contains instead of endswith() since raising a warning still breaks the flow of the program.

MSeal commented 5 years ago

I think raising a warning instead of an error would be reasonable here as well. I agree that doing contains is better, though we could be a little smarter and do splitext(urlparse(url).path)[-1].endswith('ipynb') for the logic to select the warning. If you wanted to change the code to emit a warning here I'll gladly merge it.

MSeal commented 5 years ago

If you want just this change for now that's fine too. Though a small test to cover the case would be nice either way :)

matthiasdv commented 5 years ago

I've implemented the suggested url ext check. Added a few simple tests too. One thing i'm unsure about here is that Papermill requires the Azure dependency or will throw a missing dependency exception.

In case the Notebook() call checks the path before actually trying to fetch the object, I believe we can safely ignore the dependency. Which I believe is the case.

If I'm wrong about this and we actually need to connect to Aure or AWS before checking the path more work would be needed.

matthiasdv commented 5 years ago

Hmm. Python 2.7 has a module named urlparse, and this is renamed to urllib.parse in python 3.X. This is what's causing the build to fail.

matthiasdv commented 5 years ago

fixed this with a conditional import, but unsure if that's the most idiomatic way

nteract / scrapbook

Error is thrown when using GET parameters on cloud storage paths #49

Behaviour

Expected behaviour

Suggested solution

Codecov Report