load JSON $refs - Githubissues

nvillalva commented 9 years ago

Supports the loading of JSON $refs from URIs, absolute paths and relative paths.

This should address issue #4.

econchick commented 9 years ago

Hey @nvillalva ! Thanks so much for the PR!

I see that the Python 3 tests are failing. Try importing urlparse from six.

Also it looks like you may not have the most up-to-date master as there seems to be some clutter in your commit (mainly the utils.py stuff).

nvillalva commented 9 years ago

Hi @econchick!

I got the Python 3 tests fixed up. If you'd rather I import urlparse from six than the try/except that I did, I'm happy to switch.

The utils.py changes were intentional :) The idea was to reuse the "use requests if we can, otherwise use urllib" logic that's used for getting mime types from IANA for downloading referenced JSON schemas from URLs.

econchick commented 9 years ago

ahh I see now (haven't had my morning coffee yet!) - makes sense.

I would appreciate the use of six rather than the try/except, if you wouldn't mind terribly :)

I'll review this today/this weekend - but please bug me if you don't hear from me by Monday!

nvillalva commented 9 years ago

Definitely! I'll have a commit with six up shortly.

And...done! Thanks for being so speedy with the replies :)

econchick commented 9 years ago

Out of curiosity, was there a reason you chose to not use jsonschema package to do the heavy lifting of resolving $refs? Perhaps you wanted to avoid adding another dependency, but I only ask because I intend to incorporate it for #15 .

nvillalva commented 9 years ago

I moved away from it because hooking it into the loader was a) becoming more verbose than just implementing the ref following and b) I seemed to be hitting edge cases pretty frequently.

Also, for our internal use we really wanted the option to use relative or absolute paths and, as far as we could tell, jsonschema only supports dereferencing with URIs.

nvillalva commented 9 years ago

Also, for #15, the returned, dereferenced schema could still be passed through jsonschema's validate as it should still be a valid schema definition.

econchick commented 9 years ago

@nvillalva ah interesting; good to know re jsonschema.

(still reviewing - just taking lunch :) )

nvillalva commented 9 years ago

@econchick Any updates?

econchick commented 9 years ago

hey @nvillalva thanks for the ping; I was out of town all last week so I have some catching up to do but I'll finish this review this today hopefully!

econchick commented 9 years ago

Ok - thanks for your patience, @nvillalva !

Before I can merge this, I'd like the following:

[x] add docstrings for the semi-private functions you've added in loader.py
[x] the coverage on loader.py when from 100% to 90%. If you run py.test --cov ramlfications --cov-report term-missing (you may have to pip install pytest-cov if you get "unrecognized arguments") you can see the specific lines that are in need of test coverage.
[x] squash all commits into one atomic commit

An overview of the comments I made (mostly for me to keep track):

[ ] logging from utils.py should be broken out into a module-level function in which both download_url and update_mime_types can access
[ ] correct parsing of the $ref url
[ ] look into if using urllib to handle both file:/// and http(s):// is helpful
[ ] using isinstance instead of if type(foo)...
[ ] switch out the creation of example_with_absolute_ref.json to using pytest's tmpdir/tmpfile capabilities, and remove from .gitignore

nvillalva commented 9 years ago

Awesome! Thanks for the feedback, @econchick. I'll get on these updates as soon as I have a bit of free time.

econchick commented 9 years ago

No problem! Take your time.

benhamill commented 8 years ago

So. Let's talk about references in JSON Schema.

@jhl2343 and I did a bunch of reading about how those work. Looks like everybody's expectations about how they should work were wrong. Yay!

Some references we used to learn all this:

JSON Pointer https://tools.ietf.org/html/rfc6901
JSON Reference http://tools.ietf.org/html/draft-pbryan-zyp-json-ref-03
URI (scheme section) http://tools.ietf.org/html/rfc3986#section-3.1

So, one key thing from the JSON Reference spec is this:

Any members other than "$ref" in a JSON Reference object SHALL be ignored.

This means that our assumption that there was some ability to "merge" a $ref into some other JSON object was off. That $ref key should basically always appear alone in an object.

Another key thing is that URIs already provide a way to express relative links. In the case of the file scheme, you do it like this: file:some_dir/some_file.json. You only put the /// if you need the root directory. The first two /s signal the start of the "authority" section (which I'd usually just call the domain part). The end of the authority section is signaled by either /, ? or #. So the third / in file:/// is actually the / that means "the top of the file system". And if you don't have 3 /s, you're not ending the authority section. The ? and #, of course signal the start of the query params and the fragment, which necessarily come after the path, which is what we care about for finding files.

The third key thing is that when representing a JSON Reference in a URI, you're meant to always include the #. So we built that assumption in.

So. The tests all reflect that and we (at UA) will have to go rethink a lot of how we were trying to use $ref in our schemas.

I hope that makes sense and I didn't forget anything. My mind was kinda blown when we finally figured all this out.

benhamill commented 8 years ago

Another thing: @econchick: We're not sure why CI is failing about assert_called_once. Can you advise on that? We'll look at the rest.

econchick commented 8 years ago

@benhamill & James Lee (sorry - can't find the link to your profile?) - did you push to this PR? are you working with @nvillalva or just a friendly helper? just getting a bit confused (catching up since I went on vacation for the past 2 weeks - still reviewing everything)

econchick commented 8 years ago

@benhamill I'm a bit confused about what you said re: how $refs should work, particularly:

So, one key thing from the JSON Reference spec is this:

Any members other than "$ref" in a JSON Reference object SHALL be ignored.

This means that our assumption that there was some ability to "merge" a $ref into some other JSON object was off. That $ref key should basically always appear alone in an object.

Can you give me an example of how you originally assumed it worked, and how you interpret it now?

econchick commented 8 years ago

@benhamill so looking at this helpful reference for structuring schemas, are you staying that you thought you could do this:

{
  "$schema": "http://json-schema.org/draft-04/schema#",

  "definitions": {
    "address": {
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city":           { "type": "string" },
        "state":          { "type": "string" }
      },
      "required": ["street_address", "city", "state"]
    }
  },

  "type": "object",

  "properties": {
    "billing_address": { "$ref": "#/definitions/address",
                                   "phone_number": "...." },
    "shipping_address": { "$ref": "#/definitions/address" }
  }
}

but see that "phone_number" under "billing_address" should be ignored?

econchick commented 8 years ago

FYI I am getting tests to pass when I just run python setup.py test -a "-v --cov ramlfications --cov-report term-missing" but I get those failures when I run tox -e py27 (both use 2.7.9) which is just the first command - so weird. I'm continuing to investigate.

econchick commented 8 years ago

@benhamill Ok so I'm not sure why but I figured out how to correct those failing tests with assert_called_once. Basically, for all of those, you'd switch out the line to be assert utils.foo_function.called as well as add a line assert utils.foo_function.call_count == 1.

Here's exactly what my "scratch paper" code looks like for those 4 failing tests (three in test_utils.py and one in test_mail.py). Disclaimer: for some reason my brain - while understands the concept of mocking just fine - can not get a firm grasp of actual implementation :) so the mock code may look juvenile :)

test_main.py

# <-- snip -->
# added a fixture
@pytest.fixture(scope="session")
def downloaded_xml():
    return os.path.join(UPDATE, "iana_mime_media_types.xml")

# passing in that fixture, IIRC mocker should be the last argument but I may be mixing that up
def test_update(runner, downloaded_xml, mocker):
    """
    Successfully update supported mime types
    """
    json_file = "ramlfications/data/supported_mime_types.json"
    parent = os.path.dirname(os.path.pardir)
    json_path = os.path.join(parent, json_file)

    start_mtime = os.path.getmtime(json_path)

    from ramlfications import utils
    # patch ALL the things
    mocker.patch("ramlfications.utils.download_url")
    mocker.patch("ramlfications.utils.xml_to_dict")
    mocker.patch("ramlfications.utils.parse_xml_data")
    mocker.patch("ramlfications.utils.requests_download")
    mocker.patch("ramlfications.utils.urllib_download")
    mocker.patch("ramlfications.utils.save_data")

    with open(downloaded_xml, "r", encoding="UTF-8") as raw_data:
        utils.download_url.return_value = raw_data.read()

        runner.invoke(main.update)

        assert utils.download_url.called
        assert utils.download_url.call_count == 1

        assert utils.xml_to_dict.called
        assert utils.xml_to_dict.call_count == 1

        assert utils.parse_xml_data.called
        assert utils.parse_xml_data.call_count == 1

        assert utils.save_data.called
        assert utils.save_data.call_count == 1

    end_mtime = os.path.getmtime(json_path)

    # sanity check that data was not written to file
    assert start_mtime == end_mtime

    mocker.stopall()

test_utils.py

# <-- snip -->
@patch("ramlfications.utils.parse_xml_data")
@patch("ramlfications.utils.xml_to_dict")
@patch("ramlfications.utils.save_data")
def test_insecure_download_urllib_flag(_a, _b, _c, mocker, monkeypatch):
    monkeypatch.setattr(utils, "SECURE_DOWNLOAD", False)
    monkeypatch.setattr(utils, "URLLIB", True)
    utils.requests = Mock()

    mocker.patch("ramlfications.utils.urllib_download")
    mocker.patch("ramlfications.utils.requests_download")

    utils.update_mime_types()
    assert utils.urllib_download.called
    assert utils.urllib_download.call_count == 1
    assert not utils.requests_download.called

    mocker.stopall()

@patch("ramlfications.utils.xml_to_dict")
@patch("ramlfications.utils.parse_xml_data")
@patch("ramlfications.utils.save_data")
def test_secure_download_requests_flag(_a, _b_, _c, mocker, monkeypatch):
    monkeypatch.setattr(utils, "SECURE_DOWNLOAD", True)
    monkeypatch.setattr(utils, "URLLIB", False)
    utils.urllib = Mock()

    mocker.patch("ramlfications.utils.requests_download")
    mocker.patch("ramlfications.utils.urllib_download")

    utils.update_mime_types()
    assert utils.requests_download.called
    assert utils.requests_download.call_count == 1
    assert not utils.urllib_download.called

    mocker.stopall()

@patch("ramlfications.utils.xml_to_dict")
@patch("ramlfications.utils.parse_xml_data")
@patch("ramlfications.utils.requests_download")
@patch("ramlfications.utils.urllib_download")
@patch("ramlfications.utils.save_data")
def test_update_mime_types(m_save, m_urllib, m_request, m_parse, m_xml,
                           downloaded_xml):

    with open(downloaded_xml, "r", encoding="UTF-8") as raw_data:
        utils.download_url.return_value = raw_data.read()

        utils.update_mime_types()
        assert m_urllib.called
        assert m_urllib.call_count == 1
        assert m_xml.called
        assert m_xml.call_count == 1
        assert m_parse.called
        assert m_parse.call_count == 1
        assert m_save.called
        assert m_save.call_count == 1

# <-- snip -->

econchick commented 8 years ago

@jhl2343 - it might be easier if you just run tox locally to see failing tests before you push your commits, rather than fixing individual things to see if they pass in travis.

benhamill commented 8 years ago

Yeah, so... when we pushed on July 10th, we had green tests locally and so were surprised when CI was red for everything. :shrug:

econchick commented 8 years ago

@benhamill I ran the last successful build #39 and it failed.

Looking into it, it looks like mock made an update that many people are having issues with, and that I didn't pin down the mock library version I needed, so travis/tox just automatically get's the latest one.

UGH!

I'm looking into patching it now, and hopefully push a change within a few hours. Really sorry for the troubles.

econchick commented 8 years ago

@benhamill @jhl2343 I pushed an update that fixed some of the tests for this PR (and restored all tests to passing for master), but there still seems to be some failing tests with the PR.

jinnthehuman commented 8 years ago

@econchick Fixed the issue with the pypy environment failing, and the checks have all passed. Could I get some feedback on it so far?

econchick commented 8 years ago

Sure - thanks for the ping. I'm at a conference right now so would have to wait til Monday - hope that's okay!

jinnthehuman commented 8 years ago

Monday sounds super. Thanks!

econchick commented 8 years ago

hey @jhl2343 - so a few things:

[ ] double check if # in a $ref value is actually required; I don't think it is
[ ] I'm pretty sure the requests package can not handle file:/// unlike urllib
[ ] the error messages for downloading URLs is still specifically for downloading IANA URLs, probably want to abstract that away a bit.

I also think the example JSON files for tests should actually be legitimate JSON Schemas, per the RAML spec. For example, rather than:

{
  "references": {
    "internal": "yes",
    "two": true
  },
  "name": "foo",
  "is_this_internal?": {"$ref": "#/references/internal"},
  "two_references": {"$ref": "#/references/two"}
}

it should be something like:

{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": ["firstName", "lastName"]
}

Relatedly, when there is a test for an example (rather than a schema), it should follow the schema definition for that example, since in the future, there will be validation specifically for that.

For instance, say the RAML file has this:

schemas:
    - product: !include includes/product.schema.json
/product:
  displayName: Product
  post:
    description: Create a Product
      application/json:
        schema: product
        example: !include includes/product.example.json

so the JSON schema file includes/product.schema.json is like this:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Product",
    "description": "A product from Acme's catalog",
    "type": "object",
    "properties": {
        "id": {
            "description": "The unique identifier for a product",
            "type": "integer"
        },
        "name": {
            "description": "Name of the product",
            "type": "string"
        },
        "price": {
            "type": "number",
            "minimum": 0,
            "exclusiveMinimum": true
        },
        "tags": {
            "type": "array",
            "items": {
                "type": "string"
            },
            "minItems": 1,
            "uniqueItems": true
        }
    },
    "required": ["id", "name", "price"]
}

and the JSON example file includes/product.example.json is like this:

{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "green"]
}

I'm not saying to include/implement JSON schema validation, but just to make sure the tests and associated JSON files are set up to be valid, if that makes sense.

Hope this helps - thanks so much!

jinnthehuman commented 8 years ago

@econchick Thanks for the feedback! This was exactly what I was looking for.

econchick commented 8 years ago

Hey @jhl2343 (and @nvillalva @benhamill anyone else) - any update on this PR? Please ping me here if you need a review.

jinnthehuman commented 8 years ago

@econchick I am currently assigned a task of higher priority, so it would be a bit until I can start to address the feedback. Thank you for all the feedback and we will ping you once we have more time to address the feedback. Sorry for not notifying you earlier.

econchick commented 8 years ago

@jhl2343 - totally understand!

I'm wondering - since you, @nvillalva and @benhamill have done a lot of the legwork, if you all would mind me taking over this patch and fixing the suggestions I gave earlier. I would clean up the commit history a tiny bit but I would not remove authorship (e.g. your names still in the commit history).

It is a great/highly-requested feature, so my motive is just to get it out there. But I don't want to be a jerk at all so if you'd rather me hold off, I'm okay with that too.

benhamill commented 8 years ago

@econchick: If you want this feature merged faster than we've got the bandwidth to push it to completion, then be our guest. That is probably my favorite feature of open source. If you're just squashing commits and such, they should retain attribution or whatever, which seems entirely legit. Thanks so much for all your help on this.

:sparkles: Teamwork! :sparkles:

econchick commented 8 years ago

@benhamill you're (all) awesome! I have done some work and will finish in the next day or two (yay tests!). If you have cycles to review, let me know - would love to include you. Otherwise that's cool too!

econchick commented 8 years ago

merged via #40

jdiegodcp / ramlfications

load JSON $refs #19

test_main.py

test_utils.py