konklone / jss

JSON Simple Syndication -- RSS rethought for JSON
https://github.com/konklone/jss/issues/10

Proposal 2: adding RSS elements to JSON #10

Closed konklone closed 10 years ago

konklone commented 10 years ago

Over the course of discussing #1, a different approach came up: instead of following RSS exactly and defining a container format, one could just augment individual JSON items with a few fields, and augment channels with a few fields.

CCing people who were involved in #1 and might find this approach as useful or more useful. I'd like to know if this model feels any more attractive, and whether it represents less of a commitment or re-engineering.

Here's the simplest possible example, using two @-prefixed fields. The @ is for easier visual separation here; it could just as easily be - or + instead, if @ is too easily confused with JSON-LD.

{
  "results": [
    {
      "agency_name": "Department of Homeland Security", 
      "file_type": "pdf", 
      "inspector_url": "http://www.oig.dhs.gov", 
      "published_on": "2014-04-01", 
      "report_id": "OIG-14-60", 
      "title": "Management Letter for the FY 2013 DHS Financial Statements and Internal Control over Financial Reporting Audit", 
      "type": "report", 
      "url": "http://www.oig.dhs.gov/assets/Mgmt/2014/OIG_14-60_Mar14.pdf", 

      "@item": {
        "id": "OIG-14-60",
        "title": "Management Letter for the FY 2013 DHS Financial Statements and Internal Control over Financial Reporting Audit",
        "description": "A report from the DHS inspector general.",
        "link": "http://www.oig.dhs.gov/assets/Mgmt/2014/OIG_14-60_Mar14.pdf",
        "published": "2014-04-01"
      }
    }
  ],
  "@channel": "results"
}

A JSON item could be given an @item field even in a non-API and non-channel context, like writing individual items to disk somewhere as part of scraper output. @channel is a related micro-standard that directs a parser to look at a particular path for the canonical array of @items.
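
As a rough sketch of how a parser might follow @channel in the example above (the function name `channel_items` is hypothetical, and this assumes @channel is a plain string naming a single top-level key rather than a nested path):

```python
def channel_items(doc):
    """Return the @item objects found in the array that @channel points at.

    Assumption: @channel is a string naming one top-level key of the document.
    """
    path = doc["@channel"]  # e.g. "results"
    # Entries without an @item field are skipped rather than treated as errors.
    return [entry["@item"] for entry in doc[path] if "@item" in entry]


doc = {
    "results": [
        {
            "report_id": "OIG-14-60",
            "@item": {"id": "OIG-14-60", "published": "2014-04-01"},
        }
    ],
    "@channel": "results",
}

items = channel_items(doc)
```

A consumer that only cares about syndication never has to know what the API-specific fields such as `report_id` mean.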

(This @channel idea would also be extremely useful for other use cases, such as this JSON->CSV parser, which uses imperfect heuristics to guess at what that array is.)

Here's a different way of representing @item-level information:

{
  "agency_name": "Department of Homeland Security", 
  "file_type": "pdf", 
  "inspector_url": "http://www.oig.dhs.gov", 
  "type": "report", 

  "@id": "OIG-14-60",
  "@title": "Management Letter for the FY 2013 DHS Financial Statements and Internal Control over Financial Reporting Audit",
  "@description": "A report from the DHS inspector general.",
  "@link": "http://www.oig.dhs.gov/assets/Mgmt/2014/OIG_14-60_Mar14.pdf",
  "@published": "2014-04-01"
}

This approach avoids duplicating data, because the @-prefixed fields take the place of pre-existing fields. But it couldn't be bolted on to existing APIs, and I like it less aesthetically.
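
For comparison, reading the flat form might look something like this (a sketch only; `split_flat_item` is a hypothetical helper, and the default prefix is an assumption):

```python
def split_flat_item(record, prefix="@"):
    """Separate prefixed syndication fields from the record's own fields.

    Returns (item, rest): item holds the prefixed fields with the prefix
    stripped, rest holds everything else untouched.
    """
    item = {k[len(prefix):]: v for k, v in record.items() if k.startswith(prefix)}
    rest = {k: v for k, v in record.items() if not k.startswith(prefix)}
    return item, rest


record = {
    "agency_name": "Department of Homeland Security",
    "file_type": "pdf",
    "@id": "OIG-14-60",
    "@published": "2014-04-01",
}

item, rest = split_flat_item(record)
```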

Here's an expanded way of representing channels:

{
  "results": [
    {
      ...
    }
  ],
  "@channel": {
    "items": "results",
    "name": "Inspector General Reports API",
    "link": "https://sunlightlabs.github.io/congress/"
  }
}

This is a little weightier, but it's useful information that RSS currently captures. Channel-level information could be duplicated at the @item level, but that makes @items less self-contained.
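
A reader could plausibly accept both the short string form and this expanded object form of @channel. A minimal sketch, assuming those are the only two shapes (`read_channel` is a hypothetical name):

```python
def read_channel(doc):
    """Return (metadata, entries) for a document carrying an @channel field.

    Assumption: @channel is either a string naming the array's key, or an
    object whose "items" field names it and whose other fields are metadata.
    """
    channel = doc["@channel"]
    if isinstance(channel, str):
        channel = {"items": channel}  # normalize the short form
    metadata = {k: v for k, v in channel.items() if k != "items"}
    return metadata, doc[channel["items"]]


short_form = {"results": [{"n": 1}], "@channel": "results"}
expanded = {
    "results": [{"n": 1}],
    "@channel": {"items": "results", "name": "Inspector General Reports API"},
}

short_meta, short_entries = read_channel(short_form)
full_meta, full_entries = read_channel(expanded)
```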

With an approach like these, you could:

I recognize JSON-LD is a superset of this kind of approach, and I am open to making this work as JSON-LD if that's desired by anyone interested in shipping this sort of thing.

This is a different model than I originally proposed. Is this more attractive to people than my original proposal for an RSS-like container format?

/cc @adelevie @jpmckinney @vzvenyach @audiodude @sbma44 @mlissner

jpmckinney commented 10 years ago

At the very least replace @ with underscore _ to avoid confusion with JSON-LD. Using - or + doesn't allow for dot notation.

foo._bar // works
foo.+bar // SyntaxError
foo.-bar // SyntaxError

And even if it weren't confusing with JSON-LD:

foo.@bar // SyntaxError

bar commented 10 years ago

@jpmckinney Please don't use my name to document your code, thanks! :p

jpmckinney commented 10 years ago

You should tell GitHub that when @bar appears within a ``` code block, they shouldn't notify people unnecessarily. Sorry for the spam!

bar commented 10 years ago

Well, @jpmckinney suffers from the same issue. There's no need to compromise features, since @ is used for notifications, but it's a good idea though :)

audiodude commented 10 years ago

I'm not sure I understand the broad utility of the @ notation. Is there something special about an @ field?

I have to say, I think the first example doesn't look like any kind of standard or any kind of RSS; it just looks like a proprietary data format for some specific application.

I'm not sure what makes this generally useful. Perhaps there should be a field that provides a link to the schema of the document so that people would get some semantic understanding of what fields like "agency_name" and "type" represent?

jpmckinney commented 10 years ago

@audiodude This (and your other issues) go back to my comments about how RSS-JSON is a poor choice for this project's name.

> If the goal is to just come up with something that people in this thread and a few others can work with, we should change the name and make it clear that the project is not trying to respond to the use cases and requirements of all the people who would care about a JSON mapping of Atom/RSS (which is what the current name commits this project to).
>
> ...
>
> I still think changing the project name to something that better represents the scope of the project will avoid unnecessary acrimony, as the current name makes the project purport to be a true translation of RSS to JSON, which is not the goal. Maybe "Scraper Syndication Specification" or "SSS".

Emphasis added.

So, yes, I agree that this spec has tons of holes if it actually wants to be a JSON version of RSS. A different name would avoid that confusion and repair the holes.

konklone commented 10 years ago

There are tons of holes, because I'm not trying to fill them yet. Locking down field names isn't interesting until the basic structure and utility are identified. I don't think those other tickets are answerable until the overall use case is clear.

But I don't (yet) think RSS-JSON is a bad choice for a name here: if the spec above were in use (with holes filled), and deployed in multiple places, I would be comfortable approaching RSS readers with the proposal to also accept JSON URLs that met the spec. It does not need to be a true translation of RSS to JSON.

That's because whether or not the above spec actually resembles RSS in structure, or has a perfect 1-1 mapping of field names, it fulfills the same purpose as RSS: really simple syndication of basic fields for published objects.

The benefit, as compared to the existing RSS spec, is twofold:

And maybe even the @channel part is useful alone, if you want to automatically "get to the array" of a JSON response. I know I've needed it in at least one other project.

I'd love to hear from others as to whether they see any value in this sort of thing.

I know that @adelevie and I have use cases for federation that go beyond the data format -- like actual pubsubhubbub-style API interaction -- but even so, this part seems interesting on its own.

jpmckinney commented 10 years ago

It's a bait and switch. You present people with "RSS-JSON." They think, "Oh, cool, a JSON representation of my RSS feed, I could use that." Then they come here, and you say, "Oh, well, actually, RSS-JSON doesn't map to RSS - it's not even the same structure - but it still does syndication! Eh? Eh?"

You already admit that RSS-JSON may have nothing to do with RSS besides syndication. A new name like "JSS" for "JSON syndication specification" retains the benefit of reminding people of RSS without indicating that it has anything to do with RSS besides the similarity of its acronym.

konklone commented 10 years ago

jss (for JSON Simple Syndication) was actually the original name for this repository; I renamed it to rss-json before filing #1. I'm totally open to that. But I think that bait-and-switch expectation is not that big of a deal -- it's RSS re-thought for JSON, not just blindly mapped to JSON. Surprised visitors would get over it. I'd still be opening tickets on RSS libraries suggesting that they add support for this spec that multiple parties are (someday, hopefully) using in production.

konklone commented 10 years ago

OK, I think you're right! I renamed the repo to jss. And I'm going to see about trying this out in production. I guess I'll go with _item and _channel to avoid JSON-LD confusion, but I like how @item and @channel look way better. :)

konklone commented 10 years ago

You know, I really like how JSON-LD does it, and @item and @channel don't conflict. I don't think it'll be confusing to use @, I think it'll be complementary.

audiodude commented 10 years ago

I still fundamentally fail to understand the utility of prefixing any fields with special characters. What are you trying to accomplish? Is this for someone who happens to open a JSS field, so that they somehow realize that @item and @channel are "special"? Are these fields special? If so, how?

jpmckinney commented 10 years ago

You can use @, but it doesn't allow dot notation. The only punctuation marks that allow dot notation are $ and _, as in foo.$bar or foo._bar. There's nothing special about one symbol versus another.

konklone commented 10 years ago

> I still fundamentally fail to understand the utility of prefixing any fields with special characters. What are you trying to accomplish? Is this for someone who happens to open a JSS field, so that they somehow realize that @item and @channel are "special"? Are these fields special? If so, how?

It would be to avoid a namespace clash with any pre-existing item or channel fields someone may have on their data. (This, I assume, is the main rationale for why JSON-LD does the same thing with @context and @id.)
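
To make the clash concrete, here is a small illustration with made-up data (the record and the `syndication_view` helper are hypothetical): a publisher whose records already use `item` and `channel` for their own purposes can still attach syndication metadata without renaming anything.

```python
# Hypothetical record from an API that already uses "item" and "channel"
# with meanings of its own.
record = {
    "item": "widget-42",
    "channel": "retail",
    "@item": {"id": "42", "title": "Widget"},
}


def syndication_view(record):
    """Return only the syndication metadata, ignoring same-named plain fields."""
    return record.get("@item")


view = syndication_view(record)
```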

konklone commented 10 years ago

For reference, I added experimental RSS and "JSS" (or whatever) support to Sunlight's Congress API in May:

https://congress.api.sunlightfoundation.com/documents?apikey=[APIKEY]&document_type=ig_report&format=rss

https://congress.api.sunlightfoundation.com/documents?apikey=[APIKEY]&document_type=ig_report&format=rss-json

I documented it publicly, and you can change out which fields get mapped to title / description / etc using query string params.

It's supported for bills, votes, and upcoming votes on bills (along with a couple undocumented endpoints that Scout uses, like /documents above).

Now that I've started work on oversight.io, I'm probably going to implement something similar for everything there. But what I'm really concerned with is passing recommendations back up to official gov't providers, like the IGs who "publish" the reports I'm syndicating. Is RSS (+ pagination) enough, or should I be pushing them to something that does a simpler, better job at attaching metadata? I don't know. Either way, going to mark this as closed for the time being.