wbuchanan / StataJSON

JSON IO operations for Stata using the Jackson Java Library
http://wbuchanan.github.io/StataJSON

JSON url code resulting in -break- error #31

Closed m2hurst closed 3 years ago

m2hurst commented 6 years ago

I'm having trouble downloading Amazon API JSON data into Stata directly using the jsonio program. My data address is

https://s3.amazonaws.com/ingame-prod/users/a4cdea80-1d8e-11e8-95f7-a77e031e7644/games/pga-20180325-wgc-dell-morning/score.json

My Stata code is

jsonio kv, file("https://s3.amazonaws.com/ingame-prod/users/a4cdea80-1d8e-11e8-95f7-a77e031e7644/games/pga-20180325-wgc-dell-morning/score.json")

I keep getting a -break- error.

wbuchanan commented 6 years ago

@m2hurst Which version of JSONIO are you using? This seems very similar to something that came up in issue #23. You might need to reinstall from the GitHub location to make sure you have the most up-to-date version of the program.

m2hurst commented 6 years ago

I have the most recent version. I think it has something to do with how the JSON file is written. It doesn't really lend itself to the kv structure. It imports with the rv command, but all the values are blank.

wbuchanan commented 6 years ago

@m2hurst I see what you mean now, and I'm not sure what is causing the issue. I'll try to look into it later this week to see if I can figure it out.

wbuchanan commented 6 years ago

@m2hurst Just started digging into this a bit more, and the issue seems to be with the formatting of the string constructed by the web service that is building the JSON. While debugging with the same endpoint that you provided, the Jackson JSON library threw an exception:

com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: (URL); line: 1, column: 2]

If you scroll down a little bit at json.org, you'll see that by definition unescaped control characters are not valid inside JSON strings, and, as the exception says, only regular whitespace is allowed between tokens. I'll try to see whether there are configuration flags that could be passed to Jackson to handle those characters automatically, but that seems to be the underlying issue.
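To make the failure mode concrete, here is a minimal sketch in Python (not the Jackson code path that jsonio uses) showing that a strict JSON parser rejects a payload whose first byte is control character 31, plus a small helper that reports where the first such byte sits. The payloads and the helper names `first_control_char`, `try_parse`, and `starts_with_gzip_magic` are hypothetical and made up for illustration. Code 31 is 0x1F, which is also the first byte of the gzip magic number (0x1F 0x8B), so the sketch includes a check for that as well; whether this endpoint actually returned compressed data is not established anywhere in this thread.

```python
import json

# Whitespace bytes that JSON permits between tokens: \t, \n, \r, space.
ALLOWED_WHITESPACE = {0x09, 0x0A, 0x0D, 0x20}


def first_control_char(payload: bytes):
    """Return (offset, code) of the first control byte (< 0x20) that is not
    tab, newline, or carriage return; return None if there is none."""
    for offset, byte in enumerate(payload):
        if byte < 0x20 and byte not in ALLOWED_WHITESPACE:
            return offset, byte
    return None


def starts_with_gzip_magic(payload: bytes) -> bool:
    """True if the payload begins with the two-byte gzip magic number.
    This is only a hint worth checking, not a diagnosis of the issue above."""
    return payload[:2] == b"\x1f\x8b"


def try_parse(payload: bytes):
    """Attempt a strict JSON parse; return (value, None) or (None, error)."""
    try:
        return json.loads(payload.decode("utf-8")), None
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return None, exc


if __name__ == "__main__":
    # Hypothetical payloads: one clean, one prefixed with control char 31.
    for sample in (b'{"score": 1}', b'\x1f{"score": 1}'):
        value, error = try_parse(sample)
        print(sample, "->", value if error is None else f"rejected: {error}")
        print("  first control char:", first_control_char(sample))
        print("  gzip magic:", starts_with_gzip_magic(sample))
```

Running a check like `first_control_char` on the raw response bytes before handing them to a parser would at least show whether the offending byte is a one-off prefix or scattered throughout the document, which affects how it could be handled.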

wbuchanan commented 5 years ago

@m2hurst

It seems like the only potential way to handle the control characters would be to use the streaming architecture in Jackson. That would basically mean rewriting a nontrivial portion of the code base. At this time I definitely wouldn’t feel right about putting any timelines on things, but in the long term I have been wanting to rewrite things to make the code easier to maintain and refactor more generally.