mholt / timeliner

All your digital life on a single timeline, stored locally -- DEPRECATED, SEE TIMELINIZE (link below)
https://timelinize.com
GNU Affero General Public License v3.0
3.57k stars 116 forks source link

Twitter: "Cannot unmarshal number into Go struct field tweetGeo.coordinates of type string" #25

Closed jacroe closed 4 years ago

jacroe commented 5 years ago

[ERROR][twitter/jacroe] Downloading all: getting items from service: getting next page of tweets: reading response body: json: cannot unmarshal number into Go struct field tweetGeo.coordinates of type string

Unsure what causes this error; this is just immediately after I run timeliner get-all twitter/jacroe.

mholt commented 5 years ago

Oh interesting, I haven't encountered the coordinates in real life yet (I never geotag my tweets)!

It looks like coordinates are being returned as a number but, the docs I think (since that's where I got the structure from), said it would be a struct.

Do you happen to have the ability to help debug Go code?

jacroe commented 5 years ago

I've not done debugging with Go before (I've not done much of anything with Go, tbh) but have plenty of experience working through other languages. I'll see what I can come up with this weekend, maybe.

jacroe commented 5 years ago

I've done a bit of research and testing and it looks like there's a type error at . Twitter's data dictionary for coordinates are floats with the longitude first. Which in the repo it's set to strings with latitude first.

Changing the tweetGeo struct to:

type tweetGeo struct {
    Type        string   `json:"type"`
    Coordinates []float64 `json:"coordinates"` // "longitude first, then latitude"
}

seemed to have fixed it. I'll submit a pull request, but depending on how, or if, you're dealing with coordinates in the database, more work might need to be done.

mholt commented 5 years ago

Doesn't that contradict these docs here?

GeoJSON specifies a longitude then a latitude, whereas Twitter represents it as a latitude then a longitude: "geo": { "type":"Point", "coordinates":[37.78217, -122.40062] }

(Unless "geo" and "coordinates" are reversed, which is dumb if so.)

Repeated here too:

The JSON response mostly uses conventions laid out in GeoJSON. The coordinates that Twitter renders are reversed from the GeoJSON specification (GeoJSON specifies a longitude then a latitude, whereas Twitter represents it as a latitude then a longitude), eg: "geo": { "type":"Point", "coordinates":[37.78029, -122.39697] }

jacroe commented 5 years ago

Wow, that's annoying and confusing. Because there's also this doc here which says "longitude, then latitude"

coordinates - Nullable. Represents the geographic location of this Tweet as reported by the user or client application. The inner coordinates array is formatted as geoJSON (longitude first, then latitude).

Regardless, it does look like they are floats and not strings. Seems like all pages at least agree on that.

jacroe commented 5 years ago

What I should do is just do an actual API request to see in what order it returns it. I can't tonight, but I'll add that to my list to do for tomorrow.

mholt commented 5 years ago

The type change sounds right, yeah. Thank you for digging in more!

mholt commented 5 years ago

Any followup on this? :)

jacroe commented 5 years ago

I honestly didn't expect any more past the pull request. I modified and built it and ran the command, and it seemed to work just fine...

But never fear! I'll double check the downloaded tweets and API and ensure that whatever lat/long is provided will be in the proper order. Look for something maybe this weekend?

mholt commented 5 years ago

Sure! yeah, just make sure the coordinates are correct - beyond that I don't care which fields they come from (until further light can be shed on the situation).

mholt commented 5 years ago

@jacroe Any interest in finishing the investigation and updating the PR?

joonas-fi commented 5 years ago

@mholt I got bit by this as well, though with slightly different error message:

$ timeliner -twitter-replies -twitter-retweets get-latest twitter/joonas_fi
2019/11/27 15:28:43 [ERROR][twitter/joonas_fi] Getting latest: getting items from service: processing tweet from API: processing tweet 1182200778943074305: preparing reply-parent tweet: looking up tweet author's account information: reading response body: json: cannot unmarshal number into Go struct field tweetGeo.status.geo.coordinates of type string

Let me know if I can help - I'm very comfortable with Go code, but I'm slightly confused on the status of this issue.

mholt commented 5 years ago

@joonas-fi Oh, excellent -- I would love your help. First, we just need to confirm what the response actually is. It's apparently being returned as a number and not a string like I thought, but we need to find out if it's "lat, lon" or "lon, lat". We can't trust the docs, so you'll have to investigate manually. Probably includes plotting coordinates on a map so you know it's the right way.

If you can confirm which way it is, then I'd happily accept a PR!

joonas-fi commented 5 years ago

I plopped in a io.TeeReader in twitter/api.go so I could add the whole read JSON body to the error message.. here's the new error message:

2019/11/29 09:33:35 [ERROR][twitter/joonas_fi] Getting latest: getting items from service: processing tweet from API: processing tweet 1182200778943074305: preparing reply-parent tweet: looking up tweet author's account information: reading response body: json: cannot unmarshal number into Go struct field tweetGeo.status.geo.coordinates of type string, body was: {"id":1777771,"id_str":"1777771","name":"barry","screen_name":"mullingitover","location":"In your hearts and minds","profile_location":null,"description":"i am the what i was of what i will be.","url":"https:\/\/t.co\/h2RZhHbbiz","entities":{"url":{"urls":[{"url":"https:\/\/t.co\/h2RZhHbbiz","expanded_url":"http:\/\/Instagram.com\/mulling_it_over","display_url":"Instagram.com\/mulling_it_over","indices":[0,23]}]},"description":{"urls":[]}},"protected":false,"followers_count":161,"friends_count":280,"listed_count":8,"created_at":"Wed Mar 21 18:13:14 +0000 2007","favourites_count":2279,"utc_offset":null,"time_zone":null,"geo_enabled":true,"verified":false,"statuses_count":1729,"lang":null,"status":{"created_at":"Wed Nov 27 18:54:49 +0000 2019","id":1199763524031041537,"id_str":"1199763524031041537","text":"#tfti_la with brinickijay @ Los Angeles, California https:\/\/t.co\/B09wedL5vb","truncated":false,"entities":{"hashtags":[{"text":"tfti_la","indices":[0,8]}],"symbols":[],"user_mentions":[],"urls":[{"url":"https:\/\/t.co\/B09wedL5vb","expanded_url":"https:\/\/www.instagram.com\/p\/B5YTJBsDk7e\/?igshid=1xzy2771uwsg9","display_url":"instagram.com\/p\/B5YTJBsDk7e\/\u2026","indices":[52,75]}]},"source":"\u003ca href=\"http:\/\/instagram.com\" rel=\"nofollow\"\u003eInstagram\u003c\/a\u003e","in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"geo":{"type":"Point","coordinates":[34.0522,-118.243]},"coordinates":{"type":"Point","coordinates":[-118.243,34.0522]},"place":{"id":"3b77caf94bfc81fe","url":"https:\/\/api.twitter.com\/1.1\/geo\/id\/3b77caf94bfc81fe.json","place_type":"city","name":"Los Angeles","full_name":"Los Angeles, CA","country_code":"US","country":"Yhdysvallat","contained_within":[],"bounding_box":{"type":"Polygon","coordinates":[[[-118.668404,33.704538],[-118.155409,33.704538],[-118.155409,34.337041],[-118.668404,34.337041]]]},"attributes":{}},"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"favorited":false,"retweeted":false,"possibly_sensitive":false,"lang":"en"},"contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"FFFFFF","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/923335960007340032\/pIbUjNkC_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/923335960007340032\/pIbUjNkC_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1777771\/1508975481","profile_link_color":"0012BB","profile_sidebar_border_color":"AAAAAA","profile_sidebar_fill_color":"FFFFFF","profile_text_color":"000000","profile_use_background_image":false,"has_extended_profile":false,"default_profile":false,"default_profile_image":false,"can_media_tag":null,"followed_by":null,"following":null,"follow_request_sent":null,"notifications":null,"translator_type":"none"}

It seems that status.geo.coordinates and status.coordinates.coordinates have lat,lng swapped:

image

🤯

Speaking in latitude, longitude:

34.0522, -118.243 is in California -118.243, 34.0522 isn't valid because latitude ranges from -90° to +90°. Longitude ranges from -180° to +180°

Therefore:

In the model these both are spec'd as *tweetGeo, but since their lat,lng order is different they shouldn't use the same structure. Could we just remove the Geo field since it's marked as deprecated?

If tweetGeo is left for only Coordinates field's use, then we need to update the comment to // "longitude, then a latitude". Though I think considering the unconventional order it'd be best to just make Latitude(), Longitude() getters for the struct.

Also, the string should probably be changed to float64, or whatever the best practice is for handling WGS84 coordinates.. I doubt not many people use math/big to ensure the precision doesn't get mangled, since the sending system probably stored it as float64 anyway..

mholt commented 5 years ago

Wow, thank you!! Yes yes yes to all of that. I like all of what you said. That's a good plan. Can you submit a PR?

joonas-fi commented 5 years ago

Glad I could be of help :) Yeah, I'll submit a PR. Maybe tomorrow.

Would you like me to add test for "kitchen sink" Twitter API response in a test, i.e. if the JSON structure worked for you because you didn't happen to come across geotags, we should have the most complicated case incl. geotags of raw JSON bytes in a test to make sure all of it gets decoded correctly?

mholt commented 5 years ago

Sure! If that's not too much work, that'd be a good idea.