unsplash / datasets

🎁 5,400,000+ Unsplash images made available for research and machine learning
https://unsplash.com/data

Values of latitude and longitude entries in dataset are swapped #63

Open ys-koshelev opened 4 months ago

ys-koshelev commented 4 months ago

Describe the bug

The values of the photo_location_latitude and photo_location_longitude columns in photos.tsv are swapped (in both the Lite and Full versions).

To Reproduce

Using the photo with ID gXSFnk2a9V4 as an example (currently at index 1 in the Lite dataset):

  1. Check the location listed in the dataset:

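    import pandas as pd
    # Lite dataset photos table: tab-separated, first row is the header
    df = pd.read_csv('photos.tsv000', sep='\t', header=0)
    # Row 1 is photo gXSFnk2a9V4
    row = df.loc[1]
    print({'latitude': row['photo_location_latitude'], 'longitude': row['photo_location_longitude']})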

    which outputs {'latitude': -123.97116667, 'longitude': 45.4655}. This is already clearly wrong, since a latitude must lie within [-90, 90].

  2. Now verify that the values are simply swapped by checking the location the API returns for the same photo (requires Bash with curl and python3):

    curl -k 'https://unsplash.com/napi/photos/aerial-photography-of-seashore-gXSFnk2a9V4' \
    -H 'Accept: */*' \
    -H 'Connection: keep-alive' \
    -H 'Sec-Fetch-Dest: empty' \
    -H 'Sec-Fetch-Mode: cors' \
    -H 'Sec-Fetch-Site: same-origin' \
    -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0' \
    -H 'accept-language: en-US' \
    -H 'sec-ch-ua: "Chromium";v="124", "Microsoft Edge";v="124", "Not-A.Brand";v="99"' \
    -H 'sec-ch-ua-mobile: ?0' | \
    python3 -c "import sys, json; resp = json.load(sys.stdin); print(resp['location']['position'])"

    which outputs {'latitude': 45.4655, 'longitude': -123.97116667}.
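The two checks in the steps above (a range check on the dataset values and a comparison against the API response) can be expressed as a small, self-contained Python sketch. The function names is_valid_position and is_swapped are hypothetical helpers for illustration, and the input values are the ones reported in this issue for photo gXSFnk2a9V4. Note that a range check alone only catches swaps where the stored "latitude" falls outside [-90, 90]; a swapped pair whose values are both within that range can only be detected by comparing against another source such as the API.

```python
import math


def is_valid_position(lat: float, lon: float) -> bool:
    """True if lat is in [-90, 90] and lon is in [-180, 180]."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def is_swapped(dataset_pos: dict, api_pos: dict, tol: float = 1e-6) -> bool:
    """True if the dataset's latitude equals the API's longitude and vice versa."""
    return (
        math.isclose(dataset_pos["latitude"], api_pos["longitude"], abs_tol=tol)
        and math.isclose(dataset_pos["longitude"], api_pos["latitude"], abs_tol=tol)
    )


# Values reported in this issue for photo gXSFnk2a9V4
dataset_pos = {"latitude": -123.97116667, "longitude": 45.4655}  # from photos.tsv000
api_pos = {"latitude": 45.4655, "longitude": -123.97116667}      # from the API

print(is_valid_position(dataset_pos["latitude"], dataset_pos["longitude"]))  # False
print(is_swapped(dataset_pos, api_pos))                                      # True
```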

Expected behavior

The dataset should contain the correct coordinates, i.e. the values in the photo_location_latitude and photo_location_longitude columns should be swapped back.
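Until the dataset is fixed upstream, a local workaround is to swap the two columns back when copying the file. The sketch below uses only the standard csv module and assumes, as reported in this issue, that every row has its two values swapped (empty cells are swapped too, which leaves them empty). The helper name swap_coordinates is hypothetical, and the default csv quoting rules are assumed to match how the TSV files are written.

```python
import csv
import io

LAT = "photo_location_latitude"
LON = "photo_location_longitude"


def swap_coordinates(src, dst):
    """Copy a TSV from src to dst, swapping the latitude/longitude columns.

    Assumes every row is affected by the swap described in this issue.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Put each value back into the column it belongs to
        row[LAT], row[LON] = row[LON], row[LAT]
        writer.writerow(row)


# Usage on an in-memory sample containing the row from this issue
sample = (
    "photo_id\tphoto_location_latitude\tphoto_location_longitude\n"
    "gXSFnk2a9V4\t-123.97116667\t45.4655\n"
)
out = io.StringIO()
swap_coordinates(io.StringIO(sample), out)
print(out.getvalue())
```

For the real file, the same function can be called with photos.tsv000 opened for reading and a new output file (for example a hypothetical photos_fixed.tsv) opened for writing, both with newline='' and encoding='utf-8' as the csv module recommends.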

Additional context

N/A