Obtain data automatically from dresden.de

psteinb commented 4 years ago

If I saw correctly, the plot on dresden.de pulls the data from a server to visualize it.

Using the firefox dev tools, I see this:

 Request URL:https://services.arcgis.com/ORpvigFPJUhb8RDF/arcgis/rest/services/corona_DD_sicht2/FeatureServer/0/query?f=json&where=Fallzahl%20IS%20NOT%20NULL&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&resultOffset=0&resultRecordCount=2000&cacheHint=true
Request Method:GET
Remote Address:52.222.150.203:443
Status Code:
304
Version:HTTP/2
Referrer Policy:no-referrer-when-downgrade

with the following params in the query string


f | "json"
-- | --
where | "Fallzahl IS NOT NULL"
returnGeometry | "false"
spatialRel | "esriSpatialRelIntersects"
outFields | "*"
resultOffset | "0"
resultRecordCount | "2000"
cacheHint | "true"

The response payload is this:

{"objectIdFieldName":"ObjectId","uniqueIdField":{"name":"ObjectId","isSystemMaintained":true},"globalIdFieldName":"","fields":[{"name":"Datum","type":"esriFieldTypeString","alias":"Datum","sqlType":"sqlTypeNVarchar","length":2147483647,"domain":null,"defaultValue":null},{"name":"Fallzahl","type":"esriFieldTypeInteger","alias":"Fallzahl","sqlType":"sqlTypeInteger","domain":null,"defaultValue":null},{"name":"ObjectId","type":"esriFieldTypeOID","alias":"ObjectId","sqlType":"sqlTypeInteger","domain":null,"defaultValue":null},{"name":"Sterbefall","type":"esriFieldTypeInteger","alias":"Sterbefall","sqlType":"sqlTypeOther","domain":null,"defaultValue":null},{"name":"Genesungsfall","type":"esriFieldTypeInteger","alias":"Genesungsfall","sqlType":"sqlTypeOther","domain":null,"defaultValue":null},{"name":"Anzeige_Indikator","type":"esriFieldTypeString","alias":"Anzeige_Indikator","sqlType":"sqlTypeOther","length":10,"domain":null,"defaultValue":null}],"features":[{"attributes":{"Datum":"7.03.20","Fallzahl":2,"ObjectId":1,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"08.03.20","Fallzahl":2,"ObjectId":2,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"09.03.20","Fallzahl":2,"ObjectId":3,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"10.03.20","Fallzahl":5,"ObjectId":4,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"11.03.20","Fallzahl":5,"ObjectId":5,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"12.03.20","Fallzahl":5,"ObjectId":6,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"13.03.20","Fallzahl":12,"ObjectId":7,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"14.03.20","Fallzahl":18,"ObjectId":8,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"15.03.20","Fallzahl":18,"ObjectId":9,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"16.03.20","Fallzahl":25,"ObjectId":10,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"17.03.20","Fallzahl":35,"ObjectId":11,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"18.03.20","Fallzahl":50,"ObjectId":12,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"19.03.20","Fallzahl":60,"ObjectId":13,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"20.03.20","Fallzahl":97,"ObjectId":14,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"21.03.20","Fallzahl":115,"ObjectId":15,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":null}},{"attributes":{"Datum":"22.03.20","Fallzahl":139,"ObjectId":16,"Sterbefall":null,"Genesungsfall":null,"Anzeige_Indikator":"x"}}]}

As we can see, the city of dresden has 2 more categories "Sterbefall" and "Genesungsfall".

At best, I need a script that automatically pulls this data and converts it to csv for Dresden!

psteinb commented 4 years ago

Here is the pretty printed payload from the query described above

{
    "objectIdFieldName": "ObjectId",
    "uniqueIdField": {
        "name": "ObjectId",
        "isSystemMaintained": true
    },
    "globalIdFieldName": "",
    "fields": [
        {
            "name": "Datum",
            "type": "esriFieldTypeString",
            "alias": "Datum",
            "sqlType": "sqlTypeNVarchar",
            "length": 2147483647,
            "domain": null,
            "defaultValue": null
        },
        {
            "name": "Fallzahl",
            "type": "esriFieldTypeInteger",
            "alias": "Fallzahl",
            "sqlType": "sqlTypeInteger",
            "domain": null,
            "defaultValue": null
        },
        {
            "name": "ObjectId",
            "type": "esriFieldTypeOID",
            "alias": "ObjectId",
            "sqlType": "sqlTypeInteger",
            "domain": null,
            "defaultValue": null
        },
        {
            "name": "Sterbefall",
            "type": "esriFieldTypeInteger",
            "alias": "Sterbefall",
            "sqlType": "sqlTypeOther",
            "domain": null,
            "defaultValue": null
        },
        {
            "name": "Genesungsfall",
            "type": "esriFieldTypeInteger",
            "alias": "Genesungsfall",
            "sqlType": "sqlTypeOther",
            "domain": null,
            "defaultValue": null
        },
        {
            "name": "Anzeige_Indikator",
            "type": "esriFieldTypeString",
            "alias": "Anzeige_Indikator",
            "sqlType": "sqlTypeOther",
            "length": 10,
            "domain": null,
            "defaultValue": null
        }
    ],
    "features": [
        {
            "attributes": {
                "Datum": "7.03.20",
                "Fallzahl": 2,
                "ObjectId": 1,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "08.03.20",
                "Fallzahl": 2,
                "ObjectId": 2,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "09.03.20",
                "Fallzahl": 2,
                "ObjectId": 3,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "10.03.20",
                "Fallzahl": 5,
                "ObjectId": 4,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "11.03.20",
                "Fallzahl": 5,
                "ObjectId": 5,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "12.03.20",
                "Fallzahl": 5,
                "ObjectId": 6,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "13.03.20",
                "Fallzahl": 12,
                "ObjectId": 7,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "14.03.20",
                "Fallzahl": 18,
                "ObjectId": 8,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "15.03.20",
                "Fallzahl": 18,
                "ObjectId": 9,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "16.03.20",
                "Fallzahl": 25,
                "ObjectId": 10,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "17.03.20",
                "Fallzahl": 35,
                "ObjectId": 11,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "18.03.20",
                "Fallzahl": 50,
                "ObjectId": 12,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "19.03.20",
                "Fallzahl": 60,
                "ObjectId": 13,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "20.03.20",
                "Fallzahl": 97,
                "ObjectId": 14,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "21.03.20",
                "Fallzahl": 115,
                "ObjectId": 15,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": null
            }
        },
        {
            "attributes": {
                "Datum": "22.03.20",
                "Fallzahl": 139,
                "ObjectId": 16,
                "Sterbefall": null,
                "Genesungsfall": null,
                "Anzeige_Indikator": "x"
            }
        }
    ]
}

tobiashuste commented 4 years ago

I could try to have a look at this today evening. Nevertheless, need to see how fast I can move forward using R ;) Would be nice to also use GH Actions for retrieving the data and updating the plots automatically. But this is another issue.

psteinb commented 4 years ago

Sure thing. Take your time. Any contribution is welcome. No need to mingle with R if you don't want to. For this, a csv is all I need. Go for GH Actions, they should be made for this.

vv01f commented 4 years ago

solution as a shell script… also added other available numbers (deceased,recovered,hospitalized).

problem was to do the query before the one we actually want to process, otherwise status code 400 is returned (here missing authentication). as @psteinb poined out before, the queries can be spotted with the networking view in developer tools of modern browsers. in case other data should be the result of queries they can be added as well.

#!/usr/bin/env sh
assert_tools () {
    err=0
    while test $# -gt 0; do
        which $1  1>/dev/null 2>&1 || {
            >&2 printf "tool missing: $1"
            err=$(( $err + 1 ))
        }
        shift
    done
    test $err -eq 0 || exit $err
}

# test for tools used, sorted: likely to unlikely
dependencies="printf cat cut rev test date sleep curl jq"
assert_tools $dependencies

# for calling existing file, fallback test case
test -z $1 && fn="g.json" || fn="$1"
# could test for existing file

# disable query new file with cli option
test "$2" = "noquery" || {

# user agent
a="Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
# 1. Request
url1="https://services.arcgis.com/ORpvigFPJUhb8RDF/arcgis/rest/services/corona_DD_3/FeatureServer/0/query?f=json&where=Anzeige_Indikator%3D%27x%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&resultOffset=0&resultRecordCount=50&cacheHint=true"
# 2. Request, will nicht vor 1. => Satus Code: 400
url2="https://services.arcgis.com/ORpvigFPJUhb8RDF/arcgis/rest/services/corona_DD_3/FeatureServer/0/query?f=json&where=Fallzahl%20IS%20NOT%20NULL&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&resultOffset=0&resultRecordCount=2000&cacheHint=true"

# alternative tool: wget
# here call is silent, without checks on certificates and with custom user agent (browser name/version etc.)
curl -s -k -A "$a" "$url1" 2>&1 1>/dev/null
# wait a moment to not trigger some protection on server side
sleep 1 # although this doesnt seem to be necessarry at all
# write actual content to file $fn
curl -s -o "$fn" -k -A "$a" "$url2"

}

# information on file names, print to stderr 
1>&2 echo "Reading from: $fn"
test -f "$fn" || { 1>&2 echo "file does not exist."; exit 1; }
tf=$(date +%Y-%m-%dT%H:%M:%S)"_"$(echo "$fn"|rev|cut -d. -f2-|rev)".csv"
1>&2 echo "Writing to: $tf"

# check for file extension
test "$(echo "$fn"|rev|cut -d. -f1|rev)" = "json" && {

# head line as found in csv of the repo
1>"$tf" echo "city,date,tod_hhmm,diagnosed,deceased,recovered,hospitalized"
# csv header as used in R script
#~ 1>"$tf" echo "city,date,tod_hhmm,diagnosed"

# following values are assumed constant
# standard values for place and time
p="Dresden"
c="12:00" # time of date, format H:M

# process lines (each day) of the files, filter json with `jq`
for i in $(cat $fn | jq -c '.features[] .attributes | { Datum, Fallzahl, Sterbefall, Genesungsfall, Hospitalisierung }');
do
    # numbers, may need fix in case NAN
    # n: number of cases
    n=$(echo $i|cut -d} -f1|cut -d, -f2|cut -d: -f2); test "$n" = "null" && n="0"
    # b: deceased
    b=$(echo $i|cut -d} -f1|cut -d, -f3|cut -d: -f2); test "$b" = "null" && b="0"
    # r: recovered
    r=$(echo $i|cut -d} -f1|cut -d, -f4|cut -d: -f2); test "$r" = "null" && r="0"
    # h: hospitalized
    h=$(echo $i|cut -d} -f1|cut -d, -f5|cut -d: -f2); test "$h" = "null" && h="0"
    # t: timestamp
    t=$(echo $i|cut -d} -f1|cut -d, -f1|cut -d: -f2|cut -d'"' -f2)
    # split timestamp in parts, if leading zero, reduce to single digits
    # d: day
    d=$(echo $t|cut -d"." -f1); test "$(echo $d|cut -c1)" = "0" && d=$(echo $d|cut -c2)
    # m: month
    m=$(echo $t|cut -d"." -f2); test "$(echo $m|cut -c1)" = "0" && m=$(echo $m|cut -c2)
    # y: year
    y=$(echo $t|cut -d"." -f3-); test "$(echo $y|cut -c1)" = "0" && y=$(echo $y|cut -c2)
    # fix year to 4 digits
    test "$y" -lt "100" && y=$(( 2000 + $y ));

    # output for csv
    # including more numbers
    1>>"$tf" printf "%s,%04d-%02d-%02d,%s,%s,%s,%s,%s\n" "$p" "$y" "$m" "$d" "$c" "$n" "$b" "$r" "$h"
    # csv as used by R script, number infected only
    #~ 1>>"$tf" printf "%s,%04d-%02d-%02d,%s,%s\n" "$p" "$y" "$m" "$d" "$c" "$n"

done

}

psteinb commented 4 years ago

@vv01f Dresden just upgraded their data exposure. Would your code be capable of handling that?

vv01f commented 4 years ago

@psteinb as it pulles the unchanged JSON (keywords did not change, the new ones have been added already) it continues to work.

current result:

city,date,tod_hhmm,diagnosed,deceased,recovered,hospitalized
Dresden,2020-03-07,12:00,2,0,0,0
Dresden,2020-03-08,12:00,2,0,0,0
Dresden,2020-03-09,12:00,2,0,0,0
Dresden,2020-03-10,12:00,5,0,0,0
Dresden,2020-03-11,12:00,5,0,0,2
Dresden,2020-03-12,12:00,5,0,0,2
Dresden,2020-03-13,12:00,12,0,0,2
Dresden,2020-03-14,12:00,18,0,0,2
Dresden,2020-03-15,12:00,18,0,0,3
Dresden,2020-03-16,12:00,25,0,0,4
Dresden,2020-03-17,12:00,35,0,0,5
Dresden,2020-03-18,12:00,50,0,0,6
Dresden,2020-03-19,12:00,60,0,0,8
Dresden,2020-03-20,12:00,97,0,0,9
Dresden,2020-03-21,12:00,115,0,0,10
Dresden,2020-03-22,12:00,139,0,0,11
Dresden,2020-03-23,12:00,154,1,0,14
Dresden,2020-03-24,12:00,167,2,0,17
Dresden,2020-03-25,12:00,216,2,0,17
Dresden,2020-03-26,12:00,239,2,0,17
Dresden,2020-03-27,12:00,287,2,0,23
Dresden,2020-03-28,12:00,314,2,0,25
Dresden,2020-03-29,12:00,320,2,0,28
Dresden,2020-03-30,12:00,339,3,0,37
Dresden,2020-03-31,12:00,358,3,0,43

vv01f commented 4 years ago

@psteinb I added it as a gh action to see here… https://github.com/vv01f/covid19-curve-your-city/runs/548655100?check_suite_focus=true file content is shown in the log.

psteinb commented 4 years ago

Super Cool, I'd love to merge this into this repo and pull the data every afternoon. I suggest to store the csv inside

/repo-root/data/de_dresden_www.csv

I am unfamiliar with gh actions. Not sure if you could send a PR and I'll merge that?

vv01f commented 4 years ago

that makes the two of us. I will try to find out how this works…

[x] the schedule and
[x] artifacts I found. just not sure it does the thing you need. basically within those steps of the workflow the file can be used (e.g. its contents shown) by the next step without any artifact. artifacts are not stored within the repo~~, neither is the CI description, so they are not subject to any PR it seams to me~~.

vv01f commented 4 years ago

better idea: commit from within the ci PR sent.

psteinb commented 4 years ago

The data came in perfect today. Thanks for this wonderful PR. Now I don't have to deal with this. Much appreciated. ;)

psteinb commented 4 years ago

Closed by #19

vv01f commented 4 years ago

today no data available in between noon or 20 past. I expect that to change in 5h. right now logging every 5min to determine when they publish. then an additional schedule for saturday (and maybe sunday) can be added.

psteinb commented 4 years ago

it's totally fine as it is. I am super thankful for this wonderful contribution.

psteinb / covid19-curve-your-city

Obtain data automatically from dresden.de #6