roberdam / Xaddress

Xaddress - Give 7 billion people an instant physical address
http://xaddress.org
MIT License

Regression Tests #24

Open jklundell opened 7 years ago

jklundell commented 7 years ago

There ought to be a collection of test cases suitable for regression-testing both encoding and decoding and for validation of new implementations.

roberdam commented 7 years ago

Hi Jonathan, thanks for the suggestion! Do you mean something like:

"7150 MAGICAL PEARL - Maluku, Indonesia" ---> "-6.7184,129.5080"
"SUBJECT WANTED 4755 - Nebraska, United States" ---> "41.4733,-99.5598"
"40.4733,-86.5598" ---> "4755 NEXT SUBJECT - Indiana, United States"
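A list in that format could be loaded into structured test vectors before being fed to an encoder or decoder. The following is only a sketch: the `Vector` type, the `parse_vector` helper, and the "encode"/"decode" direction labels are hypothetical names chosen for illustration, and the regular expression assumes each line is two double-quoted fields joined by an arrow of two or more dashes.

```python
import re
from typing import NamedTuple


class Vector(NamedTuple):
    direction: str  # "encode" (coords -> address) or "decode" (address -> coords)
    source: str
    expected: str


# Two double-quoted fields separated by an arrow such as --> or --->.
LINE_RE = re.compile(r'^\s*"([^"]*)"\s*-{2,}>\s*"([^"]*)"\s*$')
# A "lat,lon" pair of signed decimal numbers.
COORD_RE = re.compile(r'^-?\d+(?:\.\d+)?,-?\d+(?:\.\d+)?$')


def parse_vector(line: str) -> Vector:
    """Split one test line into its direction, input, and expected output."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a test vector: {line!r}")
    source, expected = match.groups()
    # If the left side looks like coordinates, the line tests encoding.
    direction = "encode" if COORD_RE.match(source) else "decode"
    return Vector(direction, source, expected)


samples = [
    '"7150 MAGICAL PEARL - Maluku, Indonesia" ---> "-6.7184,129.5080"',
    '"40.4733,-86.5598" --> "4755 NEXT SUBJECT - Indiana, United States"',
]
vectors = [parse_vector(s) for s in samples]
```

Keeping the direction explicit lets one file drive both the encoding and the decoding checks.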

austenadler commented 1 year ago

@roberdam the format you mentioned would be extremely helpful to me.

I am currently porting Xaddress to Rust with the intention of making it efficient to run offline on mobile devices, terminals, WASM, etc.

What would be most helpful is something like this (pseudocode):

generate_n_samples(n):
    loop n times:
        # Pick either a random state, or a random country with no states.
        # Example (pick state)
        random_state <- random(state_list)
        country <- random_state.country

        # Pick something you know is in the bounds of this state/country
        random_lat <- random(random_state.bounds.south .. random_state.bounds.north)
        random_lon <- random(random_state.bounds.west .. random_state.bounds.east)

        # Now, you don't need to query google in order to encode!
        raw_coords <- "{random_lat},{random_lon}"
        encoded_string, encoded_icon <- encode(random_lat, random_lon, country, random_state)
        decoded_string <- decode(encoded_string)

        # Save the mapping to a csv
        output_file.write <- '"{raw_coords}","{encoded_string}",{encoded_icon},"{decoded_string}"'

You wouldn't need separate records for the reverse direction: one record maps the raw coords to both the encoded and the decoded values. The raw coords might have more decimal places than decoded_string, which is why decoded_string gets its own column.
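For anyone who wants to try the idea outside Ruby, the pseudocode above might look roughly like this in Python. This is a sketch under assumptions, not the project's code: `Region`, `generate_samples`, `fake_encode`, and `fake_decode` are hypothetical names, the bounding box is made up, and the two `fake_*` functions are stand-ins that would have to be replaced by a real Xaddress encoder and decoder.

```python
import csv
import io
import random
from typing import Callable, NamedTuple, Sequence, TextIO, Tuple


class Region(NamedTuple):
    """A state, or a country with no states, plus its bounding box."""
    country: str
    state: str  # empty string when the country has no states
    south: float
    north: float
    west: float
    east: float


def generate_samples(
    n: int,
    regions: Sequence[Region],
    encode: Callable[[float, float, str, str], Tuple[str, str]],
    decode: Callable[[str], str],
    out: TextIO,
    rng: random.Random,
) -> None:
    """Write n records of raw coords, encoded string, icon, decoded string."""
    writer = csv.writer(out)  # quotes fields that contain commas
    writer.writerow(["raw_coords", "encoded_string", "encoded_icon", "decoded_string"])
    for _ in range(n):
        region = rng.choice(regions)
        lat = rng.uniform(region.south, region.north)
        lon = rng.uniform(region.west, region.east)
        encoded_string, encoded_icon = encode(lat, lon, region.country, region.state)
        writer.writerow([f"{lat},{lon}", encoded_string, encoded_icon, decode(encoded_string)])


# Stand-ins so the sketch runs; they are NOT the Xaddress algorithm.
def fake_encode(lat, lon, country, state):
    return f"{lat:.4f},{lon:.4f} - {state}, {country}", "icon"


def fake_decode(encoded):
    return encoded.split(" - ")[0]


regions = [Region("Exampleland", "North", 10.0, 11.0, 20.0, 21.0)]  # made-up bounds
buffer = io.StringIO()
generate_samples(3, regions, fake_encode, fake_decode, buffer, random.Random(0))
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

Passing a seeded `random.Random` keeps the generated file reproducible, which matters if the same samples are meant to be shared between implementations. Note that a point inside a bounding box is not guaranteed to lie inside the actual state border.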

This way I could generate a ton (1GB+) of samples without having to query per sample. It would also be useful for performance-testing and accuracy-testing the port. I would do this myself, but I would have to learn more Ruby.
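An accuracy check over such a file could be as simple as comparing each record's raw coords with its decoded coords. The sketch below assumes the four-column CSV layout from the pseudocode, that decoded_string is a "lat,lon" pair, and a tolerance of 0.0001 degrees; the tolerance, the helper names, and the sample rows are all made up for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_coords(text: str) -> Tuple[float, float]:
    """Turn a "lat,lon" string into a pair of floats."""
    lat, lon = text.split(",")
    return float(lat), float(lon)


def find_drift(csv_text: str, tolerance_deg: float) -> List[Dict[str, str]]:
    """Return the records whose decoded coords differ from the raw coords
    by more than tolerance_deg in latitude or longitude."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw_lat, raw_lon = parse_coords(row["raw_coords"])
        dec_lat, dec_lon = parse_coords(row["decoded_string"])
        if abs(raw_lat - dec_lat) > tolerance_deg or abs(raw_lon - dec_lon) > tolerance_deg:
            failures.append(row)
    return failures


# Made-up records: the first decodes within tolerance, the second does not.
sample_csv = "\n".join([
    "raw_coords,encoded_string,encoded_icon,decoded_string",
    '"10.12344,20.56781","EXAMPLE WORDS 0001 - State, Country",icon-a,"10.1234,20.5678"',
    '"10.5,20.5","EXAMPLE WORDS 0002 - State, Country",icon-b,"10.6,20.5"',
])
failures = find_drift(sample_csv, tolerance_deg=0.0001)
```

The same loop could be wrapped in a timer to get a rough throughput number for a port, although a proper benchmark would need a larger file and repeated runs.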