Open jklundell opened 7 years ago
Hi Jonathan, thanks for the suggestion! Do you mean something like this?
"7150 MAGICAL PEARL - Maluku, Indonesia" ---> "-6.7184,129.5080"
"SUBJECT WANTED 4755 - Nebraska, United States" ---> "41.4733,-99.5598"
"40.4733,-86.5598" ---> "4755 NEXT SUBJECT - Indiana, United States"
@roberdam the format you mentioned would be extremely helpful to me.
I am currently porting xaddress to Rust, with the intention of making it efficient to run offline on mobile devices, terminals, WASM, etc.
What would be most helpful is something like this (pseudocode):
    generate_n_samples(n):
        loop n times:
            # Pick either a random state, or a random country with no states.
            # Example (pick state)
            random_state <- random(state_list)
            country <- random_state.country
            state_bounds <- random_state.bounds
            # Pick a point you know is inside the bounds of this state/country
            random_lat <- random(state_bounds.south .. state_bounds.north)
            random_lon <- random(state_bounds.west .. state_bounds.east)
            # Now you don't need to query Google in order to encode!
            raw_coords <- "{random_lat},{random_lon}"
            encoded_string, encoded_icon <- encode(random_lat, random_lon, country, random_state)
            decoded_string <- decode(encoded_string)
            # Save the mapping to a CSV (fields containing commas are quoted)
            output_file.write('"{raw_coords}","{encoded_string}",{encoded_icon},"{decoded_string}"')
You wouldn't have to generate samples in the reverse direction. One record maps the raw coords to both the encoded and the decoded values. The raw coords might have more decimal places than decoded_string, which is why decoded_string gets its own column.
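For illustration, here is a rough Python sketch of the record format described above. It is not xaddress code: `encode_stub` and `decode_stub` are hypothetical placeholders for the real encoder and decoder, and the bounding boxes in `REGIONS` are approximate values chosen only for the example. The `csv` module takes care of quoting the fields that contain commas.

```python
import csv
import io
import random

# Approximate bounding boxes, for illustration only (assumed values).
REGIONS = [
    {"state": "Nebraska", "country": "United States",
     "south": 40.0, "north": 43.0, "west": -104.05, "east": -95.31},
    {"state": "Indiana", "country": "United States",
     "south": 37.77, "north": 41.76, "west": -88.10, "east": -84.78},
]


def encode_stub(lat, lon, country, state):
    """Hypothetical stand-in for the real encoder; NOT the xaddress algorithm."""
    return f"{lat:.4f} {lon:.4f} - {state}, {country}", "icon-placeholder"


def decode_stub(encoded):
    """Hypothetical stand-in for the real decoder; returns 'lat,lon'."""
    lat, lon = encoded.split(" - ")[0].split(" ")
    return f"{lat},{lon}"


def generate_samples(n, out, rng):
    """Write a header plus n sample records to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["raw_coords", "encoded_string", "encoded_icon", "decoded_string"])
    for _ in range(n):
        region = rng.choice(REGIONS)
        # Pick a point inside the region's bounding box.
        lat = rng.uniform(region["south"], region["north"])
        lon = rng.uniform(region["west"], region["east"])
        raw_coords = f"{lat},{lon}"
        encoded, icon = encode_stub(lat, lon, region["country"], region["state"])
        writer.writerow([raw_coords, encoded, icon, decode_stub(encoded)])


if __name__ == "__main__":
    buffer = io.StringIO()
    generate_samples(3, buffer, random.Random(42))
    print(buffer.getvalue())
```

Passing a seeded `random.Random` keeps the generated file reproducible, which matters if the samples are meant to be shared between implementations.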
This way I could generate a ton of samples (1 GB+) without having to query Google for each one. The same data would also be useful for performance-testing and accuracy-testing the port. I would do this myself, but I would have to learn more Ruby first.
There ought to be a collection of test cases suitable for regression-testing both encoding and decoding and for validation of new implementations.
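As a sketch of how such a collection could be used, the Python snippet below replays a sample file against an implementation under test and reports every row that disagrees. The column names (`raw_coords`, `encoded_string`, `encoded_icon`, `decoded_string`) and the `encode(lat, lon)` / `decode(text)` callables are assumptions made for this example, not an existing xaddress interface.

```python
import csv
import io


def check_samples(csv_text, encode, decode):
    """Return (line_number, message) for every record that does not match.

    `encode(lat, lon)` must return (encoded_string, encoded_icon) and
    `decode(encoded_string)` must return a "lat,lon" string. Both are
    supplied by the implementation being validated (assumed signatures).
    """
    failures = []
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    for row in reader:
        line_no = reader.line_num
        lat_text, lon_text = row["raw_coords"].split(",")
        encoded, icon = encode(float(lat_text), float(lon_text))
        if encoded != row["encoded_string"]:
            failures.append((line_no, f"encode mismatch: got {encoded!r}"))
        if icon != row["encoded_icon"]:
            failures.append((line_no, f"icon mismatch: got {icon!r}"))
        decoded = decode(row["encoded_string"])
        if decoded != row["decoded_string"]:
            failures.append((line_no, f"decode mismatch: got {decoded!r}"))
    return failures
```

Collecting all failures instead of stopping at the first one makes it easier to spot systematic errors, for example a rounding difference that affects every row.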