maps-as-data / MapReader

A computer vision pipeline for exploring and analyzing images at scale
https://mapreader.readthedocs.io/en/latest/

selecting maps from corpus for testing or annotation #99

Closed kmcdono2 closed 1 year ago

kmcdono2 commented 1 year ago

Is your feature request related to a problem? Please describe. Right now, MapReader assumes you want to download map images from a tile server based on knowledge of the coordinates contained within a map.

In more detail: the download tileserver function (not its proper name...) has an option for downloading maps from queries (based on coordinates) or downloading all of them. There are also index numbers, which make it seem like you can select from within an index, but this didn't seem to work (@rwood-97 tested).

However, users often want to make a selection from within a large set of maps rather than downloading a whole tile layer. There are a couple of use cases for this:

Describe the solution you'd like Ability to say "I would like to select these maps with metadata A, B, C, D from tile layer for testing or for annotation_set."

Describe alternatives you've considered Two alternative ideas

Easy workaround: use geojson.io to view maps and select sheets.

Another idea: Use Observable notebook with tile layer + metadata file to interactively select maps based on certain metadata/visual features.

Additional context None at the moment.

rwood-97 commented 1 year ago

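Ways in which you might want to download map sheets (assuming you have relevant metadata and **in correct format***):

  • All map sheets in metadata
  • All map sheets overlapping with a defined polygon (area)
  • All map sheets completely within a defined polygon (area)
  • By sheet no.
  • By WFS index no. (see pic below)
  • Any map sheets that contain a single point (coordinate)

And, if you don't have metadata or are just using tileserver:

  • Download tiles within a defined polygon and merge as one map/image (this would be for non-sheet maps)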

@kmcdono2 Please can you add thoughts

*this can actually be a big issue because even the one-inch and six-inch metadata are formatted differently - we'd want to make sure that what we have in the code is simple/generic enough to deal with the majority of use cases, so things like 'publication date' or 'sheet no' aren't really possible.

kmcdono2 commented 1 year ago

Ways in which you might want to download map sheets (assuming you have relevant metadata and **in correct format***):

  • All map sheets in metadata
  • All map sheets overlapping with a defined polygon (area)
  • All map sheets completely within a defined polygon (area)
  • By sheet no.
  • By WFS index no. (see pic below)
  • Any map sheets that contain a single point (coordinate)
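To make the three spatial options above concrete, here is a rough sketch in plain Python. It is not MapReader code: it assumes each sheet footprint and the query area are simple bounding boxes `(min_x, min_y, max_x, max_y)` rather than arbitrary polygons, and every name in it (`overlaps`, `within`, `contains_point`, `select_sheets`, the sheet names) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a footprint is a bounding box (min_x, min_y, max_x, max_y).
BBox = Tuple[float, float, float, float]


def overlaps(sheet: BBox, area: BBox) -> bool:
    """True if the sheet and the query area share some interior area."""
    return (
        sheet[0] < area[2] and area[0] < sheet[2]
        and sheet[1] < area[3] and area[1] < sheet[3]
    )


def within(sheet: BBox, area: BBox) -> bool:
    """True if the sheet lies completely inside the query area."""
    return (
        area[0] <= sheet[0] and area[1] <= sheet[1]
        and sheet[2] <= area[2] and sheet[3] <= area[3]
    )


def contains_point(sheet: BBox, x: float, y: float) -> bool:
    """True if the point (x, y) falls on the sheet (edges included)."""
    return sheet[0] <= x <= sheet[2] and sheet[1] <= y <= sheet[3]


def select_sheets(sheets: Dict[str, BBox], predicate: Callable[[BBox], bool]) -> List[str]:
    """Return the names of all sheets whose footprint satisfies the predicate."""
    return sorted(name for name, bbox in sheets.items() if predicate(bbox))


# Made-up footprints, for illustration only.
sheets = {
    "sheet_a": (0.0, 0.0, 1.0, 1.0),
    "sheet_b": (1.0, 0.0, 2.0, 1.0),
    "sheet_c": (2.0, 0.0, 3.0, 1.0),
}
area = (0.5, -0.5, 2.0, 1.5)

overlapping = select_sheets(sheets, lambda b: overlaps(b, area))
inside = select_sheets(sheets, lambda b: within(b, area))
at_point = select_sheets(sheets, lambda b: contains_point(b, 2.5, 0.5))
```

With real polygons the same three predicates would need a proper geometry library; the bounding-box version is only meant to show how "overlapping", "completely within" and "contains a point" differ.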

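These all look good. I would add (and these can be discussed):

  • any map sheet whose title [or other metadata field to be defined by user] contains X word(s)
  • any map sheet intersected by a line (in a way this is to complement the point and polygon options above), but thinking particularly about tracing railspace or road development over time, this would be useful.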

(I know what WFS index no you are referring to, but the pic isn't in this ticket fyi ;))

And, if you don't have metadata or are just using tileserver:

  • Download tiles within a defined polygon and merge as one map/image (this would be for non-sheet maps)
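For the no-metadata case, one possible first step is working out which tiles cover the area of interest. The sketch below assumes the tile server uses the common XYZ ("slippy map") Web Mercator scheme, and it reduces the polygon to its bounding box; both are assumptions, and `lonlat_to_tile` and `tiles_for_bbox` are hypothetical helper names, not MapReader functions. Downloading and merging the tiles into one image is left out.

```python
import math
from typing import List, Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Convert a longitude/latitude (degrees) to XYZ tile indices at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so coordinates on the edge of the world still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(
    west: float, south: float, east: float, north: float, zoom: int
) -> List[Tuple[int, int]]:
    """List every (x, y) tile that touches the bounding box, row by row."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left tile
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return [
        (x, y)
        for y in range(y_min, y_max + 1)
        for x in range(x_min, x_max + 1)
    ]


# Example: a box straddling the equator and prime meridian at zoom level 1.
tiles = tiles_for_bbox(-10.0, -10.0, 10.0, 10.0, zoom=1)
```

Tiles that fall inside the bounding box but outside the actual polygon could be filtered out afterwards, before stitching the remainder into a single image.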

@kmcdono2 Please can you add thoughts

*this can actually be a big issue because even the one-inch and six-inch metadata are formatted differently - we'd want to make sure that what we have in the code is simple/generic enough to deal with the majority of use cases, so things like 'publication date' or 'sheet no' aren't really possible.

On the issue of metadata structure: we def need to revisit this. I am in favor of having all metadata selection options be quite generic so we can say the input should be a metadata field that is an integer or a string etc, but that the user can pick which field is relevant for their dataset. This is definitely part of the input guidance that I can improve.
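As a minimal illustration of what "generic" could mean here, the sketch below filters metadata records on whichever field the user names, without hard-coding anything like 'publication date' or 'sheet no'. It assumes the metadata has already been loaded as a list of dictionaries; `select_by_field` and the field names in the example are hypothetical, not part of MapReader.

```python
from typing import Any, Dict, Iterable, List


def select_by_field(
    records: Iterable[Dict[str, Any]], field: str, allowed: Iterable[Any]
) -> List[Dict[str, Any]]:
    """Keep records whose value for `field` is one of the allowed values.

    Records that lack the field are skipped rather than raising an error,
    since differently formatted metadata files may not share all columns.
    """
    allowed_set = set(allowed)
    return [r for r in records if field in r and r[field] in allowed_set]


# Made-up records; the field names are placeholders chosen by the user.
records = [
    {"name": "map_1", "sheet_no": "101", "index": 1},
    {"name": "map_2", "sheet_no": "102", "index": 2},
    {"name": "map_3", "index": 3},  # no sheet_no in this record
]

by_index = [r["name"] for r in select_by_field(records, "index", [1, 3])]
by_sheet = [r["name"] for r in select_by_field(records, "sheet_no", ["102"])]
```

The same function handles integer and string fields, so the input guidance would only need to say which field to use and what type its values have.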

rwood-97 commented 1 year ago

These all look good. I would add (and these can be discussed):

  • any map sheet whose title [or other metadata field to be defined by user] contains X word(s)
  • any map sheet intersected by a line (in a way this is to complement the point and polygon options above), but thinking particularly about tracing railspace or road development over time, this would be useful.
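Both additions can be sketched in a few lines of plain Python. The assumptions: sheet footprints are bounding boxes `(min_x, min_y, max_x, max_y)`, "contains X word(s)" means a case-insensitive substring match on all given words, and the line is a list of `(x, y)` vertices. The segment test is standard Liang-Barsky clipping; all function, field and sheet names are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def field_contains(record: Dict[str, Any], field: str, words: Sequence[str]) -> bool:
    """True if the chosen field contains every word (case-insensitive substring)."""
    text = str(record.get(field, "")).lower()
    return all(word.lower() in text for word in words)


def segment_intersects_bbox(p0: Point, p1: Point, bbox: BBox) -> bool:
    """Liang-Barsky test: does the segment p0-p1 touch the bounding box?"""
    (x0, y0), (x1, y1) = p0, p1
    min_x, min_y, max_x, max_y = bbox
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # portion of the segment still inside the box
    checks = ((-dx, x0 - min_x), (dx, max_x - x0), (-dy, y0 - min_y), (dy, max_y - y0))
    for p, q in checks:
        if p == 0:
            if q < 0:
                return False  # parallel to this edge and outside it
        else:
            t = q / p
            if p < 0:
                t0 = max(t0, t)  # segment enters the box here
            else:
                t1 = min(t1, t)  # segment leaves the box here
            if t0 > t1:
                return False
    return True


def line_intersects_bbox(line: Sequence[Point], bbox: BBox) -> bool:
    """True if any segment of the polyline touches the bounding box."""
    return any(segment_intersects_bbox(a, b, bbox) for a, b in zip(line, line[1:]))


# Made-up sheets and a made-up railway line, for illustration only.
sheets: List[Dict[str, Any]] = [
    {"name": "A", "title": "Railway Junction North", "bbox": (0.0, 0.0, 1.0, 1.0)},
    {"name": "B", "title": "Harbour and Docks", "bbox": (1.0, 0.0, 2.0, 1.0)},
    {"name": "C", "title": "Railway Works", "bbox": (2.0, 2.0, 3.0, 3.0)},
]
railway = [(-0.5, 0.5), (1.5, 0.5), (1.5, 2.5)]

by_title = [s["name"] for s in sheets if field_contains(s, "title", ["railway"])]
on_line = [s["name"] for s in sheets if line_intersects_bbox(railway, s["bbox"])]
```

Here the title filter and the line filter pick different sheets, which is the point of offering both: sheet B has no "railway" in its title but is still crossed by the line.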

Yes, I like both of these. I will add