zooniverse / planet-four

Identify and measure features on the surface of Mars
https://www.planetfour.org/
Apache License 2.0

Problem with x_tile, y_tile values in csv file - can't match cutout to place in full frame HiRISE image #115

Closed: mschwamb closed this issue 9 years ago

mschwamb commented 10 years ago

We have an undergraduate summer student working on computing the real positions of features in HiRISE images. I now have her computing the center latitude and longitude of the cutouts we use on P4. She noticed that, for at least one HiRISE image, the x_tile and y_tile values in the CSV file and in the metadata CSV file the science team was sent a while back don't quite make sense.

I was told that:

x_tile and y_tile are the 1-indexed column and row numbers of the cutout, written (x, y):


(1, 1) (2, 1) (3, 1)
______ ______ ______
(1, 2) (2, 2) (3, 2)
______ ______ ______
(1, 3) (2, 3) (3, 3)
______ ______ ______

Also, there is a 100-pixel overlap on the top and left of each cutout.

x and y are the coordinates of a marking local to the cutout, i.e. x pixels from the left edge and y pixels from the top edge of the cutout. image_x and image_y are the coordinates in the original image.
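To make the convention concrete, here is a minimal sketch of how a cutout-local marking could map to full-frame coordinates, assuming the 100-pixel overlap means each cutout starts (width - 100) pixels right of its left neighbour and (height - 100) pixels below its upper neighbour. The helper name `to_image_coords` is hypothetical, and the cutout dimensions are not stated in this thread, so they are parameters; the 840 x 648 values in the usage line are an assumption.

```ruby
# Hypothetical helper: convert a marking's cutout-local position (x, y) into
# full-frame HiRISE pixel coordinates (image_x, image_y).
# Assumption: every cutout is width x height pixels and repeats the last
# `overlap` pixels of its left and upper neighbours, so consecutive cutouts
# start (width - overlap) and (height - overlap) pixels apart.
def to_image_coords(x, y, x_tile, y_tile, width:, height:, overlap: 100)
  raise ArgumentError, "x_tile and y_tile are 1-indexed" if x_tile < 1 || y_tile < 1

  image_x = (x_tile - 1) * (width - overlap) + x
  image_y = (y_tile - 1) * (height - overlap) + y
  [image_x, image_y]
end

# Usage with assumed 840 x 648 pixel cutouts (sizes not stated in this thread):
p to_image_coords(10, 20, 2, 3, width: 840, height: 648) # prints [750, 1116]
```

Because x_tile and y_tile feed straight into the offsets, any cutout with the wrong tile numbers also gets wrong image_x and image_y values.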

My student looked at one cutout and realized it didn't look like the top-left corner. We had an issue with x_tile and y_tile not making sense around launch or shortly afterward, and I believe @parrish fixed it. I'm not sure if this is a residual problem, but several cutouts for a given HiRISE frame have the same x_tile and y_tile, which means I can't figure out where the cutouts actually belong in the original image. I haven't checked image_x and image_y, but can someone look into this? Knowing where the cutouts go is extremely important, and the HiRISE frames are big, so it's difficult to match a cutout to its place by eye without a good first guess.

APFid, HiRISE frame, x_tile, y_tile (these can't all be the top-left image; we have many more examples like this):

"APF00003x0","ESP_011341_0980",1,1
"APF00003yc","ESP_011341_0980",1,1
"APF00003yl","ESP_011341_0980",1,1
"APF00003z3","ESP_011341_0980",1,1
"APF00003zf","ESP_011341_0980",1,1
"APF00003zn","ESP_011341_0980",1,1
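One way to find more cases like these is to group rows by HiRISE frame and tile position and flag any position claimed by more than one cutout. This is a sketch assuming the CSV rows have already been parsed into `[APFid, HiRISE frame, x_tile, y_tile]` arrays; `duplicate_tiles` is a hypothetical helper name, and the sample rows are the ones listed above.

```ruby
# Hypothetical helper: report (frame, x_tile, y_tile) positions claimed by
# more than one cutout. Each row is [apf_id, hirise_frame, x_tile, y_tile].
def duplicate_tiles(rows)
  rows
    .group_by { |_apf_id, frame, x_tile, y_tile| [frame, x_tile, y_tile] }
    .transform_values { |group| group.map(&:first).uniq }
    .select { |_position, apf_ids| apf_ids.size > 1 }
end

rows = [
  ["APF00003x0", "ESP_011341_0980", 1, 1],
  ["APF00003yc", "ESP_011341_0980", 1, 1],
  ["APF00003yl", "ESP_011341_0980", 1, 1],
  ["APF00003z3", "ESP_011341_0980", 1, 1],
  ["APF00003zf", "ESP_011341_0980", 1, 1],
  ["APF00003zn", "ESP_011341_0980", 1, 1]
]

duplicate_tiles(rows).each do |(frame, x_tile, y_tile), apf_ids|
  puts "#{frame} (#{x_tile}, #{y_tile}): #{apf_ids.join(', ')}"
end
```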

mschwamb commented 10 years ago

Here is an example where both the weekly CSV file and a metadata CSV file previously sent by the development team give the same wrong x_tile and y_tile values, and it's unclear how to back out the correct values.

APFid, HiRISE frame, x_tile, y_tile
APF00001q0, ESP_011702_0985, 1, 1
APF00001pr, ESP_011702_0985, 1, 1

These two cutouts can't both be the upper-left corner of ESP_011702_0985: APF00001q0 is the top left of the HiRISE image, while APF00001pr is somewhere in the middle.

Right now this means we can't work out which cutouts overlap cutouts from other HiRISE images of the same area taken at a different time, and we can't reconstruct the full HiRISE image.

We tried looking at the APF IDs to see if we could glean a pattern from the naming scheme that gives the cutout order.

By eye, we could confirm:

APFid, HiRISE frame, x_tile, y_tile
APF00001q0, ESP_011702_0985, 1, 1
APF00001or, ESP_011702_0985, 2, 1
APF00001mq, ESP_011702_0985, 1, 2
APF00001pq, ESP_011702_0985, 2, 2

So where is APF00001pr? Can someone please take a look?

Thanks,

~Meg

mschwamb commented 10 years ago

I've confirmed in the CSV file that the computed image_x and image_y values are incorrect: APF00001q0 and APF00001pr have image_x and image_y values that fall on top of each other. Both claim the (1, 1) position, and that position is used to calculate image_x and image_y, so I have no idea where APF00001pr is.

mschwamb commented 9 years ago

Also, there's an email thread about this issue on the P4 mailing list from March 2013. The fix discussed there appears to have corrected some of the cutouts but not all of them: the cutouts reported to overlap at that time were fixed.

parrish commented 9 years ago

@stuartlynn Did you ever have a chance to look at how the cutout row and column were assigned to the metadata?

chrissnyder commented 9 years ago

For what it's worth, this looks purely like an incorrect setting of the metadata rather than a processing problem.

For example, APF00003x0 has a "filename" metadata key showing that the subject is row 19, column 1.

From looking at the code, the cause is that the regex only matches the first character: rather than matching "19", it matches "1" and uses that. I'll fix this for the next round and also add the filename column to the CSV output so tiles can be matched up for current data.
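The difference between a one-character capture and a one-or-more-digit capture can be shown in a few lines. This is a sketch only: the filename is hypothetical (the thread only implies filenames contain `_r<row>_c<col>`, per the `/_r(\d+)_c(\d+)/i` pattern in the later fix), and the single-digit pattern is an assumed reconstruction of the bug, not the original code.

```ruby
# Hypothetical filename: the thread only implies that cutout filenames contain
# "_r<row>_c<col>" (the pattern used in the later fix).
filename = "example_r19_c1.jpg"

# Assumed shape of the buggy pattern: (\d) captures exactly one digit,
# so row 19 is read as "1".
buggy_row = filename[/_r(\d)/i, 1]

# (\d+) captures one or more digits, so the whole row number is kept.
fixed_row, fixed_col = filename.match(/_r(\d+)_c(\d+)/i).captures.map(&:to_i)

puts "buggy row: #{buggy_row}"                    # buggy row: 1
puts "fixed row, col: #{fixed_row}, #{fixed_col}" # fixed row, col: 19, 1
```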

parrish commented 9 years ago

Ah, as it turns out, I had fixed this around March of last year. Perhaps data was added after that?

Here's the code to correct it:

collection = PlanetFourSubject.collection
total = collection.count
index = 0

# Rewrite metadata.x_tile / metadata.y_tile from the row and column encoded
# in each subject's filename (x_tile is the column, y_tile is the row).
collection.find({ }, { fields: [:_id, :metadata, :tutorial], timeout: false }) do |cursor|
  while cursor.has_next?
    index += 1
    puts "#{ index } / #{ total }"
    doc = cursor.next_document
    next if doc['tutorial'] # skip tutorial subjects

    metadata = doc['metadata'] || {}
    # (\d+) matches every digit, so row 19 is read as 19 rather than 1.
    match = metadata['filename'].to_s.match(/_r(\d+)_c(\d+)/i)
    unless match
      puts "skipping #{ doc['_id'] }: no row/column in filename"
      next
    end

    row, col = match.captures.map(&:to_i)
    x_tile = metadata['x_tile']
    y_tile = metadata['y_tile']

    unless x_tile == col && y_tile == row
      puts "#{ [x_tile, y_tile].inspect } => #{ [col, row].inspect }"
      collection.update({ _id: doc['_id'] }, { :$set => { 'metadata.x_tile' => col, 'metadata.y_tile' => row } })
    end
  end
end
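Before touching production data, the same comparison can be dry-run without a database. This is a minimal sketch assuming subject documents look like plain hashes with `_id`, `tutorial`, and `metadata` keys, as the script reads them; `tile_corrections` is a hypothetical name, the `_id` values are placeholders, and the filenames are hypothetical.

```ruby
# Hypothetical dry run of the correction: plain hashes stand in for
# MongoDB documents, and nothing is written anywhere.
def tile_corrections(docs)
  docs.map do |doc|
    next if doc['tutorial']

    metadata = doc['metadata'] || {}
    match = metadata['filename'].to_s.match(/_r(\d+)_c(\d+)/i)
    next unless match

    row, col = match.captures.map(&:to_i)
    current = [metadata['x_tile'], metadata['y_tile']]
    next if current == [col, row] # x_tile is the column, y_tile is the row

    { id: doc['_id'], from: current, to: [col, row] }
  end.compact
end

# Placeholder documents; the filenames are hypothetical.
docs = [
  { '_id' => 'APF00003x0', 'metadata' => { 'filename' => 'example_r19_c1.jpg', 'x_tile' => 1, 'y_tile' => 1 } },
  { '_id' => 'already_ok', 'metadata' => { 'filename' => 'example_r2_c3.jpg', 'x_tile' => 3, 'y_tile' => 2 } },
  { '_id' => 'tutorial', 'tutorial' => true, 'metadata' => {} }
]

p tile_corrections(docs)
```

Only the first placeholder document needs a change here: its filename says row 19, column 1, but its metadata claims (1, 1).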

I'll fix the production data and clear the processed classifications.

chrissnyder commented 9 years ago

That's definitely a better solution than simply passing the buck, which is what my Ouroboros PR did.