josephfrazier / reported-web

Web front-end for https://twitter.com/Reported_NYC: https://reported-web.herokuapp.com
MIT License
10 stars, 1 fork

Metadata extraction: take previously uploaded files into account for separately uploaded files #403

Closed josephfrazier closed 1 year ago

josephfrazier commented 1 year ago

Fixes https://github.com/josephfrazier/reported-web/issues/402:

#400 fixed the main issue of extracting metadata from simultaneously uploaded files (#252), but there's a related issue, described at https://reportedcab.slack.com/archives/C9VNM3DL4/p1674696287367279?thread_ts=1673064919.656019&cid=C9VNM3DL4:

I've released a change which will try to extract the plate/location/date from all selected files simultaneously, and only warn you about the values that were not present in any of the selected files! Note that if you upload one image, then separately upload a second one, it will still warn you about each missing field in the second image, instead of also taking the first image into account. This is the way it worked before, so I'm not changing that quite yet, but I'm open to it! Thank you to $USER for reminding me about this issue!

$USER responded:

If the first image contains the plate, no need to warn about a lack of image in the second (separately uploaded) photo

I responded:

I agree, but I think it may have been done this way to avoid having to keep track of which values had already been derived on a per-image basis and ensuring that if an image with derived values is removed, it no longer "counts" towards the list of what to look for.

One alternative is to simply re-extract from the first image alongside the second image, once the second image is uploaded (as if they were both uploaded together). That would repeat work that had already been done, but should involve less book-keeping than the approach in the previous paragraph. A way to avoid repeating work here is to cache the derived values on a per-image basis, perhaps using the filename as a cache key. That way, when we re-extract from a given image, we can immediately find the derived values in the cache rather than having to parse the image metadata again and hit the license plate API.

I'll have to think on it a bit further, but things should be significantly better than they were before, at least for the use case of uploading all relevant images for a submission at once. Personally, I tend to do this, since the images are already beside each other on my phone, but I'd like to further accommodate a multi-upload workflow as well, just need to be thoughtful about it.
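
The re-extract-with-a-cache idea from the message above could look roughly like the following. This is only a sketch under assumptions: `createExtractionCache`, `mergeDerivedValues`, `FIELDS`, and the `{ plate, location, date }` result shape are hypothetical names rather than code from reported-web, and the cache key mixes in `size` and `lastModified` on the assumption that a filename alone might collide.

```javascript
// Hypothetical sketch; none of these names come from reported-web.
const FIELDS = ['plate', 'location', 'date'];

// Wraps an extractor so each image is processed at most once.
// `extract` is assumed to take a File-like object and return the derived
// values (or a promise of them), leaving absent fields undefined.
function createExtractionCache(extract) {
  const cache = new Map();
  // Assumption: name + size + lastModified identifies an image well enough.
  const keyOf = file => `${file.name}:${file.size}:${file.lastModified}`;
  return {
    get(file) {
      const key = keyOf(file);
      if (!cache.has(key)) {
        // Storing whatever `extract` returns (even a pending promise)
        // also deduplicates concurrent requests for the same image.
        cache.set(key, extract(file));
      }
      return cache.get(key);
    },
  };
}

// Combines per-image results as if all images had been uploaded together:
// the first image that provides a field wins, and `missing` lists the
// fields that no image provided.
function mergeDerivedValues(results) {
  const merged = {};
  for (const result of results) {
    for (const field of FIELDS) {
      if (merged[field] === undefined && result[field] !== undefined) {
        merged[field] = result[field];
      }
    }
  }
  const missing = FIELDS.filter(field => merged[field] === undefined);
  return { merged, missing };
}
```

With an async extractor, the caller would run something like `mergeDerivedValues(await Promise.all(files.map(file => cache.get(file))))` over the files that are still attached each time the attachment list changes. A removed image then stops counting without any extra book-keeping, and only the warnings in `missing` need to be shown.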

I noticed that the extractPlate method does attempt to cache license plates per-image:

      // TODO does this actually do anything? the returned result isn't used anywhere
      if (this.attachmentPlates.has(attachmentFile)) {
        const result = this.attachmentPlates.get(attachmentFile);
        return result;
      }

and I feel extractLocation and extractDate are pretty fast, so maybe we can take the re-extracting approach, once I get the extractPlate caching properly separated from the mutations it does later:

      if (
        this.state.plate === '' &&
        document.activeElement !== this.plateRef.current
      ) {
        this.setLicensePlate(result);
      }
      this.setState({
        plateSuggestion: result.plate,
      });
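
One way the separation described above might be approached, offered as a sketch rather than the actual change: keep the cached lookup free of state updates, and compute the updates in a separate pure function. Here `lookupPlate`, `plateUpdatesFor`, `fetchPlate`, `currentPlate`, and `plateInputFocused` are hypothetical names; only `attachmentPlates`, `setLicensePlate`, and `plateSuggestion` appear in the snippets above.

```javascript
// Hypothetical sketch; not the actual reported-web implementation.

// Cached lookup only: returns the stored result for a file, or fetches and
// stores it. It never touches component state.
function lookupPlate(attachmentPlates, attachmentFile, fetchPlate) {
  if (attachmentPlates.has(attachmentFile)) {
    return attachmentPlates.get(attachmentFile);
  }
  // `fetchPlate` stands in for the license plate API call (assumption).
  const result = fetchPlate(attachmentFile);
  attachmentPlates.set(attachmentFile, result);
  return result;
}

// Pure decision: which updates a result should cause, mirroring the
// condition in the existing code.
function plateUpdatesFor(result, { currentPlate, plateInputFocused }) {
  return {
    // Only fill the plate field when it is empty and not being edited.
    setLicensePlate: currentPlate === '' && !plateInputFocused,
    plateSuggestion: result.plate,
  };
}
```

The component would then call `lookupPlate` first and apply the returned updates itself, for example calling `this.setLicensePlate(result)` only when `setLicensePlate` is true. Because a cache hit and a fresh fetch go through the same update path, the cached result is no longer returned without being used.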