mozilla-services / crashstats-tools

Command line tools and library for interacting with Crash Stats (https://crash-stats.mozilla.org/)
Mozilla Public License 2.0

fix fetch-data layout so it matches socorro's fetch_crash_data command #66

willkg closed this issue 1 year ago

willkg commented 1 year ago

crashstats-tools has a set of commands that have counterparts in the Socorro project. It'd be great to (at some point) not have two sets of commands. Towards that, I've been reducing the differences between them. Eventually, I'll drop one in favor of the other and only have to maintain one.

One of the differences is how fetch-data saves crash data in a directory tree. It saves data in a tree like this:

savedir/
   - raw_crash/
      - CRASH_ID
   - processed_crash/
      - CRASH_ID
   - dump_names/
      - CRASH_ID
   - dump/
      - CRASH_ID

The Socorro fetch_crash_data command, on the other hand, saves it in a tree that matches the AWS S3 directory structure:

savedir/
   - raw_crash/
      - DATE
         - CRASH_ID
   - processed_crash/
      - CRASH_ID
   - dump_names/
      - CRASH_ID
   - dump/
      - CRASH_ID
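To make the difference concrete, here is a minimal sketch of how a path could be computed under each layout. Only raw_crash differs. It assumes that DATE is a YYYYMMDD string and that it can be derived from the crash ID, whose last six characters encode the submission date as YYMMDD. The helper names (`flat_path`, `dated_path`, `date_from_crash_id`) are hypothetical and not part of crashstats-tools or Socorro.

```python
import re
from pathlib import Path

# Assumed crash ID shape: a UUID whose last six characters are digits
# encoding the submission date as YYMMDD.
CRASH_ID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{6}[0-9]{6}$"
)


def date_from_crash_id(crash_id: str) -> str:
    """Return the assumed YYYYMMDD DATE directory name for a crash ID."""
    if not CRASH_ID_RE.match(crash_id):
        raise ValueError(f"not a crash ID: {crash_id!r}")
    return "20" + crash_id[-6:]


def flat_path(savedir: str, kind: str, crash_id: str) -> Path:
    """fetch-data style: every kind of data lives at savedir/KIND/CRASH_ID."""
    return Path(savedir) / kind / crash_id


def dated_path(savedir: str, kind: str, crash_id: str) -> Path:
    """fetch_crash_data style: raw crashes get an extra DATE directory."""
    if kind == "raw_crash":
        return Path(savedir) / kind / date_from_crash_id(crash_id) / crash_id
    return Path(savedir) / kind / crash_id
```

For example, `dated_path("savedir", "raw_crash", "de1bb258-cbbf-4589-a673-34f800160918")` would give `savedir/raw_crash/20160918/de1bb258-cbbf-4589-a673-34f800160918`, while `flat_path` with the same arguments drops the `20160918` directory.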

One nice thing about fetch_crash_data is that its output is easier to use in Socorro development. However, most people are not doing Socorro development, so the flatter directory structure is easier for them to work with.

One option is to make the two commands do the same thing. That feels weird because it exposes an implementation detail of the Socorro processor's AWS S3 crash storage. That layout doesn't change often, but we might use a different layout in GCS (Google Cloud Storage) on GCP. What do we do then?

Another option is to add a flag that specifies which layout to use. That feels weird for the same reasons.

A third option is to add a separate command that converts between the layouts. That feels less weird, but it adds an extra step for Socorro developers.

I'll probably go with the third option and write a command that converts between the layouts.
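A converter along the lines of that third option could be as small as the sketch below. It only moves files under raw_crash, since the other directories are identical in both layouts. It assumes DATE is YYYYMMDD taken from the last six characters of the crash ID (YYMMDD); `flat_to_dated` and `dated_to_flat` are hypothetical names, not existing crashstats-tools or Socorro commands.

```python
import shutil
from pathlib import Path


def crash_date(crash_id: str) -> str:
    # Assumption: the last six characters of a crash ID are its YYMMDD date.
    return "20" + crash_id[-6:]


def flat_to_dated(savedir: str) -> None:
    """Move savedir/raw_crash/CRASH_ID to savedir/raw_crash/DATE/CRASH_ID."""
    raw_dir = Path(savedir) / "raw_crash"
    # Snapshot the listing so newly created DATE directories aren't revisited.
    for path in list(raw_dir.iterdir()):
        # Skip directories and anything that doesn't end in a YYMMDD date.
        if not path.is_file() or not path.name[-6:].isdigit():
            continue
        dest_dir = raw_dir / crash_date(path.name)
        dest_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(dest_dir / path.name))


def dated_to_flat(savedir: str) -> None:
    """Move savedir/raw_crash/DATE/CRASH_ID back to savedir/raw_crash/CRASH_ID."""
    raw_dir = Path(savedir) / "raw_crash"
    for date_dir in list(raw_dir.iterdir()):
        if not date_dir.is_dir():
            continue
        for path in list(date_dir.iterdir()):
            shutil.move(str(path), str(raw_dir / path.name))
        # The DATE directory is empty now, so it can be removed.
        date_dir.rmdir()
```

Because the two functions are inverses, a Socorro developer could run `flat_to_dated` on a directory produced by fetch-data, and anyone who prefers the flat tree could undo it with `dated_to_flat`.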

(Weirdly, "Socorro developers", "Socorro maintainer", and "crashstats-tools maintainer" are all me, so I'm really just trying to figure out how to do less work here.)

willkg commented 1 year ago

I thought about this for a while, and I'm going to change it to what I need for Socorro development, since that's the easier path for me to maintain. If someone has a real need for the flatter directory layout for raw crash data, I'm game to talk with them and figure out what to do then.