mozilla / publish.webmaker.org

The teach.org publishing service for goggles and thimble
Mozilla Public License 2.0

Store tarballs instead of individual files on publish #148

Closed humphd closed 9 years ago

humphd commented 9 years ago

When we publish, we store a snapshot of the project's files in the db (vanilla copy, no injection of our stuff, ready for remix), and also place these on S3 (with an additional header for the remix bar). When a user remixes a project, we read all the published files from the db, create a tarball, and stream that to the browser. For a project like the "Keep Calm" poster on the front page, we're doing this over, and over, and over; every time, we're wasting a ton of cycles and memory to create the exact same tarball.

When we originally created the publish server, we didn't use tarballs yet, so storing expanded file trees made sense. Now that we consume tarballs, it no longer does. A better, more efficient solution would be to pre-generate the project tarball at publish time and store only that.
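To make the proposal concrete, here is a minimal sketch in Python of "build the tarball once at publish, hand back the stored bytes on remix". It is an illustration only, not the publish server's code: `build_tarball` and `TarballStore` are hypothetical names, and the in-memory dict stands in for whatever db column would actually hold the archive.

```python
import io
import tarfile
from typing import Dict


def build_tarball(files: Dict[str, bytes]) -> bytes:
    """Pack a mapping of relative path -> file contents into a gzipped tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        # Sort paths so the member order is stable across publishes.
        for path in sorted(files):
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


class TarballStore:
    """Hypothetical stand-in for a db table holding one tarball per project."""

    def __init__(self) -> None:
        self._tarballs: Dict[int, bytes] = {}

    def publish(self, project_id: int, files: Dict[str, bytes]) -> None:
        # The archive is built exactly once, at publish time.
        self._tarballs[project_id] = build_tarball(files)

    def remix(self, project_id: int) -> bytes:
        # Remix returns the stored bytes; nothing is re-archived per request.
        return self._tarballs[project_id]
```

With this shape, a popular project pays the archiving cost once per publish rather than once per remix, and the remix path becomes a plain read.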

One thing I'm not sure about is whether this change would impact other potential users of publish, for example Goggles: I'm not clear whether Goggles needs something you can re-open (i.e., original file in the db), or just something that lives on S3. I'm also unclear on whether Goggles even has plans to use publish. cc @Pomax.

Doing this will also require migrations for existing staging and prod dbs.
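As a rough idea of what such a migration might involve, the sketch below walks the existing per-file rows and writes one tarball per project. Everything about the schema is assumed for illustration: the `published_files` and `published_tarballs` tables, their columns, and the use of SQLite are hypothetical stand-ins, not the real staging or prod layout.

```python
import io
import sqlite3
import tarfile


def migrate_files_to_tarballs(conn: sqlite3.Connection) -> int:
    """Build one tarball per project from its file rows; return projects migrated."""
    # Hypothetical destination table: one gzipped tarball per project.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS published_tarballs ("
        "project_id INTEGER PRIMARY KEY, tarball BLOB NOT NULL)"
    )
    project_ids = [row[0] for row in conn.execute(
        "SELECT DISTINCT project_id FROM published_files ORDER BY project_id"
    )]
    for project_id in project_ids:
        rows = conn.execute(
            "SELECT path, buffer FROM published_files "
            "WHERE project_id = ? ORDER BY path",
            (project_id,),
        ).fetchall()
        out = io.BytesIO()
        with tarfile.open(fileobj=out, mode="w:gz") as archive:
            for path, data in rows:
                info = tarfile.TarInfo(name=path)
                info.size = len(data)
                archive.addfile(info, io.BytesIO(data))
        conn.execute(
            "INSERT OR REPLACE INTO published_tarballs (project_id, tarball) "
            "VALUES (?, ?)",
            (project_id, out.getvalue()),
        )
    conn.commit()
    return len(project_ids)
```

The sketch deliberately leaves the old file rows in place; dropping them would be a separate step once the stored tarballs have been verified.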

Related to #145.

Pomax commented 9 years ago

Goggles will be using publish.wmo, but it will still be a "publish and forget" concept, so there's no storing of "original" data or file trees. Only final, read-only data.