svsticky / chroma

Manage photo albums on S3 buckets. Successor to Pxl and Rstr

Migration from `pxl` #3

Closed Riscky closed 10 months ago

Riscky commented 1 year ago

Currently, the association's pictures reside in pxl. To start using chroma, we want to migrate the existing database. Pxl's state resides in a JSON file, with the following schema:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "albums": {
      "type": "array",
      "items": [
        {
          "type": "object",
          "properties": {
            "images": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "remote_uuid": { "type": "string" },
                    "available_sizes": {
                      "type": "array",
                      "items": [ { "type": "string" } ]
                    }
                  },
                  "required": [ "remote_uuid", "available_sizes" ]
                }
              ]
            },
            "name_nav": { "type": "string" },
            "name_display": { "type": "string" },
            "created": { "type": "string" }
          },
          "required": [ "images", "name_nav", "name_display", "created" ]
        }
      ]
    }
  },
  "required": [ "albums" ]
}
```
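Before migrating, it may help to check that a pxl state file actually contains the keys the schema marks as required. The following is a minimal sketch using only the standard library; the function name `check_pxl_state` is hypothetical, and it checks key presence only, not value types.

```python
# Required keys, copied from the "required" lists in the schema above.
ALBUM_KEYS = {"images", "name_nav", "name_display", "created"}
IMAGE_KEYS = {"remote_uuid", "available_sizes"}


def check_pxl_state(state: dict) -> list[str]:
    """Return human-readable problems; an empty list means no key is missing."""
    if "albums" not in state:
        return ["missing top-level key 'albums'"]
    problems = []
    for a_idx, album in enumerate(state["albums"]):
        missing = ALBUM_KEYS - set(album)
        if missing:
            problems.append(f"album {a_idx}: missing {sorted(missing)}")
        # Keep checking images even if other album keys are absent.
        for i_idx, image in enumerate(album.get("images", [])):
            missing = IMAGE_KEYS - set(image)
            if missing:
                problems.append(
                    f"album {a_idx} image {i_idx}: missing {sorted(missing)}"
                )
    return problems
```

Running this over the parsed state (for example the result of `json.load`) before the migration would surface incomplete albums early, instead of halfway through a reupload.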

I think this maps quite cleanly to the chroma database schema, but there are a few caveats:

I think we can write a simple Python script to migrate the pxl state to chroma's schema, and then download, batch rename, and reupload the files to a new bucket. We can test all of that locally before shipping a database backup to our production server.

Why download and reupload? We want to migrate from the 'S3-compatible' DigitalOcean Spaces to actual S3, and I don't think we want to clog the existing bucket with more files. It's also easier to test this way, and we can perform the batch rename with an existing tool instead of writing yet another script.
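A first step of such a script could be sketched as follows: flatten the pxl state into one record per stored file, so that the download, rename, and reupload steps can each work from a plain list. The `StoredFile` record, the `flatten_pxl_state` name, and the idea that there is one stored object per `(remote_uuid, size)` pair are assumptions; neither pxl's bucket layout nor chroma's schema is specified in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    """One object in the old bucket (assumed: one per image size)."""
    album_nav: str      # pxl "name_nav"
    album_display: str  # pxl "name_display"
    album_created: str  # pxl "created", kept as the raw string
    remote_uuid: str    # pxl "remote_uuid"
    size: str           # one entry of pxl "available_sizes"


def flatten_pxl_state(state: dict) -> list[StoredFile]:
    """Turn the nested albums/images/sizes structure into a flat list."""
    files = []
    for album in state["albums"]:
        for image in album["images"]:
            for size in image["available_sizes"]:
                files.append(
                    StoredFile(
                        album_nav=album["name_nav"],
                        album_display=album["name_display"],
                        album_created=album["created"],
                        remote_uuid=image["remote_uuid"],
                        size=size,
                    )
                )
    return files
```

Mapping each record to a chroma row and a new object key would then be a separate, easily testable function once the target schema is settled.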

I think someone from @svsticky/it-crowd should pick this up, as they have access to the production buckets.

SilasPeters commented 10 months ago

I would like to add two things:

  1. The real creation date of the image itself might be stored in the image itself, as all image file types are containers. We can't rely on this too much, since the date is not required to be present in the container metadata. However, it might be an interesting way to fill the creation date fields.
  2. If only J(E)PG files are supported, this does not allow .CER (RAW) images to be uploaded. (This is not that relevant however, as RAW pictures are quite large and unprocessed.) Also, not every camera or phone creates JPGs.

HugoPeters1024 commented 10 months ago

Nice to see you looking around here! Does this mean that the chroma migration is going to happen soon (tm)? Let me respond to your two points:

> 1. The real creation date of the image itself might be stored in the image itself, as all image file types are containers.

pxl purposefully strips all metadata from the images for privacy reasons, so the create time will definitely not be available on the image itself.

> If only J(E)PG files are supported, this does not allow .CER (RAW) images to be uploaded. (This is not that relevant however, as RAW pictures are quite large and unprocessed.) Also, not every camera or phone creates JPGs.

We definitely don't want RAW images on our storage. I have never heard of the JPG requirement being a problem, but I have no problem lifting it if chroma supports that. I think the reasoning was that JPGs usually have a reasonable file size. PNGs can blow up pretty quickly, for example.

SilasPeters commented 10 months ago

Haha well converting JSONs to some other filetype does seem like something I could do, so it certainly has my interest. I will discuss this during the first CommIT meeting, to coordinate our collaboration.

TobiasDeBruijn commented 10 months ago

Been a bit, just looking into it now again.

> pxl purposefully strips all metadata from the images for privacy reasons, so the create time will definitely not be available on the image itself.

Chroma doesn't do this, do we want something like this?

Regarding the file format limitations, Chroma currently supports JPEG and PNG, but converts them to PNG behind the scenes. Currently it does this synchronously, meaning that if you upload a large amount of JPEGs, you might wait a bit longer (TODO).

I can support all formats defined in `ImageFormat` quite easily, I think. Which ones do we want for sure?

HugoPeters1024 commented 10 months ago

> Chroma doesn't do this, do we want something like this?

I think it would be preferable. Images made on people's phones can contain a lot of extra information like location data or hardware specs. Ideally we don't store any information we don't use.

> Regarding the file format limitations, Chroma currently supports JPEG and PNG, but converts them to PNG behind the scenes.

AFAIK PNGs take up a lot more space because they are compressed losslessly. I therefore don't see a reason why you would convert a JPEG to a PNG (you are only adding data, not information). But maybe I'm missing something?

Riscky commented 10 months ago

> Images made on people's phones can contain a lot of extra information like location data or hardware specs. Ideally we don't store any information we don't use.

Indeed, image metadata can easily be classified as 'personally identifiable information' under GDPR. A photo website is already a bit of a GDPR hassle (because photos of people are by definition PII), but let's avoid making it more of a hassle.
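To illustrate the kind of metadata stripping discussed here, the following is a minimal standard-library sketch for JPEG files. It walks the segments that precede the image data and drops those that typically carry metadata. This is not how pxl or chroma implement it (the thread does not say); the function name `strip_jpeg_metadata` and the choice of markers are assumptions, and a real implementation would more likely rely on an image library.

```python
# Markers whose segments carry metadata rather than image data.
METADATA_MARKERS = {
    0xE1,  # APP1: Exif and XMP
    0xED,  # APP13: Photoshop/IPTC
    0xFE,  # COM: free-text comment
}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its metadata segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: everything from here on is image data.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if marker not in METADATA_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

Note that dropping Exif also drops the orientation tag, so photos that rely on it for rotation would need to be re-oriented before stripping.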

TobiasDeBruijn commented 10 months ago

> AFAIK PNGs take up a lot more space because they are compressed losslessly. I therefore don't see a reason why you would convert a JPEG to a PNG (you are only adding data, not information). But maybe I'm missing something?

Good point indeed. Photos taken with a camera don't need the transparency offered by PNG.

I'm considering using neither, and using WebP instead.

SilasPeters commented 10 months ago

As long as the original jpg can still be downloaded without loss of quality

TobiasDeBruijn commented 10 months ago

As I've currently implemented it in #10, there is no such option, though the image can be converted back without loss of quality.

Edit: The server does this for you, if you specify the format query parameter when requesting an individual image.

TobiasDeBruijn commented 10 months ago

> Indeed, image metadata can easily be classified as 'personally identifiable information' under GDPR. A photo website is already a bit of a GDPR hassle (because photos of people are by definition PII), but let's avoid making it more of a hassle.

Also addressed by #10

SilasPeters commented 10 months ago

I assume that WebP comes with quality loss; is that correct? How else would the files be smaller? Personally, I think it's important that someone who really likes a photo can get the original-quality version.

TobiasDeBruijn commented 10 months ago

Nope! WebP supports lossless encoding, and I've currently set it to maintain 100% of the quality.

SilasPeters commented 10 months ago

Great! In #7 I need to fetch the original-quality picture and display it. Will this work seamlessly then?

TobiasDeBruijn commented 10 months ago

Yes! If you don't provide the `format` query parameter, it'll default to converting to PNG, though preferably use `format=WebP` (capitalization matters!). That saves some processing on the server, and WebP is natively supported by browsers too.
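As a sketch of how a client might request the WebP variant, the helper below appends the `format` query parameter to an image URL. The helper name `with_format` and the host and path in the usage example are placeholders; only the `format` parameter and its case-sensitive `WebP` value come from this thread.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_format(image_url: str, fmt: str = "WebP") -> str:
    """Return image_url with a `format` query parameter appended."""
    parts = urlsplit(image_url)
    query = urlencode({"format": fmt})
    if parts.query:
        # Preserve any query parameters that are already present.
        query = parts.query + "&" + query
    return urlunsplit(parts._replace(query=query))


# Hypothetical image URL; the real chroma route may differ.
url = with_format("https://chroma.example/photo/123")
```

Keeping the value in one helper avoids the capitalization pitfall: a caller cannot accidentally send `format=webp`.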