Closed Riscky closed 10 months ago
I would like to add two things:
Nice to see you looking around here! Does this mean that the chroma migration is going to happen soon (tm)? Let me respond to your two things:
> - The real creation date of the image itself might be stored in the image itself, as all image file types are containers.

pxl purposefully strips all metadata from the images for privacy reasons, so the create time will definitely not be available on the image itself.
> - If only J(E)PG files are supported, this does not allow .CER (RAW) images to be uploaded. (This is not that relevant however, as RAW pictures are quite large and unprocessed.) Also, not every camera or phone creates JPGs.

We definitely don't want RAW images on our storage. I have never heard of the JPG requirement being a problem, but I have no problem lifting it if chroma supports it. I think the reasoning was that JPGs usually have a reasonable file size. PNGs can blow up pretty quickly, for example.
Haha well converting JSONs to some other filetype does seem like something I could do, so it certainly has my interest. I will discuss this during the first CommIT meeting, to coordinate our collaboration.
Been a bit, just looking into it now again.
> pxl purposefully strips all metadata from the images for privacy reasons, so the create time will definitely not be available on the image itself.

Chroma doesn't do this, do we want something like this?
Regarding the file format limitations, Chroma currently supports JPEG and PNG, but converts them to PNG behind the scenes. Currently it does this synchronously, meaning that if you upload a large amount of JPEGs, you might wait a bit longer (TODO).
I can support all formats defined in ImageFormat quite easily I think. Which ones do we want for sure?
> Chroma doesn't do this, do we want something like this?

I think it would be preferable. Images made on people's phones can contain a lot of extra information like location data or hardware specs. Ideally we don't store any information we don't use.
> Regarding the file format limitations, Chroma currently supports JPEG and PNG, but converts them to PNG behind the scenes.

AFAIK pngs take up a lot more space because they use less compression. I therefore don't see a reason why you would convert a jpeg to a png (you are only adding data, not information). But maybe I'm missing something?
> Images made on people's phones can contain a lot of extra information like location data or hardware specs. Ideally we don't store any information we don't use.

Indeed, image metadata can easily be classified as 'personally identifiable information' under GDPR. A photo website is already a bit of a GDPR hassle (because photos of people are by definition PII), but let's avoid making it more of a hassle.
> AFAIK pngs take up a lot more space because they use less compression. I therefore don't see a reason why you would convert a jpeg to a png (you are only adding data, not information). But maybe I'm missing something?

Good point indeed. Photos taken with a camera don't need the transparency offered by PNG.
I'm considering using neither, and using WebP instead.
As long as the original jpg can still be downloaded without loss of quality
As currently implemented in #10, there is no such option, though the image can be converted back without loss of quality.

Edit: The server does this for you if you specify the `format` query parameter when requesting an individual image.
> Indeed, image metadata can easily be classified as 'personally identifiable information' under GDPR. A photo website is already a bit of a GDPR hassle (because photos of people are by definition PII), but let's avoid making it more of a hassle.

Also addressed by #10
I assume that WebP comes with quality loss, is that correct? How else are the image sizes less? It would seem important to me personally that if someone really likes a photo that they can get the original quality version
Nope! WebP can be lossless, and I've currently set it to maintain 100% of the quality.
Great! In #7 I need to fetch the original quality picture and display it. This will work seamlessly then?
Yes! If you don't provide the `format` query parameter, it'll default to converting to PNG. Though preferably use `format=WebP` (capitalization matters!). That saves some processing on the server, and is natively supported by browsers too.
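For reference, a minimal sketch of how a client could build such a request URL. Only the `format` parameter and its `WebP` value come from this thread; the base URL, the path layout and the `photo_url` helper are made-up placeholders, not Chroma's actual API.

```python
from urllib.parse import urlencode

# Hypothetical base URL and path; only the `format` parameter is taken from this thread.
BASE_URL = "https://chroma.example/api/photo"


def photo_url(photo_id, image_format=None):
    """Build the URL for one image, optionally asking for a specific format."""
    url = f"{BASE_URL}/{photo_id}"
    if image_format is not None:
        # The value is case-sensitive according to the thread, so pass it through unchanged.
        url += "?" + urlencode({"format": image_format})
    return url


print(photo_url("abc123", "WebP"))
```

Leaving out `image_format` produces a URL without the parameter, which per the comment above makes the server fall back to PNG.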
Currently, the association's pictures reside in `pxl`. To start using `chroma`, we want to migrate the existing database. Pxl's state resides in a JSON file, with the following schema:
```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "albums": {
      "type": "array",
      "items": [
        {
          "type": "object",
          "properties": {
            "images": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "remote_uuid": { "type": "string" },
                    "available_sizes": {
                      "type": "array",
                      "items": [ { "type": "string" } ]
                    }
                  },
                  "required": [ "remote_uuid", "available_sizes" ]
                }
              ]
            },
            "name_nav": { "type": "string" },
            "name_display": { "type": "string" },
            "created": { "type": "string" }
          },
          "required": [ "images", "name_nav", "name_display", "created" ]
        }
      ]
    }
  },
  "required": [ "albums" ]
}
```

I think this maps quite cleanly to the
`chroma` database schema, but there are a few caveats:

- Images in `pxl` don't have a `created` property. We can use the album `created` property instead. The schema doesn't show it, but the `created` property is an ISO 8601 datetime string, like `2019-02-21T22:32:04`. The datetime string doesn't include a timezone, but since the `created` property is the upload time, not the time the event happened, I don't think it matters much. Let's assume it is UTC.
- The `remote_uuid` can serve as the `id` of the image. However, the filename on S3 doesn't completely reflect this UUID, as it contains dashes, a size indicator, and a file extension. For example, the image with `remote_uuid` `f774b64f6d8e4dcbb5d6f352c2a0b1ec` has the filename `f774b64f-6d8e-4dcb-b5d6-f352c2a0b1ec_o.jpg` in the bucket. (Yeah, I don't know about the dashes either.) It is relatively easy, however, to batch rename the files in the bucket to not contain dashes, size indicators and file extensions, so we should probably do just that.
- `pxl` images have multiple sizes: original (indicated with `_o`) and two thumbnail sizes, `_w1600` and `_w400`. I think we want to have thumbnails in `chroma` too some time in the future (size matters), so it might be useful to keep these around for when we get around to doing thumbnails. Would be a waste to regenerate all of these.
- `pxl` images always have file extension `jpg` (see https://github.com/svsticky/pxl/issues/60). That makes life easier, but got me thinking: should `chroma` support multiple file types in the future?

I think we can write a simple Python script to get the `pxl` state migrated to `chroma`'s schema, and then download, batch rename and reupload (to a new bucket) the files. We can test all of that locally before shipping a database backup to our production server.

Why download and reupload? We want to migrate from the 'S3 compatible' Digital Ocean Spaces to actual S3, and I don't think we want to clog the existing bucket with more files. It's also easier to test this way, and we can perform the batch rename with an existing tool instead of writing yet another script.

I think someone from @svsticky/it-crowd should pick this up, as they have access to the production buckets.
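To make the mapping concrete, here is a rough sketch of what the state-conversion part of such a script could look like. It only covers the original-size files and assumes the `created` string is UTC, as proposed above. The function names and the output field names (`id`, `album`, `created`, `old_key`, `new_key`) are hypothetical, since the `chroma` schema isn't spelled out here.

```python
import uuid
from datetime import datetime, timezone


def original_filename(remote_uuid):
    """Current bucket filename of the original-size image: dashed UUID + '_o.jpg'."""
    return f"{uuid.UUID(hex=remote_uuid)}_o.jpg"


def parse_created(created):
    """Parse pxl's timezone-less ISO 8601 string, assuming it is UTC."""
    return datetime.fromisoformat(created).replace(tzinfo=timezone.utc)


def plan_migration(state):
    """Yield one dict per image; the output field names are hypothetical."""
    for album in state["albums"]:
        # Images have no `created` of their own, so reuse the album's.
        created = parse_created(album["created"])
        for image in album["images"]:
            yield {
                "id": image["remote_uuid"],
                "album": album["name_display"],
                "created": created,
                "old_key": original_filename(image["remote_uuid"]),
                # Renamed key: no dashes, size indicator or extension.
                "new_key": image["remote_uuid"],
            }


# Example input following the pxl schema; the album names and size strings are made up.
example_state = {
    "albums": [
        {
            "name_nav": "example-album",
            "name_display": "Example album",
            "created": "2019-02-21T22:32:04",
            "images": [
                {
                    "remote_uuid": "f774b64f6d8e4dcbb5d6f352c2a0b1ec",
                    "available_sizes": ["o"],
                }
            ],
        }
    ]
}

for record in plan_migration(example_state):
    print(record["old_key"], "->", record["new_key"])
```

The thumbnails (`_w1600`, `_w400`) are left out on purpose: dropping the size indicator would make their keys collide with the original's, so they would need their own naming scheme if we keep them.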