mitodl / ocw-studio

Open Source Courseware authoring tool
BSD 3-Clause "New" or "Revised" License

allow unicode characters in filenames #2087

Closed: gumaerc closed 7 months ago

gumaerc commented 8 months ago

What are the relevant tickets?

Closes https://github.com/mitodl/ocw-studio/issues/2084

Description (What does it do?)

This PR alters calls to django.utils.text.slugify so that they pass allow_unicode=True. The default is False, which strips every non-ASCII character, so a name made up entirely of such characters becomes blank. We hit this in scenarios such as syncing a gdrive file with a Unicode filename or creating WebsiteContent with a Unicode title.
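To illustrate the difference, here is a minimal stdlib-only sketch that approximates what slugify does in each mode. The function name slugify_sketch is hypothetical, and this is an approximation for illustration, not the Django source or the code changed in this PR.

```python
import re
import unicodedata


def slugify_sketch(value: str, allow_unicode: bool = False) -> str:
    """Approximate django.utils.text.slugify (illustrative assumption only)."""
    if allow_unicode:
        # Keep non-ASCII letters, normalising compatibility forms.
        value = unicodedata.normalize("NFKC", value)
    else:
        # Decompose, then drop every character that has no ASCII form.
        value = (
            unicodedata.normalize("NFKD", value)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
    # Remove anything that is not a word character, whitespace, or hyphen.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


title = "テスト ページ"
ascii_slug = slugify_sketch(title)  # empty string: nothing survives
unicode_slug = slugify_sketch(title, allow_unicode=True)  # "テスト-ページ"
```

With the default mode the whole title is discarded, which is how a blank file name can arise; with allow_unicode=True the characters are kept and only the space is replaced by a hyphen.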

Screenshots (if appropriate):

image

How can this be tested?

pdpinch commented 8 months ago

What slug/URL do we get when the title is set to テスト ページ ?

HussainTaj-arbisoft commented 8 months ago

Perhaps we should make only the filenames Unicode compatible and leave all other fields incompatible for now. That said, the distinction will be largely cosmetic, because users can still enter Unicode characters in some fields and we will just strip them.

gumaerc commented 8 months ago

What slug/URL do we get when the title is set to テスト ページ ?

@pdpinch We would get a slug of テスト-ページ, which modern browsers support, although older operating systems and browsers may have issues with it. @HussainTaj-arbisoft has highlighted a number of areas where Unicode characters will be problematic, and I'm going to step back the scope of this PR to take that into account, although I didn't get a chance to address this today.

gumaerc commented 7 months ago

@HussainTaj-arbisoft This is ready for another look. I narrowed the scope so that only files uploaded through Google Drive and files created when creating WebsiteContent objects are affected. With Google Drive files, Django automatically URL-encodes the file.url property, which is why you were seeing the encoded value in the frontend. I simply modified the JavaScript to decode the value before displaying it, which should be fine since that's only for display, not editing. When the URL is injected into markdown, it will still be URL-encoded. This will give us the greatest compatibility, and when the user downloads the file it should have the original UTF-8 filename if their browser supports decoding it.
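The encode/decode round trip described above can be sketched roughly as follows. The PR does the decoding in JavaScript on the frontend; this Python version only illustrates the idea, and the path in stored_path is a hypothetical example, not a real ocw-studio URL.

```python
from urllib.parse import quote, unquote

# Hypothetical stored path for an uploaded file with a Unicode name.
stored_path = "/courses/example/テスト-ページ.pdf"

# Percent-encode the UTF-8 bytes of every non-ASCII character ("/" is kept).
# This is the ASCII-only form that would be injected into markdown.
encoded_url = quote(stored_path)

# Decode only for display, so the user sees the original filename.
display_url = unquote(encoded_url)
```

Keeping the encoded form in stored content and decoding only at display time means the stored value is always plain ASCII, while the user-facing text stays readable.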