publiclab / spectral-workbench

Web-based tools for collecting, analyzing, and sharing data from a DIY spectrometer
http://spectralworkbench.org
GNU General Public License v3.0

accept or modify non-latin characters #16

Open jywarren opened 9 years ago

jywarren commented 9 years ago

[Screenshot from 2014-12-19 13:53:28]

These upload, but they don't save properly in the db.

btbonval commented 9 years ago

Are you using MySQL for SW?

Reminds me of this problem: https://github.com/jywarren/plots2/issues/102

jywarren commented 9 years ago

Yeah, and hagit mentioned she also wasn't able to make a wiki page title that includes Hebrew characters. In this case, I think we might be better served by just re-encoding the characters into something else, or even just renaming the files to original.png, thumb.png, etc. But we could also plumb this in the db. Do you have a preference? I'm actually overdue to restructure the directory and naming conventions, since we ended up running past the limit of subdirectories on the new server.

https://github.com/publiclab/spectral-workbench/blob/master/app/models/spectrum.rb#L12

Paperclip has much better defaults; this naming convention came from an older attachment handling system and sorely needs migration to the Paperclip standard.

btbonval commented 9 years ago

For hagit's problem, it's likely a matter of table encoding like the ticket I referenced.

For the file system problem, I just realized that the file system itself may be having trouble with the non-Latin (i18n) filenames. I imagine Linux distros support such things out of the box, but I have never actually tested it.

Filenames are a human convention. It's fine to store the filename and other metadata in the database. You could save the files to disk using a checksum or a unique identifier generated in the database (e.g. primary key) instead of the human-generated filename. There's no reason to code anything important into the file's name or the file's location on the file system. The database can store all of that info :)
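A minimal sketch of that idea, assuming plain Ruby outside of Paperclip: the file is written to disk under its SHA-256 checksum, and the human-chosen filename is returned as metadata to be saved in the database. The helper name `store_upload`, the returned hash keys, and the two-character shard directory are all hypothetical and not part of Spectral Workbench.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper (not from the Spectral Workbench codebase).
# Writes the upload under a checksum-derived, ASCII-only name and returns
# the metadata that would be stored in the database instead.
def store_upload(data, original_filename, storage_dir)
  checksum = Digest::SHA256.hexdigest(data)

  # Keep only a plain ASCII extension; drop anything unusual.
  ext = File.extname(original_filename).downcase
  ext = "" unless ext.match?(/\A\.[a-z0-9]+\z/)

  # Shard by the first two hex digits so one directory never holds
  # more than 256 subdirectories (an assumption, chosen for illustration).
  shard = checksum[0, 2]
  disk_name = "#{checksum}#{ext}"
  target_dir = File.join(storage_dir, shard)

  FileUtils.mkdir_p(target_dir)
  File.binwrite(File.join(target_dir, disk_name), data)

  { original_filename: original_filename, shard: shard, disk_name: disk_name }
end

# Usage: the non-Latin name survives as metadata, never as a path.
Dir.mktmpdir do |dir|
  record = store_upload("fake image bytes", "спектр.PNG", dir)
  puts record[:disk_name].ascii_only?
end
```

With this layout the name on disk is ASCII regardless of what the user typed, so only the database column holding `original_filename` needs to handle non-Latin text.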

jywarren commented 8 years ago

Hmm, I tried a few things, and I'm still getting "Title can contain only letters, numbers, and spaces." even though config.encoding = 'utf-8' is set in application.rb.

Not sure what's causing the validation trip, but I guess perhaps it's the regex from the validation itself:

/\A[\w\ -\'\"]+\z/

This seems to mean we would need to add the entire Cyrillic alphabet to the pattern, but that's nuts and doesn't scale to other character sets. Maybe we need a less rigorous validation?
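For illustration, here is a small sketch of why the pattern rejects these titles and one possible looser alternative. In Ruby, `\w` matches only ASCII word characters (`[a-zA-Z0-9_]`) even on UTF-8 strings, while Unicode property classes such as `\p{L}` (letters), `\p{M}` (combining marks), and `\p{N}` (numbers) cover all scripts. Also note that `\ -\'` inside the character class is parsed as a range from space to apostrophe, not as three literal characters. The constant names and the replacement pattern below are assumptions, not the project's actual fix.

```ruby
# The pattern quoted above: \w is ASCII-only in Ruby.
ORIGINAL_TITLE_FORMAT = /\A[\w\ -\'\"]+\z/

# Hypothetical alternative: any Unicode letter, mark, or number,
# plus space, underscore, hyphen, and quotes.
UNICODE_TITLE_FORMAT = /\A[\p{L}\p{M}\p{N} _\-'"]+\z/

titles = ["Spectrum 1", "спектр 1", "שלום"]

titles.each do |title|
  old_ok = ORIGINAL_TITLE_FORMAT.match?(title)
  new_ok = UNICODE_TITLE_FORMAT.match?(title)
  puts "#{title}: original=#{old_ok} unicode=#{new_ok}"
end
```

In a Rails model, a pattern like this could be passed to the format validator, e.g. `validates :title, format: { with: UNICODE_TITLE_FORMAT }`, which keeps the check against markup characters without enumerating alphabets.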