Closed by mpogue2 1 year ago.
The apostrophe is a special character. Do you have escape logic for it in the search, such as adding a backslash before it?
It seems as if everybody uses their own kind of apostrophe. Maybe because I have another language set up here, some of them (from the lyrics pool) look kind of strange. In the following three file names, a character other than the standard apostrophe is used:
whereas others use the standard apostrophe:
Do we need to use libiconv to do an ASCII flattening of everything?
That feels kinda aggressive; on the other hand, punting it to users to do the right thing for their locale also feels weird.
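Short of full ASCII flattening, a narrower option is to fold only the known apostrophe look-alikes to the plain ASCII apostrophe, applied to both the search string and the filename before matching. A minimal Python sketch; the helper name and the list of look-alike characters are hypothetical choices, not anything taken from SquareDesk:

```python
# Hypothetical list of apostrophe look-alikes to fold to ASCII U+0027.
APOSTROPHE_LOOKALIKES = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC": "'",  # MODIFIER LETTER APOSTROPHE
    "\u00B4": "'",  # ACUTE ACCENT
    "`": "'",       # GRAVE ACCENT (backquote)
}

# Build the translation table once; str.translate applies it per character.
_FOLD_TABLE = str.maketrans(APOSTROPHE_LOOKALIKES)


def fold_apostrophes(text: str) -> str:
    """Replace apostrophe look-alikes with a plain ASCII apostrophe."""
    return text.translate(_FOLD_TABLE)


print(fold_apostrophes("It\u2019s So Easy"))
```

Because it touches only a handful of code points, this sidesteps most of the locale questions that a general transliteration would raise, at the cost of missing any look-alike not in the list.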
Can the apostrophe be replaced by a one-character wildcard in the search string?
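One way to get a one-character wildcard, sketched in Python under the assumption that the query is turned into a regular expression: every apostrophe-like character in the query becomes `.` and everything else is escaped so it matches literally. `query_to_regex` and the character list are hypothetical:

```python
import re

# Hypothetical set of characters treated as "some kind of apostrophe".
APOSTROPHE_CHARS = "'\u2018\u2019\u02BC\u00B4`"


def query_to_regex(query: str) -> re.Pattern:
    """Build a case-insensitive pattern in which each apostrophe-like
    character of the query matches any single character."""
    parts = []
    for ch in query:
        if ch in APOSTROPHE_CHARS:
            parts.append(".")            # one-character wildcard
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)


pattern = query_to_regex("It's So Easy")
print(bool(pattern.search("FT 158 - It\u2019s So Easy.mp3")))
```

This only helps when the user actually types an apostrophe; a query such as "Its" still will not match a filename containing "It's".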
I think what I did at some point was break the search string into words, and search for each word appearing somewhere after the previous word, with special handling for tags. That means a solution might be to do a word break on any non-alphanumeric character, and also to break between letters and numbers.
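The ordered word search described above can be sketched roughly like this. It is a minimal Python illustration, not the actual SquareDesk code: the function names are hypothetical, matching is assumed to be case-insensitive substring search, tokens are ASCII-only, and tag handling is left out.

```python
import re


def split_query(query: str) -> list:
    """Break on non-alphanumeric characters and between letters and digits.

    Tokens are maximal runs of ASCII letters or of ASCII digits.
    """
    return re.findall(r"[A-Za-z]+|[0-9]+", query)


def ordered_word_match(query: str, candidate: str) -> bool:
    """True if every query word occurs in candidate (case-insensitively),
    each one starting after the end of the previous word's match."""
    haystack = candidate.lower()
    pos = 0
    for word in split_query(query.lower()):
        idx = haystack.find(word, pos)
        if idx < 0:
            return False
        pos = idx + len(word)
    return True


print(split_query("ESP451-'Cuda"))
```

With this tokenization "It's" becomes the two words "it" and "s", so the apostrophe itself is never compared, and the lone "s" can match almost anywhere after "it".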
I think the algorithm that you describe, @danlyke, is probably the root cause of the problem that I experienced.
My guess is that what @Gero5 is describing is something different. It could be the use of 7-bit ASCII encodings, or it could be typos. Whether a wildcard would work or not would need to be investigated.
To that end: could you copy and paste those strings into this ticket? That might or might not work (I think it works if the pasted text looks the same as the screenshots you posted above). If not, we might need to make a special version of SquareDesk for you to test that does the equivalent of "od -hc" on those filenames. Or, if you have access to a command line, running something like
ls -1 *So Easy* | od -hc
would be useful for figuring this out (and same for the other couple of examples you provided screenshots for). Thanks!
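If a command line isn't handy, a small script can report similar information in Unicode terms rather than raw bytes. A Python sketch (the helper name is hypothetical, and the "So Easy" filter is just the example from above) that lists each non-ASCII character in matching filenames with its position, code point, and Unicode name:

```python
import os
import unicodedata


def describe_non_ascii(name: str) -> list:
    """Return (index, character, code point, Unicode name) for every
    non-ASCII character in name."""
    report = []
    for i, ch in enumerate(name):
        if ord(ch) > 0x7F:
            # Undecodable bytes surface as surrogates, which have no name.
            report.append((i, ch, f"U+{ord(ch):04X}",
                           unicodedata.name(ch, "<unnamed>")))
    return report


if __name__ == "__main__":
    # Scan the current directory for the example filenames.
    for entry in sorted(os.listdir(".")):
        if "So Easy" in entry:
            print(entry, describe_non_ascii(entry))
```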
Oh, one more thing, @danlyke: encoding conversion would only be possible if the source encoding were known for certain. In my experience, that's essentially impossible to know when the text is in a filename, or in a 7-bit ASCII plain text file, etc.
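To illustrate why the source encoding matters: the same byte decodes to different characters depending on which encoding is assumed, and nothing in the bytes says which assumption is right. A small Python example using standard codecs (the byte string is made up for illustration):

```python
# The byte 0x92 is a right single quotation mark in Windows-1252,
# but an invisible C1 control character in ISO-8859-1 (Latin-1).
raw = b"It\x92s So Easy"

as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("latin-1")

print(repr(as_cp1252))  # 'It’s So Easy'
print(repr(as_latin1))  # 'It\x92s So Easy'

# Strict UTF-8 decoding rejects the byte outright.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("utf-8:", err.reason)
```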
It's also a PITA for us to support all possible encodings, when the far simpler solution is "look through your filenames, and rename any file whose name doesn't look right in SquareDesk". The "Show in Finder" context menu item is a good way to get to the weird filename quickly.
@danlyke I am wondering whether we could just add "'" (single quote) to the regexes in splitIntoWords(), allowing the single quote to be treated as an alphabetic character for splitting purposes.
This would not pick up all possible weird punctuation in Unicode, but I think it's an easy fix and it might solve 99% of the problem (and the other 1% of the cases could be fixed manually by the user by renaming to use single quote).
I've got a lot of these:
ESP 427 - Get 'Er Done.mp3
ESP 451 - 'Cuda _with sound effects.mp3
ESP 451 - 'Cuda.mp3
ESP 483 - Fiddlin' Around.mp3
POP - Benton's Hallucinations (Great Bear Trio).mp3
POP - Bump 'N' Hustle (Down To The Bone).mp3
POP - Cinnamon Girl (feat. Boban ' Marko Markovic Orkestra) [Instrumental] copy 4.m4a
POP - Dawg's Breath.mp3
POP - Surfdoggin'.mp3
POP - The Wasp's Goggles (Vannorstrand).mp3
POP - You're the Boss (Brian Setzer).mp3
RIV 1108 - Beethoven's 9th.mp3
RIV 1186 - Johnny O'Leary.mp3
RIV 284 - Bob's Cripple Creek (patter).mp3
RIV 313 - Bob's Cabbage.mp3
RIV 404 - Don't Stop (Patter).mp3
RIV 581 - Charlie's Patter.mp3
RIV 999 - T'smidje Mix 1.mp3
RR 1309 - Smokin'.mp3
RR 1339 - Mac's Hoedown.mp3
SS 1033 - Better When I'm Dancin'-Melody No BGV.mp3
ARROW 2131 - Don't Worry Baby (BGV).mp3
ARROW 2131 - Don't Worry Baby.mp3
BS 2621 - I'm Beginning To See The Light.mp3
BS 2621H - I'm Beginning To See The Light.mp3
EGO 110 - I've Got You Under My Skin (Instrumental Sample Higher Key).mp3
EGO 110 - I've Got You Under My Skin (Instrumental Sample).mp3
EGO 110 - I've Got You Under My Skin (no Leads Higher Key).mp3
EGO 110 - I've Got You Under My Skin (no Leads).mp3
EGO 110 - I've Got You Under My Skin (with Leads Higher Key).mp3
EGO 110 - I've Got You Under My Skin (with Leads).mp3
EGO 301 - Down In The Valley (BGV's Only).mp3
EGO 301 - Down In The Valley (Leads and BGV's).mp3
EGO 301 - Down In The Valley (No Leads or BGV's).mp3
ESP 1014 - I'll Be Home For Christmas.mp3
FEBS 208 - You Ain't Never Had A Friend Like Me.mp3
FT 158 - It's So Easy (Called by Shauna Kaaria).mp3
FT 158 - It's So Easy (Inst Melody WBGV).mp3
FT 158 - It's So Easy (Inst NOMelody WBGV).mp3
FT 158 - It's So Easy (Inst With Melody).mp3
FT 158 - It's So Easy (Inst With NO Melody).mp3
FT 158 - It's So Easy-Inst-Melody-WBGV.2.html
FT 207 - The Party's Over (Turn Out the Lights W Leads).mp3
HH 5131 - The Party's Over.mp3
MR 70 - Fisherman's Luck (instrumental).mp3
NCR 006 - That's Amore (Leads,Fills,BGV).mp3
NCR 006 - That's Amore (Leads,Fills,no BGV).mp3
NCR 006 - That's Amore (no Leads,BGV).mp3
NCR 006 - That's Amore (no Leads,no BGV).mp3
RI 932 - If You're Not In It For Love.mp3
RI 932H - If You're Not In It For Love.mp3
RI 963 - I'm Alive.mp3
RI 963h - I'm Alive.mp3
RR 113 - If We're Not Back In Love By Monday.mp3
RR 288B - Don't Think Twice.mp3
RR 369 - Wouldn't It Be Nice.mp3
RR 369H - Wouldn't It Be Nice.mp3
RR 412 - Nothin' From Nothin'.mp3
RS 722 - It's High Time.mp3
RWH 1039 - Rolling in My Sweet Baby's Arms (harmony).mp3
RWH 1039 - Rolling in My Sweet Baby's Arms (m).mp3
RYL 1001 - Beatle's Medley.mp3
SIR 704 - It's Raining Men (Instrumental).mp3
SS 1004 - I'm Gonna Be 500 Miles (BGV).mp3
SS 1004 - I'm Gonna Be 500 Miles.mp3
Seems totally reasonable to me! I'm at work and not thinking about that code right now, but it may be as easy as replacing a whitespace match like @"\s" (I don't remember whether we used the regex engine for that) with @"\W".
@danlyke Ah... that function already uses "\W+", and \W matches any character not in "[a-zA-Z0-9_]".
So I'm thinking maybe change "\W+" to "[^a-zA-Z0-9']+", or maybe even "[^a-zA-Z0-9_'`]+", to include both the single quote (the most common quote in "it's") and the backquote (the most common incorrectly used quote).
If I get a chance, I'll give that a try, and see what happens! Thanks for the quick reply!
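A quick way to compare the two splitting regexes outside SquareDesk, sketched in Python. `re.ASCII` is used so that `\W` means `[^a-zA-Z0-9_]`; that is an assumption about how the regex engine behind splitIntoWords() treats non-ASCII text, and `split_into_words` here is a hypothetical stand-in, not the real function.

```python
import re

# Current behaviour as described: split on runs of \W.
# re.ASCII restricts \W to [^a-zA-Z0-9_] (assumed to mirror the app).
OLD_SPLIT = re.compile(r"\W+", re.ASCII)

# Proposed: also keep the single quote and backquote inside words.
NEW_SPLIT = re.compile(r"[^a-zA-Z0-9_'`]+")


def split_into_words(text: str, splitter: re.Pattern) -> list:
    """Split text with the given pattern, dropping empty pieces."""
    return [w for w in splitter.split(text) if w]


for splitter in (OLD_SPLIT, NEW_SPLIT):
    print(split_into_words("It's So Easy", splitter))
```

The curly apostrophe (U+2019) still acts as a separator with the proposed pattern, which is the remaining case that would need renaming.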
Yeah, it's possible that there are character set/locale issues with whatever that \W is doing. Every time I get into the meta state of regular expressions, or Unicode stuff, my brain goes boom.
Fixed by 766f78cee8ad1833ca3387cb62a49c07b0d281aa .
Single quote was basically used as a delimiter for matching purposes. Now it's a normal matched character. We'll see if this is OK; we might want to make it an ignored character at some point (e.g., so that "it's" matches both "its" and "it's").
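Making the apostrophe an ignored character could look something like this Python sketch, which deletes apostrophe-like characters from both the query and the candidate before a case-insensitive substring test. The character list and helper names are assumptions for illustration:

```python
import re

# Hypothetical set of characters dropped before comparison.
IGNORED = re.compile("['`\u2018\u2019\u02BC]")


def normalize(text: str) -> str:
    """Lower-case text and delete ignored apostrophe-like characters."""
    return IGNORED.sub("", text.lower())


def matches(query: str, candidate: str) -> bool:
    """Case-insensitive substring match that ignores apostrophes."""
    return normalize(query) in normalize(candidate)


print(matches("Its So Easy", "FT 158 - It's So Easy.mp3"))
```

Dropping the character on both sides means "its", "it's", and "it’s" all normalize to the same string, so the query no longer depends on which apostrophe the filename uses.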
"Its So Easy" matches the song "FT 158 - It's So Easy", but "It's So Easy" does not. That's unexpected.