p2 / quicklook-csv

A QuickLook plugin for CSV files
Other
289 stars 22 forks source link

File extensions are ignored #22

Closed victorlin closed 5 years ago

victorlin commented 7 years ago

Take these two files as an example:

foo.tsv

a,b,c

foo.csv

a   b   c

It seems that the file extension is ignored and the format is automatically detected.

More generally (regardless of file extension), it seems that a file is detected as tab-delimited only when the amount of tab characters is greater than the amount of commas. Otherwise, it is detected as comma-delimited.

Also, the .tab file extension does not seem to trigger this plugin.

p2 commented 7 years ago
mattrjohnson commented 5 years ago

I appreciate the auto-detection although it's true that it doesn't work in all cases (in some of my files, it definitely mistakes TSVs as CSVs). My recommendation/request would be to go with the telegraphed filetype if it is explicitly in the filename and auto-detect otherwise -- that lets the user explicitly override the heuristics without having to do anything weirder than use the file extension they are probably already using. (The downside is that if a TSV is misnamed as a CSV, it won't preview correctly, but I view that as a feature -- it teaches people to name their files correctly.)

Here is some not-great but seemingly working code that does this (essentially a drop-in replacement for the first and last lines in the two places they occur):

            CSVDocument *csvDoc = [CSVDocument new];
            if ( [[myURL pathExtension] isEqualToString:@"tsv"] || [[myURL pathExtension] isEqualToString:@"TSV"] || [[myURL pathExtension] isEqualToString:@"tab"] || [[myURL pathExtension] isEqualToString:@"TAB"] ) {
                csvDoc.autoDetectSeparator = NO;
                csvDoc.separator = @"\t";
            } else if ( [[myURL pathExtension] isEqualToString:@"csv"] || [[myURL pathExtension] isEqualToString:@"CSV"] ) {
                csvDoc.autoDetectSeparator = NO;
                csvDoc.separator = @",";
            } else
                csvDoc.autoDetectSeparator = YES;

I'm happy to make this a bit cleaner and/or make it an official pull request if you actually want to implement this and want it made as easy on you as possible.

p2 commented 5 years ago

I’m not able to update this plugin at this time. But I couldn’t take your proposed fix since I did the autodetection because I often got “.csv” that we’re actually TSV and I couldn’t change the filename (for various reasons, mostly because they were data files for scripts that broke when changing the name).

You’re welcome to fork the repo and publish your changes though!

mattrjohnson commented 5 years ago

Fair enough, I may indeed fork at some later date when I get a chance (I'm a little backed up at present). I certainly understand your rationale -- I have also had systems give me "CSV" files before that are actually TSV (or worse yet, CSV/TSV files that are named ".xls" to force them to open in Excel).

I still think, from a purely theoretical UI perspective, that it would be good to have SOME way to override the autodetection if the autodetection cannot be guaranteed to work perfectly in all cases (the latter would be most preferable but also probably very hard to guarantee, especially with all the not-quite-standards-compliant files out there). But the autodetection does seem to work fine in most cases and I understand if making it perfect is not exactly your highest priority, since I can't claim it's mine either.

Anyway, thanks for the response and for making this in the first place! Really made the job I was doing much easier, even with my 20-minute detour to implement my fix and recompile.

p2 commented 5 years ago

Yeah indeed, finding a balance for this autodetect was very much influenced by the type of files I found. Glad it helped getting started!