Closed victorlin closed 5 years ago
.tab
is actually specified, but it's not being triggered because it seems macOS does not associate these files with public.tab-separated-values-text
. If you have a fix for that, much appreciated!I appreciate the auto-detection although it's true that it doesn't work in all cases (in some of my files, it definitely mistakes TSVs as CSVs). My recommendation/request would be to go with the telegraphed filetype if it is explicitly in the filename and auto-detect otherwise -- that lets the user explicitly override the heuristics without having to do anything weirder than use the file extension they are probably already using. (The downside is that if a TSV is misnamed as a CSV, it won't preview correctly, but I view that as a feature -- it teaches people to name their files correctly.)
Here is some not-great but seemingly working code that does this (essentially a drop-in replacement for the first and last lines in the two places they occur):
CSVDocument *csvDoc = [CSVDocument new];
if ( [[myURL pathExtension] isEqualToString:@"tsv"] || [[myURL pathExtension] isEqualToString:@"TSV"] || [[myURL pathExtension] isEqualToString:@"tab"] || [[myURL pathExtension] isEqualToString:@"TAB"] ) {
csvDoc.autoDetectSeparator = NO;
csvDoc.separator = @"\t";
} else if ( [[myURL pathExtension] isEqualToString:@"csv"] || [[myURL pathExtension] isEqualToString:@"CSV"] ) {
csvDoc.autoDetectSeparator = NO;
csvDoc.separator = @",";
} else
csvDoc.autoDetectSeparator = YES;
I'm happy to make this a bit cleaner and/or make it an official pull request if you actually want to implement this and want it made as easy on you as possible.
I’m not able to update this plugin at this time. But I couldn’t take your proposed fix since I did the autodetection because I often got “.csv” that we’re actually TSV and I couldn’t change the filename (for various reasons, mostly because they were data files for scripts that broke when changing the name).
You’re welcome to fork the repo and publish your changes though!
Fair enough, I may indeed fork at some later date when I get a chance (I'm a little backed up at present). I certainly understand your rationale -- I have also had systems give me "CSV" files before that are actually TSV (or worse yet, CSV/TSV files that are named ".xls" to force them to open in Excel).
I still think, from a purely theoretical UI perspective, that it would be good to have SOME way to override the autodetection if the autodetection cannot be guaranteed to work perfectly in all cases (the latter would be most preferable but also probably very hard to guarantee, especially with all the not-quite-standards-compliant files out there). But the autodetection does seem to work fine in most cases and I understand if making it perfect is not exactly your highest priority, since I can't claim it's mine either.
Anyway, thanks for the response and for making this in the first place! Really made the job I was doing much easier, even with my 20-minute detour to implement my fix and recompile.
Yeah indeed, finding a balance for this autodetect was very much influenced by the type of files I found. Glad it helped getting started!
Take these two files as an example:
foo.tsv
foo.csv
foo.tsv
yieldsComma-Separated, UTF-8
.foo.csv
yieldsTab-Separated, UTF-8
.It seems that the file extension is ignored and the format is automatically detected.
More generally (regardless of file extension), it seems that a file is detected as tab-delimited only when the amount of tab characters is greater than the amount of commas. Otherwise, it is detected as comma-delimited.
Also, the
.tab
file extension does not seem to trigger this plugin.