soodoku closed this issue 6 years ago
Regarding the original output file format:
The structure of the file is:
"URL","Category 1","Category 2",..........
Multiple categories are spread across separate columns. In this case, would you like them combined into one column?
Aah --- got it!
The surprising thing is that some strings with multiple categories are enclosed in quotes and some are not. Maybe there is a logic to it.
But it may make sense to quote every time we have multiple categories rather than put them into separate columns.
I agree with putting multiple categories into one column. However, it seems a category label may itself contain a comma. So should we use "|" (pipe) as the category separator instead?
Interesting. Didn't realize that a single category label can also have a comma. If so, yeah, let's go with some fancy delimiter. Pipe or semicolon --- whatever doesn't exist in the data. Thanks!
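For reference, a minimal sketch of what the agreed change could look like, assuming the file has already been parsed into rows of the form [URL, category 1, category 2, ...]. The names `combine_categories` and `convert` are hypothetical and not taken from the project. The check on the separator reflects the "whatever doesn't exist in the data" condition above.

```python
import csv
import io


def combine_categories(row, sep="|"):
    """Collapse [url, cat1, cat2, ...] into [url, "cat1|cat2|..."].

    Hypothetical helper: raises if the separator already occurs in a label,
    since the joined column could then not be split back unambiguously.
    """
    url, *categories = row
    for cat in categories:
        if sep in cat:
            raise ValueError(f"separator {sep!r} occurs in label {cat!r}")
    return [url, sep.join(categories)]


def convert(src, dst, sep="|"):
    """Read variable-width CSV rows from src, write two-column rows to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(combine_categories(row, sep))
```

Usage would be something like `convert(open("in.csv", newline=""), open("out.csv", "w", newline=""))`, with both file names being placeholders. Switching to a semicolon only requires passing `sep=";"`.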
Protect strings with commas with quotes
Some domains have multiple category labels. Each label is separated by a comma. But these strings with commas are not always enclosed in quotes. For instance,
www.tanvald.cz,Top/World/Deutsch/Regional/Europa/Tschechien/Regionen/Reichenberg/Tanvald,Top/World/Česky/Státy_a_regiony/Evropa/Česká_republika/Kraje/Liberecký/Tanvald
www.wstyler.com,Top/Business/Mining_and_Drilling/Tools_and_Equipment/Mining,Top/Regional/North_America/United_States/Ohio/Localities/M/Mentor/Business_and_Economy/Manufacturing,Top/Business/Industrial_Goods_and_Services/Cable_and_Wire/Wire_Mesh
Suggested fix: always protect category labels with quotes
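One way the suggested fix might be sketched with Python's standard `csv` module, where `csv.QUOTE_ALL` makes the writer quote every field. The function name `quote_all` is hypothetical. Note the limitation: if a label containing a comma was written without quotes in the input, the reader will already have split it into two fields, and this sketch cannot undo that.

```python
import csv
import io


def quote_all(src, dst):
    """Copy CSV rows from src to dst, enclosing every field in quotes."""
    reader = csv.reader(src)
    # QUOTE_ALL quotes each field regardless of its content.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Example with a shortened, made-up row in the same shape as the data above.
src = io.StringIO("www.example.com,Top/Business/Mining,Top/Regional/Ohio\n")
dst = io.StringIO()
quote_all(src, dst)
print(dst.getvalue(), end="")
```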