madec-project / ezvis

A dashboard to visualize a synthesis on a structured corpus, using several charts (pies, histograms, ...)
https://ezvis.readthedocs.org/

ezref/castor-core: make CSV and XML files usable #65

Closed parmentf closed 8 years ago

parmentf commented 9 years ago

Some users encountered a parsing problem when loading CSV files from ezref over HTTP: parseCSV from JBJ did not work well. They had to use JSON files instead, which worked. The cause may be the headers returned (or not returned) by ezref, and hence by http-server.

parmentf commented 8 years ago

After reproducing the problem (with a two-column CSV) using this documentField:

    "$label2iso3": {
      "$?": "http://localhost:35000/Pays_Label_CodeISO3.csv"
    }

the first problem appears to be that the loaded string is incomplete.

castorjs/castor-core:loaders/document.js contains a truncating slice that limits every indexed field to 1000 characters.

Setting "noindex": true disables that mechanism for the field:

    "$label2iso3": {
      "noindex": true,
      "$?": "http://localhost:35000/Pays_Label_CodeISO3.csv"
    }
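
The effect of the truncation and of "noindex" can be pictured with the sketch below. It is not the castor-core code: truncateField, MAX_INDEXED_LENGTH and the shape of the options object are hypothetical names chosen for illustration. Only the 1000-character limit and the "noindex" flag come from the observations above.

```javascript
// Hypothetical sketch, not the actual castor-core loader code.
const MAX_INDEXED_LENGTH = 1000; // limit observed in loaders/document.js

function truncateField(value, options) {
  const noindex = Boolean(options && options.noindex);
  // Only strings are shortened, and only when the field is indexed.
  if (typeof value !== 'string' || noindex) {
    return value;
  }
  return value.slice(0, MAX_INDEXED_LENGTH);
}

// 100 rows of 20 characters each: a 2000-character CSV string.
const csv = '"Afghanistan";"AFG"\n'.repeat(100);
console.log(truncateField(csv, {}).length);                // 1000
console.log(truncateField(csv, { noindex: true }).length); // 2000
```

A CSV cut at an arbitrary character would most likely end in the middle of a row, which would explain why parsing it goes wrong.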

parmentf commented 8 years ago

The label2iso3 field contains the whole content of the CSV file: "label2iso3" : "\"Afghanistan\";\"AFG\"\n\"Aland Islands\";\"ALA\"\n\"Albania\";\"ALB\"\n\"Algeria\";\"DZA\"\n\"American Samoa\";\"ASM\"\n\"Andorra\";\"AND\"\n\"Angola\";\"AGO\"\n\"Anguilla\";\"AIA\"\n\"Antarctica\";\"ATA\"\n\"Antigua and Barbuda\";\"ATG\"\n\"Argentina\";\"ARG\"\n\"Armenia\";\"ARM\"\n\"Aruba\";\"ABW\"\n\"Australia\";\"AUS\"\n\"Austria\";\"AUT\"\n\"Azerbaijan\";\"AZE\"\n\"Bahamas\";\"BHS\"\n\"Bahrain\";\"BHR\"\n\"Bangladesh\";\"BGD\"\n\"Barbados\";\"BRB\"\n\"Belarus\";\"BLR\"\n\"Belgium\";\"BEL\"\n\"Belize\";\"BLZ\"\n\"Benin\";\"BEN\"\n\"Bermuda\";\"BMU\"\n\"Bhutan\";\"BTN\"\n\"Bolivia\";\"BOL\"\n\"Bosnia and Herzegovina\";\"BIH\"\n\"Botswana\";\"BWA\"\n\"Bouvet Island\";\"BVT\"\n\"Brazil\";\"BRA\"\n\"British Indian Ocean Territory\";\"IOT\"\n\"Brunei Darussalam\";\"BRN\"\n\"Bulgaria\";\"BGR\"\n\"Burkina Faso\";\"BFA\"\n\"Burma\";\"BUR\"\n\"Burundi\";\"BDI\"\n\"Cambodia\";\"KHM\"\n\"Cameroon\";\"CMR\"\n\"Canada\";\"CAN\"\n\"Canton and Enderbury, Islands\";\"CTE\"\n\"Cape Verde\";\"CPV\"\n\"Cayman Islands\";\"CYM\"\n\"Central African Republic\";\"CAF\"\n\"Chad\";\"TCD\"\n\"Chile\";\"CHL\"\n\"China\";\"CHN\"\n\"Christmas Island\";\"CXR\"\n\"Cocos (Keeling) Islands\";\"CCK\"\n\"Colombia\";\"COL\"\n\"Comoros\";\"COM\"\n\"Congo\";\"COG\"\n\"Congo,the Democratic Republic of\";\"COD\"\n\"Cook Islands\";\"COK\"\n\"Costa Rica\";\"CRI\"\n\"Cote d'Ivoire\";\"CIV\"\n\"Croatia\";\"HRV\"\n\"Cuba\";\"CUB\"\n\"Cyprus\";\"CYP\"\n\"Czech Republic\";\"CZE\"\n\"Denmark\";\"DNK\"\n\"Djibouti\";\"DJI\"\n\"Dominica\";\"DMA\"\n\"Dominican Republic\";\"DOM\"\n\"East Timor\";\"TMP\"\n\"Ecuador\";\"ECU\"\n\"Egypt\";\"EGY\"\n\"El Salvador\";\"SLV\"\n\"Equatorial Guinea\";\"GNQ\"\n\"Eritrea\";\"ERI\"\n\"Estonia\";\"EST\"\n\"Ethiopia\";\"ETH\"\n\"Europe\";\"EUR\"\n\"Falkland Islands (Malvinas)\";\"FLK\"\n\"Faroe Islands\";\"FRO\"\n\"Fiji\";\"FJI\"\n\"Finland\";\"FIN\"\n\"France\";\"FRA\"\n\"France 
Metropolitan\";\"FXX\"\n\"French Guiana\";\"GUF\"\n\"French Polynesia\";\"PYF\"\n\"French Southern Territories\";\"ATF\"\n\"Gabon\";\"GAB\"\n\"Gambia\";\"GMB\"\n\"Georgia\";\"GEO\"\n\"German Democratic Republic\";\"DDR\"\n\"Germany\";\"DEU\"\n\"Ghana\";\"GHA\"\n\"Gibraltar\";\"GIB\"\n\"Greece\";\"GRC\"\n\"Greenland\";\"GRL\"\n\"Grenada\";\"GRD\"\n\"Guadeloupe\";\"GLP\"\n\"Guam\";\"GUM\"\n\"Guatemala\";\"GTM\"\n\"Guinea\";\"GIN\"\n\"Guinea-Bissau\";\"GNB\"\n\"Guyana\";\"GUY\"\n\"Haiti\";\"HTI\"\n\"Heard Island and Mcdonald Islands\";\"HMD\"\n\"Holy See (Vatican City State)\";\"VAT\"\n\"Honduras\";\"HND\"\n\"Hong-Kong\";\"HKG\"\n\"Hungary\";\"HUN\"\n\"Iceland\";\"ISL\"\n\"India\";\"IND\"\n\"Indonesia\";\"IDN\"\n\"International\";\"INT\"\n\"Iran\";\"IRN\"\n\"Iran, Islamic Republic of\";\"IRN\"\n\"Iraq\";\"IRQ\"\n\"Ireland\";\"IRL\"\n\"Israel\";\"ISR\"\n\"Italy\";\"ITA\"\n\"Jamaica\";\"JAM\"\n\"Japan\";\"JPN\"\n\"Johnston Island\";\"JTN\"\n\"Jordan\";\"JOR\"\n\"Kazakhstan\";\"KAZ\"\n\"Kenya\";\"KEN\"\n\"Kiribati\";\"KIR\"\n\"Korea, Democratic People's Republic\";\"PRK\"\n\"Korea, Republic of\";\"KOR\"\n\"Kuwait\";\"KWT\"\n\"Kyrgyzstan\";\"KGZ\"\n\"Lao People's Democratic Republic\";\"LAO\"\n\"Latvia\";\"LVA\"\n\"Lebanon\";\"LBN\"\n\"Lesotho\";\"LSO\"\n\"Liberia\";\"LBR\"\n\"Libyan Arab Jamahiriya\";\"LBY\"\n\"Liechtenstein\";\"LIE\"\n\"Lithuania\";\"LTU\"\n\"Luxembourg\";\"LUX\"\n\"Macao\";\"MAC\"\n\"Macedonia,the Former Yugoslave Republic of\";\"MKD\"\n\"Madagascar\";\"MDG\"\n\"Malawi\";\"MWI\"\n\"Malaysia\";\"MYS\"\n\"Maldives\";\"MDV\"\n\"Mali\";\"MLI\"\n\"Malta\";\"MLT\"\n\"Marshall Islands\";\"MHL\"\n\"Martinique\";\"MTQ\"\n\"Mauritania\";\"MRT\"\n\"Mauritius\";\"MUS\"\n\"Mayotte\";\"MYT\"\n\"Mexico\";\"MEX\"\n\"Micronesia, Federated States of\";\"FSM\"\n\"Midway Islands\";\"MID\"\n\"Moldova, Republic 
of\";\"MDA\"\n\"Monaco\";\"MCO\"\n\"Mongolia\";\"MNG\"\n\"Montenegro\";\"MNE\"\n\"Montserrat\";\"MSR\"\n\"Morocco\";\"MAR\"\n\"Mozambique\";\"MOZ\"\n\"Myanmar\";\"MMR\"\n\"Namibia\";\"NAM\"\n\"Nauru\";\"NRU\"\n\"Nepal\";\"NPL\"\n\"Netherlands\";\"NLD\"\n\"Netherlands Antilles\";\"ANT\"\n\"Neutral Zone\";\"NTZ\"\n\"New Caledonia\";\"NCL\"\n\"New Hebrides\";\"NHB\"\n\"New Zealand\";\"NZL\"\n\"Nicaragua\";\"NIC\"\n\"Niger\";\"NER\"\n\"Nigeria\";\"NGA\"\n\"Niue\";\"NIU\"\n\"Norfolk Island\";\"NFK\"\n\"Northern Mariana Islands\";\"MNP\"\n\"Norway\";\"NOR\"\n\"Oman\";\"OMN\"\n\"Pacific Islands (Trust Territory)\";\"PCI\"\n\"Pakistan\";\"PAK\"\n\"Palau\";\"PLW\"\n\"Palestinian Territory, Occupied\";\"PSE\"\n\"Panama\";\"PAN\"\n\"Panama Canal Zone\";\"PCZ\"\n\"Papua New Guinea\";\"PNG\"\n\"Paraguay\";\"PRY\"\n\"Peru\";\"PER\"\n\"Philippines\";\"PHL\"\n\"Pitcairn\";\"PCN\"\n\"Poland\";\"POL\"\n\"Portugal\";\"PRT\"\n\"Puerto Rico\";\"PRI\"\n\"Qatar\";\"QAT\"\n\"Queen Maud Territory\";\"ATN\"\n\"Reunion\";\"REU\"\n\"Romania\";\"ROU\"\n\"Russian Federation\";\"RUS\"\n\"Rwanda\";\"RWA\"\n\"Saint Helena\";\"SHN\"\n\"Saint Kitts and Nevis\";\"KNA\"\n\"Saint Lucia\";\"LCA\"\n\"Saint Pierre and Miquelon\";\"SPM\"\n\"Saint Vincent and the Grenadines\";\"VCT\"\n\"Samoa\";\"WSM\"\n\"San Marino\";\"SMR\"\n\"Sao Tome and Principe\";\"STP\"\n\"Saudi Arabia\";\"SAU\"\n\"Senegal\";\"SEN\"\n\"Serbia\";\"SRB\"\n\"Serbia and Montenegro\";\"SCG\"\n\"Seychelles\";\"SYC\"\n\"Sierra Leone\";\"SLE\"\n\"Singapore\";\"SGP\"\n\"Slovakia\";\"SVK\"\n\"Slovenia\";\"SVN\"\n\"Solomon Islands\";\"SLB\"\n\"Somalia\";\"SOM\"\n\"South Africa\";\"ZAF\"\n\"South Georgia andthe South Sandwich Islands\";\"SGS\"\n\"South Korea\";\"KOR\"\n\"Southern Rhodesia\";\"RHO\"\n\"Spain\";\"ESP\"\n\"Sri Lanka\";\"LKA\"\n\"Sudan\";\"SDN\"\n\"Suriname\";\"SUR\"\n\"Svalbard and Jan Mayen\";\"SJM\"\n\"Swaziland\";\"SWZ\"\n\"Sweden\";\"SWE\"\n\"Switzerland\";\"CHE\"\n\"Syrian Arab 
Republic\";\"SYR\"\n\"Tajikistan\";\"TJK\"\n\"Tanzania, United Republic of\";\"TZA\"\n\"Tawain, Province of China\";\"TWN\"\n\"Thailand\";\"THA\"\n\"Timor Leste\";\"TLS\"\n\"Togo\";\"TGO\"\n\"Tokelau\";\"TKL\"\n\"Tonga\";\"TON\"\n\"Trinidad and Tobago\";\"TTO\"\n\"Tunisia\";\"TUN\"\n\"Turkey\";\"TUR\"\n\"Turkmenistan\";\"TKM\"\n\"Turks and Caicos Islands\";\"TCA\"\n\"Tuvalu\";\"TUV\"\n\"Uganda\";\"UGA\"\n\"UK\";\"GBR\"\n\"Ukraine\";\"UKR\"\n\"United Arab Emirates\";\"ARE\"\n\"United Kingdom\";\"GBR\"\n\"United States\";\"USA\"\n\"United States Miscellaneous Pacific Islands\";\"PUS\"\n\"Unites States Minor Outlying Islands\";\"UMI\"\n\"Unknown\";\"INC\"\n\"Upper Volta\";\"HVO\"\n\"Uruguay\";\"URY\"\n\"USSR\";\"SUN\"\n\"Uzbekistan\";\"UZB\"\n\"Vanuatu\";\"VUT\"\n\"Venezuela\";\"VEN\"\n\"Viet Nam\";\"VNM\"\n\"Viet-Nam, Democratic Republic of\";\"VDR\"\n\"Virgin Islands, British\";\"VGB\"\n\"Virgin Islands, U.S.\";\"VIR\"\n\"Wallis and Futuna\";\"WLF\"\n\"Western Sahara\";\"ESH\"\n\"Yemen\";\"YEM\"\n\"Yemen, Democratic\";\"YMD\"\n\"Yugoslavia\";\"YUG\"\n\"Zaire\";\"ZAR\"\n\"Zambia\";\"ZMB\"\n\"Zimbabwe\";\"ZWE\""

But when one adds "parseCSV": ";" to the documentField, only the first row comes back:

    [
        "Afghanistan",
        "AFG"
    ]

parmentf commented 8 years ago

Although

    var CSV = require('csv-string');
    CSV.parse("\"Afghanistan\";\"AFG\"\n\"Aland Islands\";\"ALA\"", ";")

yields

    [ [ 'Afghanistan', 'AFG' ],
      [ 'Aland Islands', 'ALA' ] ]

JBJ, through

    JBJ.filters.parseCSV("\"Afghanistan\";\"AFG\"\n\"Aland Islands\";\"ALA\"", ";")

gives

    [ 'Afghanistan', 'AFG' ]

That is likely due to the shift() in JBJ, which keeps only the first parsed row:

    return CSV.parse(obj, arg).shift();
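
The effect of that shift() can be reproduced without either library. In the sketch below, parseRows is a hypothetical, deliberately naive stand-in for CSV.parse (it splits on newlines and on the separator and strips surrounding double quotes, nothing more), and parseFirstRow mimics the parseCSV behavior observed above.

```javascript
// Naive stand-in for CSV.parse: no support for escaped quotes, nor for
// separators or newlines inside quoted cells.
function parseRows(text, separator) {
  return text
    .split('\n')
    .filter((line) => line.length > 0)
    .map((line) => line.split(separator).map((cell) => cell.replace(/^"|"$/g, '')));
}

// Mimics the observed parseCSV behavior: shift() removes and returns the
// first element of the array, so only the first row survives.
function parseFirstRow(text, separator) {
  return parseRows(text, separator).shift();
}

const text = '"Afghanistan";"AFG"\n"Aland Islands";"ALA"';
console.log(parseRows(text, ';'));     // [ [ 'Afghanistan', 'AFG' ], [ 'Aland Islands', 'ALA' ] ]
console.log(parseFirstRow(text, ';')); // [ 'Afghanistan', 'AFG' ]
```

Returning the first row is convenient for a single CSV line, but it silently drops the rest of a multi-line file.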

parmentf commented 8 years ago

Using parseCSVFile, added in JBJ 3.12, produces the expected result:

    [ [ "Afghanistan", "AFG" ],
      [ "Aland Islands", "ALA" ] ]

But now we need

    [ { "_id": "Afghanistan", "value": "AFG" },
      { "_id": "Aland Islands", "value": "ALA" } ]

so that array2object can produce:

    { "Afghanistan": "AFG",
      "Aland Islands": "ALA" }

The proposed syntax is "arrays2objects": ["_id", "value"].
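
The two transformations can be sketched in plain JavaScript as follows. These functions are hypothetical re-implementations written for illustration, not JBJ's code; they only reuse the names and the "_id" / "value" keys mentioned above.

```javascript
// Hypothetical re-implementations for illustration; not JBJ's code.

// Turn each row (an array) into an object, naming the cells after `keys`.
function arrays2objects(rows, keys) {
  return rows.map((row) => {
    const obj = {};
    keys.forEach((key, i) => { obj[key] = row[i]; });
    return obj;
  });
}

// Collapse [{ _id, value }, ...] into a single { _id: value } lookup object.
function array2object(items) {
  const result = {};
  items.forEach((item) => { result[item._id] = item.value; });
  return result;
}

const rows = [['Afghanistan', 'AFG'], ['Aland Islands', 'ALA']];
const objects = arrays2objects(rows, ['_id', 'value']);
console.log(objects);
console.log(array2object(objects)); // { Afghanistan: 'AFG', 'Aland Islands': 'ALA' }
```

Splitting the work in two steps keeps array2object unchanged: it still expects objects with "_id" and "value", and the new step only has to name the columns.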

parmentf commented 8 years ago

Here is the kind of field one can use (once JBJ 3.13 is installed):

    "$WorldCR": {
      "$?": "http://localhost:35000/ESI_AllFields_20150407.tsv",
      "parseCSVFile": "\t",
      "arrays2objects": true,
      "array2object": true
    }