pldn / LDWizard

🧙 LDWizard: A generic framework for simplifying the creation of linked data. Supported by the PLDN community.
European Union Public License 1.2

Support non-Unicode encodings / Automatic recoding to UTF-8 #14

Closed: ivozandhuis closed this issue 11 months ago

ivozandhuis commented 4 years ago

200807-0952-200198551181.txt

An export from the demo version of the 'Zijper Collectie Beheer Systeem' (ZCBS), a CGI-bin Perl application used by dozens of local Dutch historical societies.

wouterbeek commented 4 years ago

LD Wizard is not yet able to automatically detect or convert the encoding. Since this is not a UTF-8 file:

$ uchardet {from-file}
ISO-8859-2

it must first be recoded locally:

$ iconv -f ISO-8859-2 -t UTF-8 {from-file} > {to-file}

I'm not sure why modern libraries are unable to perform these steps, since tools like these have been available in POSIX environments for decades.

wouterbeek commented 2 years ago

I have changed the title of this issue. The source encoding can be detected automatically, and the data can then be recoded to UTF-8 automatically as well. Since most RDF serialization formats must be UTF-8 encoded, this would make such data usable as a source for RDF.
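A minimal command-line sketch of such automatic recoding, reusing uchardet and iconv from the earlier comment. It assumes both tools are installed, that iconv exits with a non-zero status on invalid input (as GNU iconv does), and that uchardet prints an encoding name iconv accepts. The function name to_utf8 is hypothetical; this is not LD Wizard's implementation.

```shell
# Hypothetical helper (not part of LD Wizard): write a UTF-8 copy of the
# file in $1 to $2, detecting the source encoding only when needed.
# Usage: to_utf8 INPUT OUTPUT
to_utf8() {
  # Input that is already valid UTF-8 (including plain ASCII) is copied as-is.
  # iconv exits non-zero when it meets a byte sequence that is invalid in the
  # declared source encoding, so UTF-8 -> UTF-8 doubles as a validity check.
  if iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
    cp "$1" "$2"
    return
  fi

  # Otherwise ask uchardet for its best guess (ISO-8859-2 for the ZCBS export)
  # and recode from that encoding; give up if uchardet is missing or fails.
  enc=$(uchardet "$1") || return 1
  iconv -f "$enc" -t UTF-8 "$1" > "$2"
}
```

Usage, with the placeholders from the earlier comment:

$ to_utf8 {from-file} {to-file}

Checking for valid UTF-8 first means files that need no recoding are copied byte for byte instead of depending on the detector's guess. If uchardet cannot identify the encoding, iconv rejects the name it prints and the function returns a non-zero status, leaving {to-file} empty or incomplete.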

philipperenzen commented 11 months ago

The feature has been implemented in branch feature/14/automatic-recoding-to-UTF-8 (see commit: https://github.com/pldn/LDWizard/commit/5bdc7e6bde64139f34c796097ad49ebab5136d44) and will be included in the next patch release of LDWizard.

Please note that the example file throws an error in the transformation step. I believe this is related to the double quotes on line 3 of the file (the "''" in "Tehuis voor Ouden van Dagen en Hulpbehoevenden 'Buitenzorg'"). The error appears to come from the CSV parser used by the RocketRML library:

Error: Invalid Closing Quote: got "." at line 3 instead of delimiter, record delimiter, trimable character (if activated) or comment
    CsvError sync.cjs:6
    parse sync.cjs:767
    parse sync.cjs:1326
    CsvParser CSVParser.js:9
    parseFile parser.js:50
    process index.js:88
    parseFileLive index.js:33
    _callee$ rocketrmlScript.ts:36
    tryCatch rocketrmlScript.ts:3
    makeInvokeMethod rocketrmlScript.ts:3
    defineIteratorMethods rocketrmlScript.ts:3
    asyncGeneratorStep rocketrmlScript.ts:3
    _next rocketrmlScript.ts:3
    promise callback*asyncGeneratorStep rocketrmlScript.ts:3
    _next rocketrmlScript.ts:3
    _asyncToGenerator rocketrmlScript.ts:3
    _asyncToGenerator rocketrmlScript.ts:3
    applyTransformation rocketrmlScript.ts:43
    _callee2$ index.tsx:223
    tryCatch index.tsx:2
    makeInvokeMethod index.tsx:2
    defineIteratorMethods index.tsx:2
    asyncGeneratorStep index.tsx:2
    _next index.tsx:2
    promise callback*asyncGeneratorStep index.tsx:2
    _next index.tsx:2
    _asyncToGenerator index.tsx:2
    _asyncToGenerator index.tsx:2
    transformFunction index.tsx:230
    Publish index.tsx:233
    React 13
localhost:4000:12966:25
    overrideMethod (index):12966
    ./src/config/rocketrmlScript.ts/applyTransformation/_callee/_callee$/< rocketrmlScript.ts:38
    (Async: promise callback)
    _callee$ rocketrmlScript.ts:36
    tryCatch rocketrmlScript.ts:3
    makeInvokeMethod rocketrmlScript.ts:3
    defineIteratorMethods rocketrmlScript.ts:3
    asyncGeneratorStep rocketrmlScript.ts:3
    _next rocketrmlScript.ts:3
    (Async: promise callback)
    asyncGeneratorStep rocketrmlScript.ts:3
    _next rocketrmlScript.ts:3
    _asyncToGenerator rocketrmlScript.ts:3
    _asyncToGenerator rocketrmlScript.ts:3
    applyTransformation rocketrmlScript.ts:43
    _callee2$ index.tsx:223
    tryCatch index.tsx:2
    makeInvokeMethod index.tsx:2
    defineIteratorMethods index.tsx:2
    asyncGeneratorStep index.tsx:2
    _next index.tsx:2
    (Async: promise callback)
    asyncGeneratorStep index.tsx:2
    _next index.tsx:2
    _asyncToGenerator index.tsx:2
    _asyncToGenerator index.tsx:2
    transformFunction index.tsx:230
    Publish index.tsx:233
    React 13