daikon chinese garbled - Githubissues

hu3402379 commented 6 years ago

hi, @martinezmj ,I have some dicom files, some file's patient name is chinese, and daikon.Image.prototype.getPatientName will return garbled. I do not know how to solve this issue.

The attch file is garbled dcm file for test. Thanks for your help..

nickhingston commented 6 years ago

hi @hu3402379,

Ive just had a look at this issue...

seems Daikon does not support anything other than ascii at the moment.

I have played around with utf-8, big5, gbw and gb18030

none seem to be perfect, but much improved in different ways!

seems others have the same issue: https://github.com/cornerstonejs/dicomParser/issues/89

i have created a branch on my fork https://github.com/nickhingston/Daikon.git#20-chinese-garbled

worth having a play to see which works best, I cant find out how to auto detect.

one of your files seem to have a different encoding for patientName and address! but as I cant read Chinese i've no idea what is correct!

to change the encoding set daikon.Utils.utfLabel = "encoding";

the encoding list is here: https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/TextDecoder

cheers

nick

hu3402379 commented 6 years ago

@nickhingston ,aha,thanks for your reply.

This question is also my proposal:cornerstonejs/dicomParser#89. and 'iconv-lite' can solve this issue.

I have tryed to add this lib in daikon, but I don't know why, I downloaded daikon, and the compilation is not successful. Can you help me to import this library 'iconv-lite' in daikon and then convert the patient name. Then provide me with the compiled daikon.js.

thanks a lot.

nickhingston commented 6 years ago

ah, ok! Seems my unzip or OSX can't cope with Chinese filenames either!

I have pushed an updated version using iconv-lite to that same branch.

It has exactly the same problems as using TextDecoder

using 'gbk' or 'gb18030' i get (0010,0010) Patient Name REMOVED (0010,1040) Patient Address REMOVED

& (0010,0010) Patient Name REMOVED

which looks close-ish!

using utf-8: (0010,1040) Patient Address REMOVED But, patient names are very garbled in both files!

What were the settings you used to get iconv to work for everything?! the settings i have pushed to my branch is 'gb18030'

I guess the other question is, what build errors are you getting from npm run build?

nickhingston commented 6 years ago

actually please tell me our postings are OK from a privacy perspective?!

maybe we need to talk directly about this and files/names/address removed...

hu3402379 commented 6 years ago

@nickhingston
1. What were the settings you used to get iconv to work for everything?! 'gbk' or 'gb18030' is correct.

(0010,0010) Patient Name [**=�$)A**�-**] (0010,0010) Patient Name [**(**)]

'utf-8' is correct

(0010,1040) Patient Address [**105-101]

'gbk' or 'gb18030' is wrong.

(0010,1040) Patient Address [105-101]

I don't know how to use iconv work for every tag yet... It's very strange, each tag is a different encoding.

2. npm run build error:

hu3402379 commented 6 years ago

@nickhingston Thanks for your advice. This is not good from a privacy perspective, now, we can delete this information. I have edit commit replace key info with ****

nickhingston commented 6 years ago

yeah it doesn't make sense to me! editing to remove personal info... i guess you should remove the files from cornerstone's parser too!!

nickhingston commented 6 years ago

i cant edit your posts...please remove personal info from there.

If you want to contact me directly to continue any conversation that may involve personal info...use my email, i've just made it public on my profile...

nickhingston commented 6 years ago

did you do npm install first? can you provide the debug.log file the error message references?

what version of node are you using?

hu3402379 commented 6 years ago

@nickhingston I have delete the personal info. I have run 'npm install' before 'npm run build' node verison: v8.9.3 npm version: v6.1.0

2018-07-12T09_02_12_391Z-debug.log

nickhingston commented 6 years ago

there is still the patient address (copied from mine) in one of your posts:

'gbk' or 'gb18030' is wrong.

(0010,1040) Patient Address

yes but it references a debug.log file...

hu3402379 commented 6 years ago

@nickhingston I have submit debug.log file in last comment. You can help me find the reason of the error. thanks

nickhingston commented 6 years ago

yeah it looks like you havn't run npm install

was there any error from that?

hu3402379 commented 6 years ago

@nickhingston Actually, I did it, and everything looked correct.

nickhingston commented 6 years ago

Ah, think I see the problem! Looks like rm failed (load of Chinese after it? )

you need to run this in a unix like shell. As you are using windows - Mingw or something....

Sent from my iPhone

On 13 Jul 2018, at 03:30, hu3402379 notifications@github.com wrote:

@nickhingston Actually, I did it, and everything looked correct.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

hu3402379 commented 6 years ago

@nickhingston yeah. My environment is windows. I will try it . thanks .

hu3402379 commented 6 years ago

@nickhingston I have run build in babun. but still get error. and I have install browserify global.

nickhingston commented 6 years ago

hmm, seems rm thinks that option is for it, it isn't!

the command is multiline rm -rf build; mkdir build; browserify --standalone daikon src/main.js -o build/daikon.js; uglifyjs build/daikon.js -o build/daikon-min.js

the option should be for browserify...so something odd with your shell still. So either use a 'proper' OS, or try running these commands separately!

rm -rf build
mkdir build
browserify --standalone daikon src/main.js -o build/daikon.js
uglifyjs build/daikon.js -o build/daikon-min.js

hu3402379 commented 6 years ago

@nickhingston I compile commands separately successful, then i replace daikon.js in rii-mango/papaya and run './papaya-builder.sh' faild. if I dont replace, it will be success.

rm -rf build mkdir build browserify --standalone daikon src/main.js -o build/daikon.js uglifyjs build/daikon.js -o build/daikon-min.js

hu3402379 commented 6 years ago

@nickhingston I have solve it. Maybe it is Papaya's build ruler error. 'byte' is a reserved word. in 'daikon.js' source code:

// Checks the type of a UTF-8 byte, whether it's ASCII, a leading byte, or a // continuation byte. If an invalid byte is detected, -2 is returned. function utf8CheckByte(byte) { if (byte <= 0x7F) return 0;else if (byte >> 5 === 0x06) return 2;else if (byte >> 4 === 0x0E) return 3;else if (byte >> 3 === 0x1E) return 4; return byte >> 6 === 0x02 ? -1 : -2; }

Modified code code:

// Checks the type of a UTF-8 byte, whether it's ASCII, a leading byte, or a // continuation byte. If an invalid byte is detected, -2 is returned. function utf8CheckByte(val) { if (val <= 0x7F) { return 0; } else if (val >> 5 === 0x06) { return 2; } else if (val >> 4 === 0x0E) { return 3; } else if (val >> 3 === 0x1E) { return 4; } else { return val >> 6 === 0x02 ? -1 : -2; } }

in addition， 'dicom-parser' contributor tell me the good news, he has released an npm package ‘dicom-character-set’ to convert all encodings. If you want to, you can try。

@hu3402379 Good news! I just released an npm package to convert bytes from a DICOM text attribute into a JS string. It'll handle all of the encodings, including with extensions. Check it out here.

nickhingston commented 6 years ago

ok, interesting - that makes it a Papaya bug then - Daikon does not have this method!

I suppose the question should be - where should the conversion take place?! My preference would be in the parser itself. I which case this Issue still stands...

@martinezmj what do you think?

will look into dicom-character-set - will certainly be useful for my project (which does not use Papaya).

I am also considering forking off Daikon more 'officially' and calling it something like DICOMgpu.js...because i can't seem to get any response to my PRs on this project, and I think people would like a GPU accelerated javascript DICOM decoder!

nickhingston commented 6 years ago

@hu3402379 looked into dicom-character-set.

iconv seems to do a better job for your files than this...

have you any luck with it in the other lib you are using?

FYI, i'm using it like this: .convertBytes('ISO 2022 IR 100\ISO 2022 IR 58', new Uint8Array(strBuff), {vr: 'LT'});

iconv sorts out both the name and address i think, but I get "XU ER ZHU=$)A" & "-A=XU ER ZHU" around the name. Is this supposed to be there?!

hu3402379 commented 6 years ago

@nickhingston

Regardless of whether this result is converted from iconv or dicom-character-set, I think the patient name itself should be garbled.

So, the dicom file patient name is not correct coding.

I think both iconv of dicom-character-set is nice to solve chinese garbled.

nickhingston commented 6 years ago

OK thanks though I think its the other way around i think the address is not in the specified ('ISO 2022 IR 100\ISO 2022 IR 58) encoding

dicom-character set, doesn't work - iconv does (at least to some degree - it’s a bit ‘cleverer’ at working out unknowns).

I’ll create a PR on using the iconv solution. Thanks for your input on this.

On 30 Jul 2018, at 10:13, hu3402379 notifications@github.com wrote:

@nickhingston https://github.com/nickhingston Regardless of whether this result is converted from iconv or dicom-character-set, I think the patient name itself should be garbled.

So, the dicom file patient name is not correct coding.

I think both iconv of dicom-character-set is nice to solve chinese garbled.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rii-mango/Daikon/issues/20#issuecomment-408799184, or mute the thread https://github.com/notifications/unsubscribe-auth/AA059WnNxhveKs66Uf4zTiGhiitycB4dks5uLs47gaJpZM4VE8DR.

hu3402379 commented 6 years ago

@nickhingston Do you know about papaya's mpr function, I want to ask you some questions later. Do you mind leaving your email? We communicate via email.

Thanks a lot

nickhingston commented 6 years ago

i'm afraid not. I don't use papaya.

Feel free to email directly - details should be on my profile...

On 30 Jul 2018, at 12:38, hu3402379 notifications@github.com wrote:

@nickhingston https://github.com/nickhingston Do you know about papaya's mpr function, I want to ask you some questions later. Do you mind leaving your email? We communicate via email.

Thanks a lot

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rii-mango/Daikon/issues/20#issuecomment-408834312, or mute the thread https://github.com/notifications/unsubscribe-auth/AA059Yc8B3hSUHc3a8ZH6MARUnQTZVsFks5uLvAagaJpZM4VE8DR.

rii-mango commented 6 years ago

I agree this should be solved in Daikon parser rather than Papaya. Did the proposed solution work? Was there a PR that came out of this? If not, I'll take a look at it next. Sorry for my late reply.

nickhingston commented 6 years ago

No not a specific solution, iconv works if you hardcode the format, but there is no obvious way of converting from the DICOM spec to iconv format name. The other library didn't work for me at all…

On 13 Aug 2018, at 01:17, RII-Mango notifications@github.com wrote:

I agree this should be solved in Daikon parser rather than Papaya. Did the proposed solution work? Was there a PR that came out of this? If not, I'll take a look at it next. Sorry for my late reply.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rii-mango/Daikon/issues/20#issuecomment-412381800, or mute the thread https://github.com/notifications/unsubscribe-auth/AA059SQPFdzMOn6hz9Z_1qRgXKIO0QYAks5uQMWwgaJpZM4VE8DR.

miph1309 commented 5 years ago

@rii-mango do you have any solution for that? i have similar trouble with korean, and patient name return have added some special symbols

rii-mango / Daikon

daikon chinese garbled #20