Addition of non-ASCII string elements to DataSet results in an element with "?" instead of letters

GoogleCodeExporter commented 8 years ago

A great library with clear code! Thank you!

What steps will reproduce the problem?
1. DcmDataset data = new DcmDataset(DcmTS.ImplicitVRLittleEndian);
2. String patientsName = "Фамилия-Пациента^Имя-Пациента^Отчество-Пациента";
3. data.AddElementWithValueString(DcmTags.PatientsName, patientsName);
   OR
   data.AddElementWithValue(DcmTags.PatientsName, patientsName);
4. patientsName = data.GetElement(DcmTags.PatientsName).GetValueString();

Expected: 
  patientsName == "Фамилия-Пациента^Имя-Пациента^Отчество-Пациента"
Observed:
  patientsName == "???????-????????^???-????????^????????-????????"

I am using mDCM checked out from SVN on 2 July 2009.

As far as I can tell, the problem is in ByteBuffer, which uses ASCII as its 
default encoding, so non-ASCII characters are replaced with "?" when the 
string is stored. A workaround is to create the element with an empty value 
first, set its encoding, and then assign the desired value. I would propose 
a different solution, which works only when SpecificCharacterSet has a VM 
of 1:
1) Add a method SetSpecificCharacterSet(String) to DcmDataset that stores 
the desired character set internally and adds the SpecificCharacterSet 
attribute to the dataset.
2) In the DcmDataset methods that assign string values, namely 
AddElementWithValueString, SetString, and SetStringArray, set the encoding 
of the added or changed element to the encoding of the dataset (taken from 
the internal variable).
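
The "?" substitution can be reproduced without mDCM, using only System.Text. The sketch below is not mDCM code: the class EncodingRoundTrip and its RoundTrip helper are hypothetical names, and the sketch assumes only that a buffer encodes a string to bytes with a given encoding and later decodes it with the same one.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of mDCM.
public static class EncodingRoundTrip
{
    // Encode the string to bytes and decode it again with the same encoding.
    public static string RoundTrip(string value, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(value);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string name = "Имя-Пациента";

        // ASCII cannot represent Cyrillic letters, so each one is replaced
        // by '?' during encoding; the hyphen survives.
        string viaAscii = RoundTrip(name, Encoding.ASCII); // "???-????????"

        // UTF-8 can represent every character, so the string is unchanged.
        string viaUtf8 = RoundTrip(name, Encoding.UTF8);

        Console.WriteLine(viaAscii);
        Console.WriteLine(viaUtf8 == name);
    }
}
```

The data is already lost at the encoding step, which is why the element's encoding has to be set before the value is assigned rather than after.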

I can make the change myself if you wish. Just contact me at kkarmakul at 
gmail dot com.

Apologies if the issue runs deeper than this and the solution only covers a 
small set of cases.

Thank you!
  Kirill

Original issue reported on code.google.com by kkarma...@gmail.com on 3 Jul 2009 at 9:12

GoogleCodeExporter commented 8 years ago

I find your suggestion interesting. I'd prefer it over my current code, 
which uses this workaround:

// Create the element empty, set its encoding, then assign the value.
DcmPersonName referringPhysicianName = new DcmPersonName(DcmTags.ReferringPhysiciansName);
if (Encoding != null)
  referringPhysicianName.ByteBuffer.Encoding = Encoding;

referringPhysicianName.SetValue(foo);
dataset.AddItem(referringPhysicianName);

Cheers,
  Lennart Kolmodin

Original comment by lennart....@vgregion.se on 21 Sep 2009 at 9:15

GoogleCodeExporter commented 8 years ago

SVN 64 adds the DcmDataset.SpecificCharacterSetEncoding property, which 
changes the encoding of all encoded string elements already in the dataset 
and sets the correct encoding on new elements.

I'll leave this issue open until it is verified that this is working 
properly and the solution is acceptable for everyone.

Original comment by colby.di...@gmail.com on 25 Sep 2009 at 5:10

Changed state: Started

GoogleCodeExporter commented 8 years ago

I've recently used the DcmDataset.SpecificCharacterSetEncoding property for 
files with Russian text in tags. It works fine with both ISO 8859-5 and 
UTF-8. Thank you!
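
For reference, the two character sets mentioned here correspond to the DICOM Specific Character Set defined terms ISO_IR 144 (ISO 8859-5) and ISO_IR 192 (UTF-8). The sketch below maps a few such terms to .NET encoding names. It is not mDCM code: the class CharacterSetNames and its ToEncodingName method are hypothetical, and only three terms are covered.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of mDCM.
public static class CharacterSetNames
{
    // A few DICOM Specific Character Set defined terms and the matching
    // .NET encoding names. The list is deliberately incomplete.
    private static readonly Dictionary<string, string> TermToEncodingName =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "ISO_IR 100", "iso-8859-1" }, // Latin alphabet No. 1
            { "ISO_IR 144", "iso-8859-5" }, // Cyrillic
            { "ISO_IR 192", "utf-8" },      // Unicode in UTF-8
        };

    // Returns the .NET encoding name for a single defined term (VM of 1),
    // "us-ascii" when the term is absent or empty (the DICOM default
    // repertoire), or null when the term is not in the table.
    public static string ToEncodingName(string definedTerm)
    {
        if (string.IsNullOrWhiteSpace(definedTerm))
            return "us-ascii";

        string name;
        return TermToEncodingName.TryGetValue(definedTerm.Trim(), out name)
            ? name
            : null;
    }
}
```

The returned name can be passed to System.Text.Encoding.GetEncoding. On .NET Core and later, "iso-8859-5" is only available after registering the code pages encoding provider, so the helper returns names rather than Encoding objects.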

Original comment by kkarma...@gmail.com on 3 Nov 2009 at 10:29
