Unicode not rendering properly

rexnfx commented 5 months ago

Describe the bug

The image rendered from the DXF does not show the Unicode characters as symbols; instead it renders the literal '\U+00##' sequence. See the attached images.

rendered.png is the ezdxf result. snapshot.png is how the same file is displayed in an Autodesk app.

To Reproduce Information and data needed to reproduce the error:

import ezdxf
from ezdxf.addons.drawing import Frontend, RenderContext, config, layout, svg

def default_config():
    # configuration used to create grayscale output from DXF files
    return config.Configuration(
        background_policy=config.BackgroundPolicy.WHITE,
        color_policy=config.ColorPolicy.BLACK,
        min_lineweight = 2.0
    )

def default_page():
    #create minimal margins for DXF output
    return layout.Page(700, 400, layout.Units.mm, margins=layout.Margins.all(2))

def output_svg(input_path, output_path, msp, context):
    # 2. create the backend
    backend = svg.SVGBackend()
    # 3. create the frontend with a white background and a black foreground color
    cfg = default_config()  # named cfg so the config module is not shadowed
    frontend = Frontend(context, backend, config=cfg)
    # 4. draw the modelspace
    frontend.draw_layout(msp)

    # 5. create the page layout (700 x 400 mm), not required for all backends
    page = default_page()

    # 6. get the SVG rendering as string - this step is backend dependent
    svg_string = backend.get_string(page)

    # fix_svg() is a post-processing helper that is not shown in this report
    svg_string = fix_svg(svg_string)

    with open(output_path, "wt", encoding="utf8") as fp:
        fp.write(svg_string)

def convert(input_path, output_path, output_format, logger):
    #convert dxf to png (commented out the code to convert SVG temporarily)
    try:
        doc = ezdxf.readfile(input_path)
    except Exception as e:
        print('file was not a DXF file: ', input_path, output_path, e)
        logger.error('DXF conversion failed for file: %s ERROR=( %s )', input_path, e)
        return

    #for ent in doc.entities:
        #if 'DIM' in str(ent):
            #print(str(ent),dir(ent))

    msp = doc.modelspace()

    # 1. create the render context
    context = RenderContext(doc)
    try:
        if output_format == "svg":
            output_svg(input_path, output_path, msp, context)
    except Exception as e:
        print('DXF conversion failed for file: ', input_path, output_path, e)
        logger.error('DXF conversion failed for file: %s ERROR=( %s )', input_path, e)

Information about the ezdxf version and the OS.
OS: Windows 10
ezdxf version: 1.1.3
If processing a loaded DXF file causes the error, add a simplified and zipped DXF file which still triggers the error.

Confidential Data The attached files have been cleared for use discussing this issue in a public forum.

Expected behavior The output image should render the Unicode symbol and not the character code.

Screenshots attached

mozman commented 5 months ago

This is the result for the current version of ezdxf v1.3.0b1 and the string \U+00F86.3

text.zip

rexnfx commented 5 months ago

I force updated to 1.3.0b1 and got

Is it possible that this is an issue related to the fonts available on my device? Can you think of a way for me to test this?

Thanks

rexnfx commented 5 months ago

I used this to check which font my document uses:

    for ent in doc.entities:
        if 'TEXT' in str(ent):
            print(str(ent))
            #print(dir(ent))
            print(ent.font_name())

printed: 'arial.ttf'

and the ent.plain_text() printed '\U+00F86.3'

mozman commented 5 months ago

The special Unicode encoding \U+xxxx (the regular way to encode Unicode is \Uxxxx) and the MIF encoding \M+xxxx are only supported by the ezdxf.recover module:

from ezdxf import recover

doc, auditor = recover.readfile("text.dxf")
msp = doc.modelspace()
txt = msp[0]
print(txt.dxf.text)

outputs:

ø6.3
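
For illustration, the \U+xxxx escape can be decoded with a small regular expression. This is a minimal standard-library sketch, not ezdxf's implementation: the function name decode_special_unicode is hypothetical, and it handles only the four-hex-digit \U+xxxx form, not MIF \M+ sequences.

```python
import re

# Matches the DXF-style escape \U+XXXX with exactly four hex digits.
SPECIAL_UNICODE = re.compile(r"\\U\+([0-9A-Fa-f]{4})")


def decode_special_unicode(text: str) -> str:
    """Replace every \\U+XXXX escape with the character it encodes.

    Hypothetical helper for illustration only; any text that is not
    part of an escape sequence is returned unchanged.
    """
    return SPECIAL_UNICODE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_special_unicode(r"\U+00F86.3"))  # ø6.3
```

Only the four hex digits after \U+ belong to the escape, so the trailing 6.3 is kept as literal text.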

rexnfx commented 5 months ago

I was able to get the expected result with the recover module.

However, I also changed the Unicode encoding from the special format to the regular one and read this modified text.dxf example without the recover module:

  0
TEXT
  5
9C
330
1F
100
AcDbEntity
  8
0
100
AcDbText
 10
123.759934511193
 20
185.192022461077
 30
0.0
 40
108.564411027157
  1
\U00F86.3
100
AcDbText
  0
ENDSEC
  0
SECTION
  2
OBJECTS

text.zip

I'm not sure what the drawbacks of the recover module are. I haven't had a chance to read the docs on that yet. If there are none, I will use it and move on, but if there is a drawback, I would convert our DXF files to the regular formatting and use the original readfile() method, provided I can get that to output correctly.

rexnfx commented 5 months ago

Also closing this, as I can at least use this as a workaround.

mozman commented 5 months ago

The non-regular encoding is required by AutoCAD; you can't use the regular encoding in DXF files.

rexnfx commented 5 months ago

I'm seeing in the docs that recover has a performance penalty.

auditor = doc.audit()

If I do this after doc = ezdxf.readfile(), is there a way for the auditor to flag Unicode decoding issues? I see in recover.py that there is a 'strict' argument that will throw a UnicodeDecodeError. I'm not seeing anything like that in audit that would allow me to use recover selectively, or am I mistaken?

I didn't notice a huge performance penalty on a single file conversion, but it may be more apparent on bulk conversions (which my project does).
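
One possible way to use recover selectively is to scan the raw file for \U+xxxx or \M+ escapes before choosing a loader. This is only a sketch under assumptions, not a feature of the ezdxf auditor: the helper names are hypothetical, the byte scan assumes an ASCII-compatible text DXF, the MIF pattern assumes five hex digits after \M+, and the scan detects only these escapes, not the other problems recover can repair.

```python
import re
from pathlib import Path

# \U+XXXX (four hex digits) or \M+ followed by five hex digits (assumed MIF form).
SPECIAL_ESCAPE = re.compile(rb"\\U\+[0-9A-Fa-f]{4}|\\M\+[0-9A-Fa-f]{5}")


def contains_special_escapes(data: bytes) -> bool:
    """Return True if the raw DXF bytes contain a special escape sequence."""
    return SPECIAL_ESCAPE.search(data) is not None


def load_dxf(path):
    """Hypothetical loader: use recover only when escapes are present."""
    # Imported lazily so the scan above works without ezdxf installed.
    import ezdxf
    from ezdxf import recover

    if contains_special_escapes(Path(path).read_bytes()):
        doc, auditor = recover.readfile(path)  # slower, decodes \U+xxxx
        return doc
    return ezdxf.readfile(path)  # fast path for files without escapes


print(contains_special_escapes(b"  1\n\\U+00F86.3\n"))  # True
```

The scan reads the file once more before loading, so whether this is cheaper than always calling recover would need to be measured on real bulk conversions.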

mozman commented 5 months ago

Here are the docs for the recover module: https://ezdxf.mozman.at/docs/drawing/recover.html

tl;dr: If you don't know whether your DXF files come from reliable sources (tools such as AutoCAD, BricsCAD, ZWCAD, GstarCAD), use the recover module.

rexnfx commented 5 months ago

Thanks

mozman / ezdxf

Unicode not rendering properly #1063