tennom / saxonpy

This is a Python package for Saxon/C 1.2.1, an XML processor from Saxonica. Currently, it packages the open-source version, i.e. the Home Edition.

Can only create PySaxonProcessor once per process #5

Open elshimone opened 1 year ago

elshimone commented 1 year ago

It may be worth clarifying the usage of this class: currently, after releasing the processor, you cannot recreate it in the same process. A contrived example:

from saxonpy import PySaxonProcessor

with PySaxonProcessor(license=False) as proc:
    xsltproc = proc.new_xslt_processor()

    document = proc.parse_xml(xml_text="<out><person>text1</person><person>text2</person><person>text3</person></out>")

    xsltproc.set_source(xdm_node=document)
    xsltproc.compile_stylesheet(stylesheet_text="<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='2.0'> <xsl:param name='values' select='(2,3,4)' /><xsl:output method='xml' indent='yes' /><xsl:template match='*'><output><xsl:value-of select='//person[1]'/><xsl:for-each select='$values' ><out><xsl:value-of select='. * 3'/></out></xsl:for-each></output></xsl:template></xsl:stylesheet>")

    output2 = xsltproc.transform_to_string()
    print(output2)

with PySaxonProcessor(license=False) as proc:
    xsltproc = proc.new_xslt_processor()

    document = proc.parse_xml(xml_text="<out><person>text1</person><person>text2</person><person>text3</person></out>")

    xsltproc.set_source(xdm_node=document)
    xsltproc.compile_stylesheet(stylesheet_text="<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='2.0'> <xsl:param name='values' select='(2,3,4)' /><xsl:output method='xml' indent='yes' /><xsl:template match='*'><output><xsl:value-of select='//person[1]'/><xsl:for-each select='$values' ><out><xsl:value-of select='. * 3'/></out></xsl:for-each></output></xsl:template></xsl:stylesheet>")

    output2 = xsltproc.transform_to_string()
    print(output2)

Running this results in the following output:

(venv) (base) simon@tachikoma:~/dev/saxon_test$ python xslt.py 
<?xml version="1.0" encoding="UTF-8"?>
<output>text1<out>6</out>
   <out>9</out>
   <out>12</out>
</output>

JNI_CreateJavaVM() failed with result: -5

This appears to be a limitation of the JNI API; see https://stackoverflow.com/a/66936249
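If the limit is one JVM per process, one possible workaround is to give each run its own interpreter process. The following is only a sketch under that assumption: `CHILD_SCRIPT` and `run_in_fresh_process` are hypothetical names, and the child program reuses the saxonpy calls from the example above.

```python
import subprocess
import sys

# Hypothetical child program: reads XML on stdin and takes the
# stylesheet path as its first argument. Assumes saxonpy is installed.
CHILD_SCRIPT = """
import sys
from saxonpy import PySaxonProcessor

with PySaxonProcessor(license=False) as proc:
    xsltproc = proc.new_xslt_processor()
    xsltproc.set_source(xdm_node=proc.parse_xml(xml_text=sys.stdin.read()))
    xsltproc.compile_stylesheet(stylesheet_file=sys.argv[1])
    print(xsltproc.transform_to_string())
"""


def run_in_fresh_process(script, stdin_text="", *args):
    """Run a Python script in a new interpreter and return its stdout."""
    done = subprocess.run(
        [sys.executable, "-c", script, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return done.stdout
```

A call such as `run_in_fresh_process(CHILD_SCRIPT, xml_text, "style.xsl")` would then start a new process, and therefore a new JVM, for every transformation. That costs startup time on each call, so it only suits low-volume use.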

ond1 commented 1 year ago

I recommend not using the processor as a context manager if you are running Saxon in a server setting where the app is kept alive. Leaving the context calls the release() function, which prevents the processor from being used again in that process.

See note here: https://saxonica.plan.io/issues/4942#note-33

In SaxonC 11.4 we have made a number of improvements to handle the problem you have reported.
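The recommendation above, keeping one processor alive for the whole process instead of using it as a context, might look like the following minimal sketch. The helpers `get_processor`, `make_saxon_processor` and `transform` are hypothetical names; only the saxonpy calls are taken from the example earlier in this issue.

```python
import threading

_lock = threading.Lock()
_processor = None  # the single processor shared by the whole process


def get_processor(factory):
    """Return the process-wide processor, calling factory() only on first use."""
    global _processor
    with _lock:
        if _processor is None:
            _processor = factory()
        return _processor


def make_saxon_processor():
    # Imported lazily so this module still loads where saxonpy is absent.
    from saxonpy import PySaxonProcessor
    return PySaxonProcessor(license=False)


def transform(xml_text, stylesheet_text):
    """Run one transformation on the shared processor, which is never released."""
    proc = get_processor(make_saxon_processor)
    xsltproc = proc.new_xslt_processor()
    xsltproc.set_source(xdm_node=proc.parse_xml(xml_text=xml_text))
    xsltproc.compile_stylesheet(stylesheet_text=stylesheet_text)
    return xsltproc.transform_to_string()
```

Because the processor is created once and never used in a `with` block, release() is not called between requests, so the second creation that triggers the JNI error never happens.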

elshimone commented 1 year ago

Hi @ond1, thanks for the note. Are there plans for Saxonica itself to release a Python wheel package for Saxon?

ond1 commented 1 year ago

Hi @elshimone, yes, we have plans to do so. We will be releasing official Python wheel packages for SaxonC 12 (Linux, macOS and Windows) in the near future. We have successfully completed a phase of testing of the wheels for the next release. Also, in SaxonC 12 we have replaced the support for the Excelsior JET JVM with GraalVM native-image.

We have moved away from the use of JNI, so you will not see the failure that you reported above (JNI_CreateJavaVM() failed with result).

elshimone commented 1 year ago

That's great news - if you need any beta testers let me know.

Simon

ond1 commented 1 year ago

Thanks for your offer. I will contact you soon

tennom commented 1 year ago

@elshimone Thanks for reporting the problem. In our similar use case, we collected multiple inputs and ran them all in the same process instead of creating a separate process for each input.
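A minimal sketch of that batching approach, assuming all inputs share one stylesheet: `transform_batch` is a hypothetical helper, and the per-input call order mirrors the example at the top of this issue.

```python
def transform_batch(processor_factory, stylesheet_text, xml_inputs):
    """Transform every input inside a single processor context."""
    results = []
    # The processor is created, and released, exactly once for the batch.
    with processor_factory() as proc:
        for xml_text in xml_inputs:
            xsltproc = proc.new_xslt_processor()
            xsltproc.set_source(xdm_node=proc.parse_xml(xml_text=xml_text))
            xsltproc.compile_stylesheet(stylesheet_text=stylesheet_text)
            results.append(xsltproc.transform_to_string())
    return results


# Usage with saxonpy (assumes the package is installed):
#   from saxonpy import PySaxonProcessor
#   outputs = transform_batch(
#       lambda: PySaxonProcessor(license=False), xsl_text, [xml_1, xml_2])
```

Passing the factory in as an argument keeps the helper importable without saxonpy and makes it easy to exercise with a stand-in object.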

tennom commented 1 year ago

@ond1 It's excellent news that you are testing the wheels ahead of a release. Are you going to have different wheels for the open-source and enterprise versions, or different access levels on the same wheel? Some lower-level configuration, such as JVM environment settings via Python, would be great as well.

ond1 commented 1 year ago

We have different wheels for the open-source and enterprise versions

gouripv commented 1 year ago

@ond1 Great to hear you are working on releasing the wheel packages. We are using PySaxonProcessor in AWS Lambda and are facing almost the same problem as the one posted here: https://saxonica.plan.io/issues/4942. During load testing we keep getting Error: No stylesheet found. Please compile stylsheet before calling transformToString or check exceptions, and then finally JET RUNTIME HAS DETECTED UNRECOVERABLE ERROR: runtime error. Is there anything I can do in the interim? Also, do you by chance have a timeline for when the official wheel package will be ready?

ond1 commented 1 year ago

Hi @gouripv is it possible for you to upgrade to SaxonC 11.4? I know with that release it is not built in a wheel, but you can build and install the Python extension.

We will try to push a beta release, but I would have to get back to you on when that will happen.

gouripv commented 1 year ago

Hi @ond1, thank you for the reply. Right now our preference is to use a wheel package; we are currently using saxonpy. But let me check 11.4. Could you please share the documentation for it if you have it handy?

ond1 commented 1 year ago

Yes, I understand that installing a wheel is much easier. Building the extension takes a few extra steps; see the documentation on installing the Python extension here: https://www.saxonica.com/saxon-c/documentation11/index.html#!starting/installingpython

You will need to install SaxonC: https://www.saxonica.com/saxon-c/documentation11/index.html#!starting/installing

gouripv commented 1 year ago

Hi @ond1, I have another question for you. Below is the sample code I currently use for the transformation in my Python-based AWS Lambda. The Lambda currently has 2048 MB of memory; as requests come in, memory keeps increasing, and once the threshold is reached I keep getting the error Error: No stylesheet found. Please compile stylsheet before calling transformToString or check exceptions. Is there anything I can do for the time being without upgrading to 11.4?

import os

from saxonpy import PySaxonProcessor


def transform_xml(input_xml, xsl_path):
    f2 = None  # defined up front so the finally block can always test it
    try:
        proc = PySaxonProcessor(license=False)
        xsltproc = proc.new_xslt_processor()
        # Resolve the stylesheet path relative to this module and read it.
        ROOT_PATH = os.path.abspath(os.path.dirname(__file__))
        file_path = os.path.join(ROOT_PATH, xsl_path)
        f2 = open(file_path, 'r')
        data_xsl = f2.read()
        document = proc.parse_xml(xml_text=input_xml)
        xsltproc.set_source(xdm_node=document)
        xsltproc.compile_stylesheet(stylesheet_text=data_xsl)
        final_xml = xsltproc.transform_to_string()
        if final_xml is not None:
            return final_xml
        raise Exception('Exception occurred during transformation')
    finally:
        # Close the stylesheet file if it was opened, then drop the references.
        if f2 is not None:
            f2.close()
        xsltproc = None
        proc = None

tennom commented 1 year ago

For me, a with statement is preferable to try, but let's stick with your way. Is it an option for you to simplify the code by specifying the file paths directly, like this, so that you don't need to handle the file reading yourself?

proc = PySaxonProcessor(license=False)
xsltproc = proc.new_xslt_processor()
xsltproc.compile_stylesheet(stylesheet_file=xsl_path)
final_xml = xsltproc.transform_to_string(source_file=input_xml)

If you keep getting the error No stylesheet found, check that the file really exists at a path the code can access. Your files may need to be inside the container or runtime environment rather than on the host file system. Also check how you specify file paths in Lambda.

gouripv commented 1 year ago

I appreciate your inputs, @tennom. I don't think the issue is caused by the file paths or by reading the files; the error happens after memory reaches some threshold. If it were a file path issue, it should fail on the first attempt for all requests, whereas it has served more than 1k requests without any issues. My guess is that, because I am not using with, the proc resources are not cleaned up and memory keeps increasing. I cannot use with either, because I then get the error JNI_CreateJavaVM() failed with result: -5. I tried defining proc and xsltproc at the global level, that is, before my handler in the Lambda, but even then memory keeps increasing.

ond1 commented 1 year ago

With SaxonC 1.2.1, as @tennom suggested, passing the source as a file name argument to transform_to_string can help with memory: final_xml = xsltproc.transform_to_string(source_file=input_xml)

gouripv commented 1 year ago

Thank you @tennom and @ond1 for your inputs. I cannot really use xsltproc.transform_to_string(source_file=input_xml) because the input XML arrives in the event that the Lambda receives. I tried xsltproc.transform_to_string(stylesheet_file="test1.xsl", xdm_node=node), but it gives me the error "'saxonc.PyXsltProcessor' object has no attribute 'setSourceFromXdmNode'".
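One way to bridge an in-memory XML string and the source_file argument might be to write the event payload to a temporary file first. This is only a sketch and was not suggested in the thread: `with_temp_xml_file` is a hypothetical helper, and it assumes the payload can be written as UTF-8 text.

```python
import os
import tempfile


def with_temp_xml_file(xml_text, run):
    """Write xml_text to a temp file, call run(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        # Assumption: the payload is text that can be stored as UTF-8.
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(xml_text)
        return run(path)
    finally:
        os.remove(path)  # clean up even if run() raises


# Usage with an already compiled xsltproc (hypothetical variable names):
#   final_xml = with_temp_xml_file(
#       event_xml, lambda p: xsltproc.transform_to_string(source_file=p))
```

Whether this actually reduces memory growth in Saxon/C 1.2.1 has not been verified here; it only makes the source_file form usable when the XML does not start out as a file.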

ond1 commented 1 year ago

Looks like a bug. You could try using the PyXslt30Processor class instead

ond1 commented 1 year ago

SaxonC 12.0 test release now available. See issue: https://github.com/tennom/saxonpy/issues/6