key4hep / k4MarlinWrapper

GaudifyMarlinProcessors
Apache License 2.0
2 stars 19 forks source link

constant value replacement and include loading is not done during conversion of xml steering files #18

Closed tmadlener closed 4 years ago

tmadlener commented 4 years ago

In Marlin steering files it is possible to define constants

<constants>
  <constant name="OutputBaseName" value="StandardReco" />
  <constant name="RECOutputFile" value="${OutputBaseName}_REC.slcio" />
</constants>

that can then be referred to later in the steering file:

<processor name="OutputProcessor" type="LCIOOutputProcessor">
  <parameter name="LCIOOutputFile" type="string"> ${RECOutputFile} </parameter>
  <!-- more paremeters -->
</processor>

Similarly it is possible to define additional configuration files that should be included, e.g.

<include ref="Tracking/TrackingDigi.xml" />

Both of these are currently resolved by the marlin XMLParser, so they should probably be handled during the conversion of the xml steering file, in order to be fully able to easily convert workflows.

andresailer commented 4 years ago

From discussion in the key4hep meeting today:

tmadlener commented 4 years ago

Using the files mentioned by Andre above, I have quickly tried the following

Marlin -n MarlinStdReco.xml

to produce the MarlinStdRecoParsed.xml file that has all the includes resolved and running that through the converter. Marlin only replaces the constants after it has written this merged xml file.

One thing that we have to consider in this case, is that e.g. the following lines exist here:

<!-- Overlay background, if any (see constant RunOverlay) -->
    <if condition="${RunOverlay}">
      <group name="BgOverlay" />
    </if>

They get resolved to the following ones

<!-- Overlay background, if any (see constant RunOverlay) -->
        <processor name="BgOverlayWW" condition="${RunOverlay}" />
        <processor name="BgOverlayWB" condition="${RunOverlay}" />
        <processor name="BgOverlayBW" condition="${RunOverlay}" />
        <processor name="BgOverlayBB" condition="${RunOverlay}" />
        <processor name="PairBgOverlay" condition="${RunOverlay}" />

I don't think the current converter can handle such conditional running of processors right now. At least in this case all these processors are added to the algList unconditionally.

Maybe it would be good to not directly replace all the constants during the conversion, but to instead define global variables in the generated python configuration and use these global variables?

andresailer commented 4 years ago

Maybe it would be good to not directly replace all the constants during the conversion, but to instead define global variables in the generated python configuration and use these global variables?

Yes, the current idea @fdplacido and I discussed is adding all constants to a dictionary and doing the replacements via string formatting something like "%(constantName)s" % constantsDict That should work for arbitrary number of constants in a string (as long as > 0), and where there is no constants no formatting will be done.

<processor name="BgOverlayWW" condition="${RunOverlay}" />

I had no idea this existed!

tmadlener commented 4 years ago

Yes, the current idea @fdplacido and I discussed is adding all constants to a dictionary and doing the replacements via string formatting something like "%(constantName)s" % constantsDict That should work for arbitrary number of constants in a string (as long as > 0), and where there is no constants no formatting will be done.

Just to be sure that I get this correctly: The generated configuration file would then look something like this?

# imports from Gaudi etc...

CONSTANTS = {
  'RunOverlay': False,             # just because in the .xml it is also set to false
  'RECOutputFile': 'output.slcio', # for the lack of a more meaningful name
  # other constants
}

# setup the OutputProcessor
OutputProcessor = MarlinProcessorWrapper("OutputProcessor")
OutputProcessor.ProcessorType = "LCIOOutputProcessor"
OutputProcessor.Parameters = [
                             "LCIOOutputFile", CONSTANTS['RECOutputFile'], END_TAG,
                             # other parameters
                             ]

# setup of all the other modules and build the algList

if CONSTANTS['RunOverlay']:
  algList.append(BgOverlayWW)

I.e. the dictionary that holds all the constants values is still present in the generated configuration file.

andresailer commented 4 years ago

I think more like

OutputProcessor.Parameters = [
                             "LCIOOutputFile", "%(RECOutputFile)s" % CONSTANTS, END_TAG,
                             # other parameters
                             ]

Because that also works with any kind of additional string concatenation. And the if is not handled like this at the moment.

fdplacido commented 4 years ago

The if statements are left as commented algorithms at the moment, and need to be manually uncommented.

@tmadlener What configuration are you using when running Marlin -n MarlinStdReco.xml, for optional parameters. I just chose some randomly for the moment.

tmadlener commented 4 years ago

Ah, now I see what you meant. This would definitely make the handling of multiple constants in a string much easier.

tmadlener commented 4 years ago

@tmadlener What configuration are you using when running Marlin -n MarlinStdReco.xml, for optional parameters. I just chose some randomly for the moment.

I hadn't defined any of the optional parameters in this case. I first tried to do something like

Marlin MarlinStdReco.xml -n --constant.OutputBaseName=test_output

but I then realized that in this case these values were not replaced in the merged xml file. I haven't changed anything on the RunOverlay parameter, so that remains unchanged. However, in MarlinStandardReco.xml the if is still present in the xml file, whereas in the merged MarlinStandardRecoParsed.xml file this has been changed to the condition="${RunOverlay}" I have described above.

fdplacido commented 4 years ago

Hi @tmadlener, I implemented the changes for the converter here. Files which have includes for constants should be merged using the Marlin command mentioned in this discussion.

I included a test file, which take a sample file, converts it and runs it to check.

Please let me know if it works with for your use case.

tmadlener commented 4 years ago

Hi @fdplacido,

I have run the new version on the MarlinStdReco.xml from here and there seem to be some cases, where the reading in of the constants does not yet work perfectly. I have put the merged xml file into this gist.

It looks like all the parameters that are not defined with a value tag are currently not properly read in, e.g.

<constant name="DetectorModel" value="ILD_l5_o1_v02" />
<constant name="BeamCalCalibrationFactor">79.6</constant>

currently results in

CONSTANTS = {
    'DetectorModel': 'ILD_l5_o1_v02',
    'BeamCalCalibrationFactor': 'None',
}
fdplacido commented 4 years ago

I fixed that and updated a couple of other things in the same branch. It should work now. I am reworking the way the converter handles this overall, so it should be more robust in the following changes.

tmadlener commented 4 years ago

Hi @fdplacido sorry for the delayed response. So the conversion seems to indeed work now for all the constants now. Thanks :)

Unfortunately, I am running into new problems when trying to actually run the converted python script with k4run. The problem is partially related to the conversion of constants, but I think also partially how the wrapper parses things. To give a bit more detail:

<marlin>
    <constants>
        <constant name="EcalBarrelEnergyFactors">0.0063520964756 0.012902699188</constant>
    </constants>
    <!-- more setup -->
    <processor name="MyEcalBarrelReco" type="RealisticCaloRecoSilicon">
        <parameter name="calibration_factorsMipGev">${EcalBarrelEnergyFactors}</parameter>
        <!-- more parameters -->
    </processor>
    <!-- more processor setup -->
</marlin>

this will be converted into

CONSTANTS = {
        'EcalBarrelEnergyFactors': '0.0063520964756 0.012902699188',
}
MyEcalBarrelReco.Parameters = [
                               "calibration_factorsMipGev", "%(EcalBarrelEnergyFactors)s" % CONSTANTS, END_TAG,
                               # more parameters
                               ]

When running this via k4run, the following error occurs

MyEcalBarrelDigi     INFO Init processor
MyEcalBarrelReco     INFO Parameter values for: MyEcalBarrelReco of type RealisticCaloRecoSilicon
MyEcalBarrelReco     INFO CellIDLayerString  layer
MyEcalBarrelReco     INFO calibration_factorsMipGev  0.0063520964756 0.012902699188
MyEcalBarrelReco     INFO calibration_layergroups  20  11
MyEcalBarrelReco     INFO inputHitCollections  EcalBarrelCollectionDigi
MyEcalBarrelReco     INFO inputRelationCollections  EcalBarrelRelationsSimDigi
MyEcalBarrelReco     INFO outputHitCollections  EcalBarrelCollectionRec
MyEcalBarrelReco     INFO outputRelationCollections  EcalBarrelRelationsSimRec
MyEcalBarrelReco     INFO new processor 0x154eb820
MyEcalBarrelReco    FATAL  Standard std::exception is caught
MyEcalBarrelReco    ERROR lcio::Exception: failed converting float from string: 0.0063520964756 0.012902699188

EventLoopMgr        ERROR Unable to initialize Algorithm: MyEcalBarrelReco
ServiceManager      ERROR Unable to initialize Service: EventLoopMgr
ApplicationMgr      ERROR Application Manager Terminated with error code 1

The problem is that the original value was intended to be a list of values, but the current conversion approach transforms it into a single string, which makes this particular processor fail when trying to parse numbers from it, since it actually expected a list of strings.

I suppose similar things might happen to other lists as well. In principle I think such things are more easily handled on the conversion side, but I am not sure whether the current approach is still feasible in that case. Alternatively, this could potentially also be handled on the wrapper side, e.g. by splitting strings at spaces, but that has the potential to become somewhat brittle.

fdplacido commented 4 years ago

This has been addressed now. I had to change how to parsing works, so lists defined in the constants work now.

tmadlener commented 4 years ago

Thanks for this. The tutorial example is running now and also produces an output file.

The only thing that is not yet done automatically is the disabling of the processors that are marked to be run only conditionally, and the condition depends on a global constant. E.g. the RunOverlay steering all the BG overlay processors in the tutorial example. However, for now I don't think that is a real problem, and it might be enough to document that these have to be disabled by hand if necessary.