metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0
71 stars 34 forks source link

Implement record path filter. #386

Closed blackwinter closed 3 years ago

blackwinter commented 3 years ago

Split event stream into records based on entity path.

Related to #382 and org.metafacture.xml.XmlElementSplitter.

Resolves #385.

blackwinter commented 3 years ago

Proof of concept; more test cases to follow.

blackwinter commented 3 years ago

Open questions:

blackwinter commented 3 years ago

It would be nice if we could extract the TestHelpers into a common package (e.g. metafacture-framework) so we can reuse it here as well as in org.metafacture.metamorph. The "problem" is the Mockito dependency. Any ideas?

fsteeg commented 3 years ago

Proof of concept; more test cases to follow.

Great, looks good, many thanks!

Should it be allowed to filter for literals as well? (Emitting the literal directly)

Not sure what you mean, something like literal-to-object / LiteralToObject (emits literal values as objects)?

What should happen with records that don't match the filter condition? Should the record start/end events still be emitted?

Hm, they would always be empty right? I think the records should be skipped entirely. I think it makes sense: if we say e.g. 'records are in the data field', and there is no data field, we should not get any records.

It would be nice if we could extract the TestHelpers into a common package (e.g. metafacture-framework)

Sounds good, in metafacture-framework/src/test/java/org/metafacture/framework/helpers?

The "problem" is the Mockito dependency. Any ideas?

I don't understand what you mean with the problem... metafacture-framework already depends on Mockito if that's what you mean.

blackwinter commented 3 years ago

Thanks for having a look at this.

Not sure what you mean, something like literal-to-object / LiteralToObject (emits literal values as objects)?

No, not really. I meant, what should happen here?

    @Test
    public void shouldProcessLiteralPath() {
        assertFilter(
                i -> {
                    i.setPath("lit");

                    i.startRecord("1");
                    i.literal("lit", "record 1");
                    i.endRecord();
                    i.startRecord("2");
                    i.literal("lit", "record 2");
                    i.endRecord();
                },
                o -> {
                    o.get().startRecord("1");
                    o.get().literal("lit", "record 1"); // ?
                    o.get().endRecord();
                    o.get().startRecord("2");
                    o.get().literal("lit", "record 2"); // ?
                    o.get().endRecord();
                }
        );
    }

This would behave differently, because with an entity filter path, the entity element itself would not be emitted.

Hm, they would always be empty right? I think the records should be skipped entirely. I think it makes sense: if we say e.g. 'records are in the data field', and there is no data field, we should not get any records.

OK, I'll adapt it accordingly.

Sounds good, in metafacture-framework/src/test/java/org/metafacture/framework/helpers?

Well, not exactly. I thought it would have to go in main instead of test. Would it be OK for you to provide a helper class to other modules from the test directory? Note that metafacture-framework is an api dependency of metafacture-mangling. So that would have to change, too.

I don't understand what you mean with the problem... metafacture-framework already depends on Mockito if that's what you mean.

Yes, it's currently a testImplementation dependency. I assumed it would have to be changed to implementation.

blackwinter commented 3 years ago

Updated implementation:

Should be functionally complete now.