metafacture / metafacture-fix

Work in progress towards an implementation of the Fix language for Metafacture
Apache License 2.0
6 stars 2 forks source link

Race conditions when using Threads #255

Closed dr0i closed 2 years ago

dr0i commented 2 years ago

Came up in https://github.com/hbz/lobid-resources/issues/1409#issuecomment-1235382923. This one is not reproducible and seems to be the result of Metafix not being thread safe:

WARN [Thread-4] (Metafix.java:421) - Error while executing Fix expression (at file:/home/sol/git/lobid-resources-alma-fix/src/main/resources/alma/fix/contribution.fix, line 184): do list(path:"110[012] ", "var":"$i") unless exists("$i.M") unless exists("$i.a.1") unless exists("$i.4") if any_match("type[]","ArchivedWebPage|Miscellaneous|Bibliography|Statistics|Legislation|PublishedScore|Game|Image|Map|Standard") add_field("$i.4","cre") elsif any_match("type[]","Periodical|Collection|Series|Newspaper|Journal|PublicationIssue|EditedVolume|Proceedings|Festschrift|ReferenceSource") add_field("$i.4","edt") elsif any_match("type[]","Book|MultiVolumeBook|Article|Thesis|OfficialPublication|Schoolbook|Biography|Report") add_field("$i.4","aut") else add_field("$i.4","cre") end end do list(path: "$i.4", "var":"$j") set_hash("contribution[].$append.agent") do list(path:"$i.0","var":"$k") if all_match("$k", "^\(DE-588\).*$") paste("contribution[].$last.agent.gndIdentifier","$k") paste("contribution[].$last.agent.id","$k") end end paste("contribution[].$last.agent.label", "$i.a", "$i.b", join_char: ". ") set_array("contribution[].$last.agent.type[]","CorporateBody") copy_field("$j","contribution[].$last.role.id") end end end end java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469) at java.util.HashMap$KeyIterator.next(HashMap.java:1493) at org.metafacture.commons.tries.WildcardTrie.matches(WildcardTrie.java:128) at org.metafacture.commons.tries.WildcardTrie.get(WildcardTrie.java:123) at org.metafacture.commons.tries.SimpleRegexTrie.get(SimpleRegexTrie.java:75) at org.metafacture.metafix.Value$Hash.findFields(Value.java:806) at org.metafacture.metafix.Value$Hash.findFields(Value.java:778) at org.metafacture.metafix.Value$Hash.containsField(Value.java:522) at org.metafacture.metafix.Record.get(Record.java:124) at org.metafacture.metafix.FixBind$1.execute(FixBind.java:33) at org.metafacture.metafix.RecordTransformer.lambda$null$4(RecordTransformer.java:137) at org.metafacture.metafix.RecordTransformer.lambda$null$22(RecordTransformer.java:225) at org.metafacture.metafix.RecordTransformer.lambda$null$1(RecordTransformer.java:124) at org.metafacture.metafix.RecordTransformer.tryRun(RecordTransformer.java:236) at org.metafacture.metafix.RecordTransformer.lambda$transform$2(RecordTransformer.java:124) at java.lang.Iterable.forEach(Iterable.java:75) at org.metafacture.metafix.RecordTransformer.transform(RecordTransformer.java:123) at org.metafacture.metafix.RecordTransformer.transform(RecordTransformer.java:115) at org.metafacture.metafix.FixMethod$1.apply(FixMethod.java:60) at org.metafacture.metafix.RecordTransformer.lambda$null$20(RecordTransformer.java:205) at org.metafacture.metafix.RecordTransformer.lambda$null$22(RecordTransformer.java:225) at org.metafacture.metafix.RecordTransformer.lambda$null$1(RecordTransformer.java:124) at org.metafacture.metafix.RecordTransformer.tryRun(RecordTransformer.java:236) at org.metafacture.metafix.RecordTransformer.lambda$transform$2(RecordTransformer.java:124) at java.lang.Iterable.forEach(Iterable.java:75) at org.metafacture.metafix.RecordTransformer.transform(RecordTransformer.java:123) at org.metafacture.metafix.Metafix.endRecord(Metafix.java:202) at org.metafacture.biblio.marc21.MarcXmlHandler.endElement(MarcXmlHandler.java:132) at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:69) at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:43) at org.metafacture.strings.StringReader.process(StringReader.java:48) at org.metafacture.strings.StringReader.process(StringReader.java:34) at org.metafacture.flowcontrol.ObjectPipeDecoupler$Feeder.run(ObjectPipeDecoupler.java:175) at java.lang.Thread.run(Thread.java:748)

The threads are created using Metafacture's flowcontrol.ObjectPipeDecoupler.

dr0i commented 2 years ago

From https://github.com/hbz/lobid-resources/issues/1409#issuecomment-1235524415 , might also be a result of unsafe threads:

INFO [Thread-5] (Metafix.java:414) - Current record: {leader=01879ngm a2200385ua 4500, 006=m o c , 007=[cr una---unuuu, vz uzazuu], 008=160826p20162006cau023 o vlund d, 001=991025930286006442, 02852={a=1089373, b=Kanopy}, 035 =[{a=(OCoLC)956893319, 0=(uri) http://www.worldcat.org/oclc/956893319}, {a=(CaSfKAN)kan1089373}], 040 ={a=VDU, b=eng, c=VDU}, 24500={a=Lockie Leonard Series 1. Episode 13: It's not You, it's Me}, 264 1=[{b=Australian Children's Television Foundation,, c=2006.}, {a=[San Francisco, California, USA] :, b=Kanopy Streaming,, c=2016.}], 300 ={a=1 online resource (streaming video file) (24 minutes):, b=digital, .flv file, sound}, 306 ={a=002354}, 336 ={a=tdi, b=tdi, 2=rdacontent}, 337 ={b=[c, c], 2=rdamedia}, 338 ={b=[cr, cr], 2=rdacarrier}, 500 =[{a=Filmclip}, {a=Title from title frames.}, {a=In Process Record.}], 518 ={a=Originally produced by Australian Children's Television Foundation in 2006.}, 520 ={a=Lockie has some problems. First, he's decided to break up with Dot. It's just figuring out how that's tricky. Meanwhile, Mum is acting weird. She's been staring at the TV for days, when it's not even tuned to a channel. Phillip's not impressed - he's missing his favourite superhero cartoon, Captain Chicken. And to top things off, Egg and Lockie's quest to save the river is going nowhere... Lockie has got some serious work to do. And he's going to have to find his inner Captain Chicken to sort it all out.}, 538 ={a=Mode of access: World Wide Web.}, 546 ={a=In English}, 653 ={a=Australian Studies}, 7102 =[{a=Australian Children's Television Foundation (Firm),, 4=dst}, {a=Kanopy (Firm),, 4=dst}], 85640={u=https://bielefeld.kanopy.com/node/89374, z=A Kanopy streaming video}, 85642={z=Cover Image, u=https://www.kanopy.com/node/89374/external-image}, 9801 ={a=ZDB-177-KAN}, MBD ={M=49HBZ_BIE, i=991025930286006442, n=Universität Bielefeld}, MNG ={c=System, f=OTHER, i=marc21, k=01, e=false, d=2022-06-07 18:14:09 Europe/Berlin, g=kan1089373, j=00, a=System, b=2022-06-07 18:14:11 Europe/Berlin}, POR ={x=System, a=53344020580006442, M=49HBZ_BIE, p=61334816600006442, q=Kanopy, c=static, B=17.131 Titel, l=A Kanopy streaming video, y=2022-06-07 18:14:20 Europe/Berlin, w=2022-06-07 18:14:16 Europe/Berlin, D=https://eu04.alma.exlibrisgroup.com/view/uresolver/49HBZ_NETWORK/openurl?u.ignore_date_coverage=true&portfolio_pid=53344020580006442&Force_direct=true, v=System, z=2022-06-07 16:14:16, i=false, g=UB_BI, t=BIE-8465, S=62343002520006442, d=https://hbz-network.userservices.exlibrisgroup.com/view/uresolver/49HBZ_NETWORK/openurl?u.ignore_date_coverage=true&rft.mms_id=991025930286006442, b=Available, f=VIDEO, e=https://bielefeld.kanopy.com/node/89374, 8=53344020580006442}, @context=http://lobid.org/resources/context.jsonld, almaMmsId=991025930286006442, id=http://lobid.org/resources/991025930286006442#!, urn[]=[]} WARN [Thread-4] (Metafix.java:421) - Error while executing Fix expression (at file:/home/sol/git/lobid-resources-alma-fix/src/main/resources/alma/fix/identifiers.fix, line 23): do list(path: "0247?", "var": "$i") if any_equal("$i.2","urn") copy_field("$i.a","urn[].$append") end end java.lang.IllegalStateException: Key '0' already used at org.metafacture.commons.tries.CharMap.put(CharMap.java:112) at org.metafacture.commons.tries.CharMap.put(CharMap.java:97) at org.metafacture.commons.tries.WildcardTrie$Node.addNext(WildcardTrie.java:158) at org.metafacture.commons.tries.WildcardTrie.simplyPut(WildcardTrie.java:79) at org.metafacture.commons.tries.WildcardTrie.put(WildcardTrie.java:67) at org.metafacture.commons.tries.SimpleRegexTrie.put(SimpleRegexTrie.java:63) at org.metafacture.metafix.Value$Hash.findFields(Value.java:789) at org.metafacture.metafix.Value$Hash.findFields(Value.java:778) at org.metafacture.metafix.Value$Hash.containsField(Value.java:522) at org.metafacture.metafix.Record.get(Record.java:124) at org.metafacture.metafix.FixBind$1.execute(FixBind.java:33) at org.metafacture.metafix.RecordTransformer.lambda$null$4(RecordTransformer.java:137) at org.metafacture.metafix.RecordTransformer.lambda$null$22(RecordTransformer.java:225) at org.metafacture.metafix.RecordTransformer.lambda$null$1(RecordTransformer.java:124) at org.metafacture.metafix.RecordTransformer.tryRun(RecordTransformer.java:236) at org.metafacture.metafix.RecordTransformer.lambda$transform$2(RecordTransformer.java:124) at java.lang.Iterable.forEach(Iterable.java:75) at org.metafacture.metafix.RecordTransformer.transform(RecordTransformer.java:123) at org.metafacture.metafix.RecordTransformer.transform(RecordTransformer.java:115) at org.metafacture.metafix.FixMethod$1.apply(FixMethod.java:60) at org.metafacture.metafix.RecordTransformer.lambda$null$20(RecordTransformer.java:205) at org.metafacture.metafix.RecordTransformer.lambda$null$22(RecordTransformer.java:225) at org.metafacture.metafix.RecordTransformer.lambda$null$1(RecordTransformer.java:124) at org.metafacture.metafix.RecordTransformer.tryRun(RecordTransformer.java:236) at org.metafacture.metafix.RecordTransformer.lambda$transform$2(RecordTransformer.java:124) at java.lang.Iterable.forEach(Iterable.java:75) at org.metafacture.metafix.RecordTransformer.transform(RecordTransformer.java:123) at org.metafacture.metafix.Metafix.endRecord(Metafix.java:202) at org.metafacture.biblio.marc21.MarcXmlHandler.endElement(MarcXmlHandler.java:132) at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:69) at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:43) at org.metafacture.strings.StringReader.process(StringReader.java:48) at org.metafacture.strings.StringReader.process(StringReader.java:34) at org.metafacture.flowcontrol.ObjectPipeDecoupler$Feeder.run(ObjectPipeDecoupler.java:175) at java.lang.Thread.run(Thread.java:748)

dr0i commented 2 years ago

Some more exceptions - they were plentiful and this is all I have left in my notes:

a)

java.lang.NullPointerException at org.metafacture.metafix.Value$Hash.findFields(Value.java:813) at org.metafacture.metafix.Value$Hash.findFields(Value.java:778) at org.metafacture.metafix.Value$Hash.containsField(Value.java:522) at org.metafacture.metafix.Record.get(Record.java:124) at org.metafacture.metafix.FixBind$1.execute(FixBind.java:33)

b)

java.lang.IllegalStateException at java.util.regex.Matcher.start(Matcher.java:343) at org.metafacture.metafix.Value$Hash.findFields(Value.java:792) at org.metafacture.metafix.Value$Hash.findFields(Value.java:778) at org.metafacture.metafix.Value$Hash.containsField(Value.java:522) at org.metafacture.metafix.Record.get(Record.java:124)

c)

java.lang.StringIndexOutOfBoundsException: String index out of range: 6 at java.lang.String.substring(String.java:1963) at org.metafacture.metafix.Value$Hash.findFields(Value.java:792) at org.metafacture.metafix.Value$Hash.findFields(Value.java:778) at org.metafacture.metafix.Value$Hash.containsField(Value.java:522) at org.metafacture.metafix.Record.get(Record.java:124)

d)

java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469) at java.util.HashMap$KeyIterator.next(HashMap.java:1493) at org.metafacture.commons.tries.WildcardTrie.matches(WildcardTrie.java:128) at org.metafacture.commons.tries.WildcardTrie.get(WildcardTrie.java:123) at org.metafacture.commons.tries.SimpleRegexTrie.get(SimpleRegexTrie.java:75) at org.metafacture.metafix.Value$Hash.findFields(Value.java:806)

e)

2022-09-16 15:52:42 INFO [Thread-7] (Metafix.java:414) - Current record: {leader=00000nmm#a2200000#c#4500, 005=20161110115200.0, 007=cr#|||||||||||, 008=140110|2011####xx######o##|########ger#c, 001=990014607840206447, 035 =[{a=WU000020023}, {a=(DE-468)WU000020023}, {a=(Aleph)001460784WUP01-Aleph}], 020 ={a=9783170205291, c=(print)}, 041 ={a=ger}, 1001 ={a=Schubarth, Wilfried, 4=aut}, 24510={a=Gewalt und Mobbing an Schulen, h=Elektronische Ressource, b=Möglichkeiten der Prävention und Intervention, c=Wilfried Schubarth}, 264 1={a=Stuttgart, b=Kohlhammer, c=2011}, 300 ={a=Online-Ressource (208 S.)}, 520 =[{a=Biographical note: Prof. Dr. Wilfried Schubarth lehrt und forscht am Department Erziehungswissenschaft der Universität Potsdam zu Fragen der Jugend-, Schul- und Bildungsforschung}, {a=Long description: "Gewalt an Schulen" ist ein emotional besetztes Thema in der öffentlichen Diskussion. Umso notwendiger ist eine sachliche und empirisch fundierte Auseinandersetzung. Das Buch gibt einen Überblick über Ausmaß, Erscheinungsformen und Ursachen von Gewalt und Mobbing an Schulen sowie über Möglichkeiten der Prävention bzw. Intervention. Er verbindet systematisch die Analyse der schulischen Gewaltphänomene mit Ansätzen der Gewaltprävention bzw. -intervention. Ein Schwerpunkt liegt dabei auf den schulischen Präventions- und Interventionsprogrammen, die einer kritischen Bewertung unterzogen werden. Der interdisziplinär angelegte Band, der Erkenntnisse der Erziehungswissenschaft, Psychologie und Soziologie integriert, verknüpft Ergebnisse der Gewaltforschung mit Anforderungen an eine moderne Präventionsarbeit im Kontext einer Schul- und Bildungsreform. Die systematische und kompakte Darstellung mit Übersichten und Wiederholungsfragen machen den Band zu einer unverzichtbaren Orientierungs- und Arbeitshilfe}], 533 ={a=Online edition}, 5831 ={a=Verfilmt, c=2011, z=Datenkonversion}, 650 4=[{a=(DE-601)104364106 / (DE-588)4076308-0 / Prävention / gbv}, {a=(DE-601)104409630 / (DE-588)4053474-1 / Schule / gbv}, {a=(DE-601)105502421 / (DE-588)4157237-3 / Gewalttätigkeit / gbv}, {a=(DE-601)126506744 / (DE-588)4314977-7 / Mobbing / gbv}], 688 =[{a=Bullying in schools / Germany / Psychological aspects}, {a=School discipline}, {a=School social work / Germany}, {a=School violence / Germany / Prevention}], 78000={t=[Früher u.d.T., Schubarth, Wilfried, Gewaltprävention in Schule und Jugendhilfe]}, 8564 ={u=http://www.content-select.com/index.php?id=bib_view&ean=9783170228702}, 912 ={a=ZDB-84-KOM}, 962 ={e=ZDB-84-KOM}, MBD ={M=49HBZ_WUP, i=990014607840206447, n=Bergische Universität Wuppertal}, MNG ={f=ILS, i=marc21, k=01, e=false, g=001460784-WUP01, j=00, a=import, b=2021-04-06 10:14:53 Europe/Berlin}, POR ={x=System, a=53152804690006447, M=49HBZ_WUP, c=static, y=2021-04-07 11:17:49 Europe/Berlin, w=2021-04-07 11:17:43 Europe/Berlin, D=https://eu04.alma.exlibrisgroup.com/view/uresolver/49HBZ_NETWORK/openurl?u.ignore_date_coverage=true&portfolio_pid=53152804690006447&Force_direct=true, v=P2E_JOB, z=2021-04-07 09:17:43, i=false, S=52152804700006447, d=https://hbz-network.userservices.exlibrisgroup.com/view/uresolver/49HBZ_NETWORK/openurl?u.ignore_date_coverage=true&rft.mms_id=990014607840206447, b=Available, f=DOCUMENT, e=http://www.content-select.com/index.php?id=bib_view%26ean=9783170228702, 8=53152804690006447}, @context=http://lobid.org/resources/context.jsonld, almaMmsId=990014607840206447, id=http://lobid.org/resources/990014607840206447#!, @isbn[]=[9783170205291], isbn[]=[9783170205291, 3170205293], @title=Gewalt und Mobbing an Schulen, title=Gewalt und Mobbing an Schulen, alternativeTitle[]=[], otherTitleInformation[]=[Möglichkeiten der Prävention und Intervention], edition[]=[], publication[]=[{test=, publicationHistory=2011, startDate=2011, type[]=[PublicationEvent], location[]=[Stuttgart], publishedBy[]=[Kohlhammer]}], describedBy={id=http://lobid.org/resources/990014607840206447, label=Webseite der hbz-Ressource 990014607840206447, type[]=[BibliographicDescription], dateCreated=20210406, inDataset={id=http://lobid.org/resources/dataset#!, label=lobid-resources – Der hbz-Verbundkatalog als Linked Open Data}, resultOf={type[]=[CreateAction], endTime=2022-09-16T15:52:42, instrument={id=https://github.com/hbz/lobid-resources, type[]=[SoftwareApplication], label=Software lobid-resources}, object={id=https://lobid.org/hbz01/990014607840206447, type[]=[DataFeedItem], label=hbz-Ressource 990014607840206447 im Exportformat MARC21 XML, inDataset={id=https://datahub.io/dataset/hbz_unioncatalog, label=hbz_unioncatalog}}}, license[]=[{id=http://creativecommons.org/publicdomain/zero/1.0, label=Creative Commons-Lizenz CC0 1.0 Universal}]}, @createTime=1, sameAs[]=[], supplement[]=[], isPartOf[]=[], containedIn[]=[], primaryForm[]=[], tableOfContents[]=[], description[]=[], seeAlso[]=[], fulltextOnline[]=[], related[]=[], inCollection[]=[{id=http://lobid.org/organisations/ZDB-84-KOM#!, label=eResource package, type[]=[Collection]}], similar[]=[], predecessor[]=[{test=, label=Früher u.d.T.}]} 2022-09-16 15:52:42 WARN [Thread-7] (Metafix.java:421) - Error while executing Fix expression (at file:~/lobid-resources-alma-fix/src/main/resources/alma/fix/relatedRessourcesAndLinks.fix, line 475): replace_all("predecessor[].*.label","Vorg. ---> ","") java.lang.IllegalArgumentException: Can't find: 2 in: null at org.metafacture.metafix.FixPath.insertInto(FixPath.java:297) at org.metafacture.metafix.FixPath.insertInto(FixPath.java:263) at org.metafacture.metafix.FixPath.lambda$null$20(FixPath.java:292) at org.metafacture.metafix.Value$TypeMatcher.match(Value.java:349) at org.metafacture.metafix.Value$TypeMatcher.ifArray(Value.java:322) at org.metafacture.metafix.FixPath.lambda$insertInto$22(FixPath.java:292) at org.metafacture.metafix.Value.extractType(Value.java:212) at org.metafacture.metafix.FixPath.insertInto(FixPath.java:291) at org.metafacture.metafix.FixPath.insertInto(FixPath.java:281) at org.metafacture.metafix.Record.lambda$transform$2(Record.java:212) at org.metafacture.metafix.Value.asList(Value.java:190) at org.metafacture.metafix.Value.asList(Value.java:184)

f)

java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469) at java.util.HashMap$KeyIterator.next(HashMap.java:1493) at org.metafacture.commons.tries.WildcardTrie.get(WildcardTrie.java:96) at org.metafacture.commons.tries.SimpleRegexTrie.get(SimpleRegexTrie.java:75) at org.metafacture.metafix.Value$Hash.findFields(Value.java:806)

blackwinter commented 2 years ago

I assume the exception in case b) was java.lang.IllegalStateException and in case e) it was java.lang.IllegalArgumentException, right?

dr0i commented 2 years ago

Yep, right!

blackwinter commented 2 years ago

Case e) seems to be a data/Fix issue, maybe you can show the full error message (including the current record).

Everything else boils down to Value$Hash.findFields(), which is also supported by the analysis that the only static mutable, non-thread-safe objects are used there (PREFIX_CACHE = HashMap, TRIE_CACHE = HashMap, TRIE = SimpleRegexTrie). So my hypothesis would be that it's sufficient to make this one method synchronized (individually thread-safe data structures are not sufficient as these collections depend on each other, they have to be collectively thread-safe).

[ETA: Maybe it's not sufficient to synchronize on the Hash, but on the Metafix instance instead! Or on the class by making the method static?]

             for (final String term : ALTERNATION_PATTERN.split(pattern)) {
-                findFields(term, fieldSet);
+                findFields(term, fieldSet, map);
             }

             return fieldSet;
         }

-        private void findFields(final String pattern, final Set<String> fieldSet) {
+        private static synchronized void findFields(final String pattern, final Set<String> fieldSet, final Map<String, Value> map) {

Now I would definitely like to create tests for these thread-safety issues but I haven't gotten anywhere yet.[^1] Maybe you could just do some manual testing in order to verify the fix. We should also bear in mind that this method is absolutely on the critical path (cf. #207) so we definitely have to keep an eye on how it affects performance (single-threaded and multi-threaded).

[^1]: Some pointers for thread-safety testing: jcstress, ConcurrentUnit, MultithreadedTC, vmlens

blackwinter commented 2 years ago

Benchmark results for static synchronized method:

Benchmark (fixDef) (input) Score @ c8ba171 Score @ static Units Boost
Baseline N/A N/A 3031.762 ± 4.847 3031.196 ± 4.872 ops/us -0.02%
FixParse nothing N/A 34.818 ± 0.728 34.655 ± 0.906 ops/s -0.47%
FixParse alma N/A 25.210 ± 0.681 25.225 ± 0.732 ops/s +0.06%
Metafix nothing empty 2263.035 ± 60.509 2244.932 ± 19.264 ops/s -0.80%
Metafix nothing alma-small 38.668 ± 1.142 37.977 ± 1.920 ops/s -1.79%
Metafix alma empty 767.576 ± 10.739 764.330 ± 1.974 ops/s -0.42%
Metafix alma alma-small 4.042 ± 0.046 3.981 ± 0.037 ops/s -1.51%
SlowMetafix nothing alma-large 23.015 ± 0.994 23.037 ± 0.804 ops/min +0.10%
SlowMetafix alma alma-large 2.474 ± 0.036 2.462 ± 0.032 ops/min -0.49%
Benchmark (fixDef) (input) Score @ c8ba171 Score @ static+synchronized Units Boost
Baseline N/A N/A 3031.762 ± 4.847 3031.383 ± 1.551 ops/us -0.01%
FixParse nothing N/A 34.818 ± 0.728 34.834 ± 0.871 ops/s +0.05%
FixParse alma N/A 25.210 ± 0.681 25.480 ± 0.713 ops/s +1.07%
Metafix nothing empty 2263.035 ± 60.509 2256.579 ± 26.081 ops/s -0.29%
Metafix nothing alma-small 38.668 ± 1.142 38.282 ± 0.188 ops/s -1.00%
Metafix alma empty 767.576 ± 10.739 782.327 ± 8.896 ops/s +1.92%
Metafix alma alma-small 4.042 ± 0.046 3.956 ± 0.042 ops/s -2.13%
SlowMetafix nothing alma-large 23.015 ± 0.994 22.901 ± 0.322 ops/min -0.50%
SlowMetafix alma alma-large 2.474 ± 0.036 2.440 ± 0.059 ops/min -1.37%
blackwinter commented 2 years ago

Benchmark results for configurable synchronization (intrinsic lock):

Details ```diff @@ -504,6 +504,8 @@ public class Value implements JsonValue { // checkstyle-disable-line ClassDataAb private static final Map> TRIE_CACHE = new HashMap<>(); private static final SimpleRegexTrie TRIE = new SimpleRegexTrie<>(); + private static final Object LOCK = Boolean.getBoolean("org.metafacture.metafix.threadSafe") ? new Object() : null; + private final Map map = new LinkedHashMap<>(); /** @@ -775,7 +777,14 @@ public class Value implements JsonValue { // checkstyle-disable-line ClassDataAb final Set fieldSet = new LinkedHashSet<>(); for (final String term : ALTERNATION_PATTERN.split(pattern)) { - findFields(term, fieldSet); + if (LOCK != null) { + synchronized (LOCK) { + findFields(term, fieldSet); + } + } + else { + findFields(term, fieldSet); + } } ``` Set `-Dorg.metafacture.metafix.threadSafe=true` to enable lock.
Benchmark (fixDef) (input) Score @ c8ba171 Score @ lock-disabled Units Boost
Baseline N/A N/A 3031.762 ± 4.847 3031.914 ± 3.001 ops/us +0.01%
FixParse nothing N/A 34.818 ± 0.728 34.599 ± 0.754 ops/s -0.63%
FixParse alma N/A 25.210 ± 0.681 25.316 ± 0.653 ops/s +0.42%
Metafix nothing empty 2263.035 ± 60.509 2261.131 ± 28.884 ops/s -0.08%
Metafix nothing alma-small 38.668 ± 1.142 40.038 ± 0.282 ops/s +3.54%
Metafix alma empty 767.576 ± 10.739 781.751 ± 1.895 ops/s +1.85%
Metafix alma alma-small 4.042 ± 0.046 3.976 ± 0.284 ops/s -1.63%
SlowMetafix nothing alma-large 23.015 ± 0.994 22.976 ± 0.860 ops/min -0.17%
SlowMetafix alma alma-large 2.474 ± 0.036 2.490 ± 0.012 ops/min +0.65%
Benchmark (fixDef) (input) Score @ c8ba171 Score @ lock-enabled Units Boost
Baseline N/A N/A 3031.762 ± 4.847 3030.142 ± 7.040 ops/us -0.05%
FixParse nothing N/A 34.818 ± 0.728 34.218 ± 0.774 ops/s -1.72%
FixParse alma N/A 25.210 ± 0.681 25.353 ± 0.840 ops/s +0.57%
Metafix nothing empty 2263.035 ± 60.509 2227.026 ± 15.194 ops/s -1.59%
Metafix nothing alma-small 38.668 ± 1.142 38.913 ± 0.441 ops/s +0.63%
Metafix alma empty 767.576 ± 10.739 756.706 ± 15.609 ops/s -1.42%
Metafix alma alma-small 4.042 ± 0.046 4.063 ± 0.079 ops/s +0.52%
SlowMetafix nothing alma-large 23.015 ± 0.994 23.112 ± 0.149 ops/min +0.42%
SlowMetafix alma alma-large 2.474 ± 0.036 2.435 ± 0.007 ops/min -1.58%
dr0i commented 2 years ago

Updated Case e). Benchmarking looks good I think. Going to test with 20M records. Can you open a branch for convenient testing?

blackwinter commented 2 years ago

Can you open a branch for convenient testing?

For which implementation? They're both pretty straight forward, though, and it depends on which one you want to test.

dr0i commented 2 years ago

Would want to test both. Concerning the configurable one: I opt for "threadsafe" as default and setting to "unsafe" optionally.

blackwinter commented 2 years ago

Pushed both branches.

I opt for "threadsafe" as default and setting to "unsafe" optionally.

Sure, we can decide it later. I've stuck to the simpler implementation for now.

blackwinter commented 2 years ago

Updated Case e).

That's indeed an unrelated issue. label is an array, so replace_all() fails (see 98bd5cd).

dr0i commented 2 years ago

~~Tried to test 255-makeFindFieldsOptionallyThreadSafe but forgot to set the option :P . The outcome, however, resulted in zero errors. Also remarkable: all 24M records needed only 4h (!) ( executed with 6 threads on a 12 core server with near to zero load (the latter may explain the speed). Java options were -Xmx8G -Xms2G). Now, it's hard to reproduce unsafe thread exceptions as we all know. But I wonder if https://github.com/metafacture/metafacture-fix/pull/259 has resolved these btw resp. made these so much more unlikely to happen that they didn't happen (this time). I am going to test your branch with option set to see the performance impact. We would then run the ETL some time without that option and see if at some point thread induced exceptions appear and heal them with the option set (if the performance test gives reasonable results).~~

dr0i commented 2 years ago

Dump my last comment: configuration was wrong, it used the morph :(

blackwinter commented 2 years ago

Limetrans results for writing approx. 2m records to local file:

Metafix Runtime Boost
master (98bd5cd) 4h59m38s
255-makeFindFieldsThreadSafe (78b985b) 5h01m10s +0.51%
255-makeFindFieldsOptionallyThreadSafe (ad7d84e) 4h58m42s -0.31%
255-makeFindFieldsOptionallyThreadSafe w/ -Dorg.metafacture.metafix.threadSafe=true 5h04m42s +1.69%
dr0i commented 2 years ago

Did the test with 24M docs. Results are good in the way as no more errors occurred, but disappointing as the needed time rose from ~9h to ~25h. Double checked that. I used https://github.com/metafacture/metafacture-fix/commit/ad7d84e9d0c886fedc55a311cb18634d4d69a19a.

blackwinter commented 2 years ago

Results are good in the way as no more errors occurred,

Great!

but disappointing as the needed time rose from ~9h to ~25h.

That's Morph vs. Fix, right? That was to be expected, wasn't it? It's not even as bad as I had anticipated ;)

dr0i commented 2 years ago

That's Morph vs. Fix, right?

no, it's both using Fix. (Morph needs only 4.5h)

blackwinter commented 2 years ago

no, it's both using Fix.

Ah, okay. Must be thread contention then. Maybe reducing the number of threads helps? (You're probably not going to get it down to the unsafe runtime, though.)

dr0i commented 2 years ago

Funny that you propose decreasing of threads while I would assume that increasing them would gain us speed. (The execution is done using 6 Threads (while it's an 12-Core CPU where we could safely use max 11 threads). Gonna test both scenarios.

(You're probably not going to get it down to the unsafe runtime,

sure, but an increase of factor 2.5 feels just bad. Also, as I have stated in https://github.com/metafacture/metafacture-fix/pull/257#issuecomment-1249233831, we may just want to go with thread-unsafe and take the loss of 100 docs or so? Ok, this maybe even feels bader ;) (and it may result in corrupted docs (I am not sure if it just bails out or corruption takes place. If it ignores the concurrent modificated docs we could just reindex these, but am I right that some of the errors indicates that corruption takes place?)

blackwinter commented 2 years ago

Funny that you propose decreasing of threads while I would assume that increasing them would gain us speed.

Well, if the threads are fighting over a common resource, I would assume that increasing the number of contestants can only make it worse. But you're right, data over speculation ;)

we may just want to go with thread-unsafe and take the loss of 100 docs or so? Ok, this maybe even feels bader ;)

That would be an absolute no-go for us at least. Sure, some hundred(s) of documents out of tens of millions might never be noticed by users, but I think we should aim for completeness as much as possible for the union catalogue (especially as it would even be non-deterministic).

blackwinter commented 2 years ago

I am not sure if it just bails out or corruption takes place. If it ignores the concurrent modificated docs we could just reindex these, but am I right that some of the errors indicates that corruption takes place?

Whether documents with exceptions are ignored depends on your strictnessHandlesProcessExceptions and strictness settings. With your current setup, only the particular Fix expression is skipped. So there might be a field missing or it might not be properly formatted or something like that.

But we don't know if all error cases lead to exceptions, do we? Can there be silent corruption with unsynchronized accesses?

(Can you compare your safe and unsafe indexing results to determine the potential types and extent of corruption?)

dr0i commented 2 years ago

Can there be silent corruption with unsynchronized accesses

thinking this over: definitely yes (my proposal was a stupid idea ) => will not compare safe and unsafe indexing

Gonna test both scenarios.

Tests always in thread-safe mode. Values approximate. Also, theses values may vary from time to time (and I don't know why. Will eventually have more data and provide ranges instead of hard numbers):

3 threads: 100k/20m/thread => 26h/24M 6 threads: 100k/30m/thread => 20h/24M 11 threads: dumped that, CPU usage is stuck around 600%

Interestingly: 11 threads keeps the CPU usage stuck around 600% (instead of the optimal 1100%). So there is definitely a limit where threads are making sense here (in the context of this server and our fix). I.e., it is as you said, at some point adding threads only increases the overhead without using allocated CPU cycles.

Checking also atm if using https://github.com/metafacture/metafacture-fix/commit/78b985be2e3597a71001b6769a2310719994b87f makes a difference.

Maybe it makes more sense to use gnu parallel to split the BGZF and start several single threaded JVMs. (I read somewhere that starting new JVMs from within a JVM is only possible for java >=10)

blackwinter commented 2 years ago

An alternative approach would be to move the problematic parts into the Metafix instance, then we wouldn't need any synchronization. But it would require extensive changes throughout the code.

dr0i commented 2 years ago

78b985be2e3597a71001b6769a2310719994b87f needed 27h for the complete ETL (7 threads) :( . Gonna try again https://github.com/metafacture/metafacture-fix/commit/ad7d84e9d0c886fedc55a311cb18634d4d69a19a , which needed 25h as stated in https://github.com/metafacture/metafacture-fix/issues/255#issuecomment-1271326838, but will this time use 7 threads instead of 6. (in general speaking, both thread-safe approaches seem not to make a difference).

blackwinter commented 2 years ago

Another alternative might be to utilise thread-local variables (see 662a3b9).

dr0i commented 2 years ago

Your ThreadLocal approach seems very promising, as it uses 100% CPU /Thread! (just discarded the (redundant, not promising looking) test mentioned in https://github.com/metafacture/metafacture-fix/issues/255#issuecomment-1273097215).

First 100k needed only: 7 threads: 100k/20m/thread => ~12h/24M Gonna testing using 10 threads. Results tomorrow.

blackwinter commented 2 years ago

Benchmark results for thread-local variables:

Benchmark (fixDef) (input) Score @ c8ba171 Score @ 662a3b9 Units Boost
Baseline N/A N/A 3031.762 ± 4.847 3031.059 ± 4.275 ops/us -0.02%
FixParse nothing N/A 34.818 ± 0.728 34.787 ± 0.829 ops/s -0.09%
FixParse alma N/A 25.210 ± 0.681 25.381 ± 0.591 ops/s +0.68%
Metafix nothing empty 2263.035 ± 60.509 2263.331 ± 57.177 ops/s +0.01%
Metafix nothing alma-small 38.668 ± 1.142 37.573 ± 0.949 ops/s -2.83%
Metafix alma empty 767.576 ± 10.739 785.372 ± 15.244 ops/s +2.32%
Metafix alma alma-small 4.042 ± 0.046 4.039 ± 0.037 ops/s -0.07%
SlowMetafix nothing alma-large 23.015 ± 0.994 22.666 ± 1.245 ops/min -1.52%
SlowMetafix alma alma-large 2.474 ± 0.036 2.452 ± 0.061 ops/min -0.89%
dr0i commented 2 years ago

ThreadLocal completes ETL 24M Alma Fix with 10 threads in 6h10m :+1:

Suggestion: open PR 255-makeFindFieldsThreadSafeWithThreadLocal and merge into master. Close this issue.

blackwinter commented 2 years ago

Excellent :) Let me just verify that there's no regression in Limetrans.

I've opened #263 which you can now review; if everything works out I'll merge it tomorrow. Okay?

dr0i commented 2 years ago

I've opened https://github.com/metafacture/metafacture-fix/pull/263 which you can now review; if everything works out I'll merge it tomorrow. Okay?

ahm. Did read this too late. Already merged. I reopen this issue so we can react properly if you find a regression.

blackwinter commented 2 years ago

Limetrans results for writing approx. 2m records to local file:

Metafix Runtime Boost
master (ce9157f) 4h36m49s
255-makeFindFieldsThreadSafeWithThreadLocal (0f2a0df) 4h25m05s -4.24%