Closed dr0i closed 2 years ago
From https://github.com/hbz/lobid-resources/issues/1409#issuecomment-1235524415 , might also be a result of unsafe threads:
INFO [Thread-5] (Metafix.java:414) - Current record: {leader=01879ngm a2200385ua 4500, 006=m o c , 007=[cr una---unuuu, vz uzazuu], 008=160826p20162006cau023 o vlund d, 001=991025930286006442, 02852={a=1089373, b=Kanopy}, 035 =[{a=(OCoLC)956893319, 0=(uri) http://www.worldcat.org/oclc/956893319}, {a=(CaSfKAN)kan1089373}], 040 ={a=VDU, b=eng, c=VDU}, 24500={a=Lockie Leonard Series 1. Episode 13: It's not You, it's Me}, 264 1=[{b=Australian Children's Television Foundation,, c=2006.}, {a=[San Francisco, California, USA] :, b=Kanopy Streaming,, c=2016.}], 300 ={a=1 online resource (streaming video file) (24 minutes):, b=digital, .flv file, sound}, 306 ={a=002354}, 336 ={a=tdi, b=tdi, 2=rdacontent}, 337 ={b=[c, c], 2=rdamedia}, 338 ={b=[cr, cr], 2=rdacarrier}, 500 =[{a=Filmclip}, {a=Title from title frames.}, {a=In Process Record.}], 518 ={a=Originally produced by Australian Children's Television Foundation in 2006.}, 520 ={a=Lockie has some problems. First, he's decided to break up with Dot. It's just figuring out how that's tricky. Meanwhile, Mum is acting weird. She's been staring at the TV for days, when it's not even tuned to a channel. Phillip's not impressed - he's missing his favourite superhero cartoon, Captain Chicken. And to top things off, Egg and Lockie's quest to save the river is going nowhere... Lockie has got some serious work to do. And he's going to have to find his inner Captain Chicken to sort it all out.}, 538 ={a=Mode of access: World Wide Web.}, 546 ={a=In English}, 653 ={a=Australian Studies}, 7102 =[{a=Australian Children's Television Foundation (Firm),, 4=dst}, {a=Kanopy (Firm),, 4=dst}], 85640={u=https://bielefeld.kanopy.com/node/89374, z=A Kanopy streaming video}, 85642={z=Cover Image, u=https://www.kanopy.com/node/89374/external-image}, 9801 ={a=ZDB-177-KAN}, MBD ={M=49HBZ_BIE, i=991025930286006442, n=Universität Bielefeld}, MNG ={c=System, f=OTHER, i=marc21, k=01, e=false, d=2022-06-07 18:14:09 Europe/Berlin, g=kan1089373, j=00, a=System, b=2022-06-07 18:14:11 Europe/Berlin}, POR ={x=System, a=53344020580006442, M=49HBZ_BIE, p=61334816600006442, q=Kanopy, c=static, B=17.131 Titel, l=A Kanopy streaming video, y=2022-06-07 18:14:20 Europe/Berlin, w=2022-06-07 18:14:16 Europe/Berlin, D=https://eu04.alma.exlibrisgroup.com/view/uresolver/49HBZ_NETWORK/openurl?u.ignore_date_coverage=true&portfolio_pid=53344020580006442&Force_direct=true, v=System, z=2022-06-07 16:14:16, i=false, g=UB_BI, t=BIE-8465, S=62343002520006442, d=https://hbz-network.userservices.exlibrisgroup.com/view/uresolver/49HBZ_NETWORK/openurl?u.ignore_date_coverage=true&rft.mms_id=991025930286006442, b=Available, f=VIDEO, e=https://bielefeld.kanopy.com/node/89374, 8=53344020580006442}, @context=http://lobid.org/resources/context.jsonld, almaMmsId=991025930286006442, id=http://lobid.org/resources/991025930286006442#!, urn[]=[]} WARN [Thread-4] (Metafix.java:421) - Error while executing Fix expression (at file:/home/sol/git/lobid-resources-alma-fix/src/main/resources/alma/fix/identifiers.fix, line 23): do list(path: "0247?", "var": "$i") if any_equal("$i.2","urn") copy_field("$i.a","urn[].$append") end end java.lang.IllegalStateException: Key '0' already used at org.metafacture.commons.tries.CharMap.put(CharMap.java:112) at org.metafacture.commons.tries.CharMap.put(CharMap.java:97) at org.metafacture.commons.tries.WildcardTrie$Node.addNext(WildcardTrie.java:158) at org.metafacture.commons.tries.WildcardTrie.simplyPut(WildcardTrie.java:79) at org.metafacture.commons.tries.WildcardTrie.put(WildcardTrie.java:67) at org.metafacture.commons.tries.SimpleRegexTrie.put(SimpleRegexTrie.java:63) at org.metafacture.metafix.Value$Hash.findFields(Value.java:789) at org.metafacture.metafix.Value$Hash.findFields(Value.java:778) at org.metafacture.metafix.Value$Hash.containsField(Value.java:522) at org.metafacture.metafix.Record.get(Record.java:124) at org.metafacture.metafix.FixBind$1.execute(FixBind.java:33) at org.metafacture.metafix.RecordTransformer.lambda$null$4(RecordTransformer.java:137) at org.metafacture.metafix.RecordTransformer.lambda$null$22(RecordTransformer.java:225) at org.metafacture.metafix.RecordTransformer.lambda$null$1(RecordTransformer.java:124) at org.metafacture.metafix.RecordTransformer.tryRun(RecordTransformer.java:236) at org.metafacture.metafix.RecordTransformer.lambda$transform$2(RecordTransformer.java:124) at java.lang.Iterable.forEach(Iterable.java:75) at org.metafacture.metafix.RecordTransformer.transform(RecordTransformer.java:123) at org.metafacture.metafix.RecordTransformer.transform(RecordTransformer.java:115) at org.metafacture.metafix.FixMethod$1.apply(FixMethod.java:60) at org.metafacture.metafix.RecordTransformer.lambda$null$20(RecordTransformer.java:205) at org.metafacture.metafix.RecordTransformer.lambda$null$22(RecordTransformer.java:225) at org.metafacture.metafix.RecordTransformer.lambda$null$1(RecordTransformer.java:124) at org.metafacture.metafix.RecordTransformer.tryRun(RecordTransformer.java:236) at org.metafacture.metafix.RecordTransformer.lambda$transform$2(RecordTransformer.java:124) at java.lang.Iterable.forEach(Iterable.java:75) at org.metafacture.metafix.RecordTransformer.transform(RecordTransformer.java:123) at org.metafacture.metafix.Metafix.endRecord(Metafix.java:202) at org.metafacture.biblio.marc21.MarcXmlHandler.endElement(MarcXmlHandler.java:132) at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:69) at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:43) at org.metafacture.strings.StringReader.process(StringReader.java:48) at org.metafacture.strings.StringReader.process(StringReader.java:34) at org.metafacture.flowcontrol.ObjectPipeDecoupler$Feeder.run(ObjectPipeDecoupler.java:175) at java.lang.Thread.run(Thread.java:748)
Some more exceptions - they were plentiful and this is all I have left in my notes:
a)
java.lang.NullPointerException at org.metafacture.metafix.Value$Hash.findFields(Value.java:813) at org.metafacture.metafix.Value$Hash.findFields(Value.java:778) at org.metafacture.metafix.Value$Hash.containsField(Value.java:522) at org.metafacture.metafix.Record.get(Record.java:124) at org.metafacture.metafix.FixBind$1.execute(FixBind.java:33)
b)
java.lang.IllegalStateException at java.util.regex.Matcher.start(Matcher.java:343) at org.metafacture.metafix.Value$Hash.findFields(Value.java:792) at org.metafacture.metafix.Value$Hash.findFields(Value.java:778) at org.metafacture.metafix.Value$Hash.containsField(Value.java:522) at org.metafacture.metafix.Record.get(Record.java:124)
c)
java.lang.StringIndexOutOfBoundsException: String index out of range: 6 at java.lang.String.substring(String.java:1963) at org.metafacture.metafix.Value$Hash.findFields(Value.java:792) at org.metafacture.metafix.Value$Hash.findFields(Value.java:778) at org.metafacture.metafix.Value$Hash.containsField(Value.java:522) at org.metafacture.metafix.Record.get(Record.java:124)
d)
java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469) at java.util.HashMap$KeyIterator.next(HashMap.java:1493) at org.metafacture.commons.tries.WildcardTrie.matches(WildcardTrie.java:128) at org.metafacture.commons.tries.WildcardTrie.get(WildcardTrie.java:123) at org.metafacture.commons.tries.SimpleRegexTrie.get(SimpleRegexTrie.java:75) at org.metafacture.metafix.Value$Hash.findFields(Value.java:806)
e)
2022-09-16 15:52:42 INFO [Thread-7] (Metafix.java:414) - Current record: {leader=00000nmm#a2200000#c#4500, 005=20161110115200.0, 007=cr#|||||||||||, 008=140110|2011####xx######o##|########ger#c, 001=990014607840206447, 035 =[{a=WU000020023}, {a=(DE-468)WU000020023}, {a=(Aleph)001460784WUP01-Aleph}], 020 ={a=9783170205291, c=(print)}, 041 ={a=ger}, 1001 ={a=Schubarth, Wilfried, 4=aut}, 24510={a=Gewalt und Mobbing an Schulen, h=Elektronische Ressource, b=Möglichkeiten der Prävention und Intervention, c=Wilfried Schubarth}, 264 1={a=Stuttgart, b=Kohlhammer, c=2011}, 300 ={a=Online-Ressource (208 S.)}, 520 =[{a=Biographical note: Prof. Dr. Wilfried Schubarth lehrt und forscht am Department Erziehungswissenschaft der Universität Potsdam zu Fragen der Jugend-, Schul- und Bildungsforschung}, {a=Long description: "Gewalt an Schulen" ist ein emotional besetztes Thema in der öffentlichen Diskussion. Umso notwendiger ist eine sachliche und empirisch fundierte Auseinandersetzung. Das Buch gibt einen Überblick über Ausmaß, Erscheinungsformen und Ursachen von Gewalt und Mobbing an Schulen sowie über Möglichkeiten der Prävention bzw. Intervention. Er verbindet systematisch die Analyse der schulischen Gewaltphänomene mit Ansätzen der Gewaltprävention bzw. -intervention. Ein Schwerpunkt liegt dabei auf den schulischen Präventions- und Interventionsprogrammen, die einer kritischen Bewertung unterzogen werden. Der interdisziplinär angelegte Band, der Erkenntnisse der Erziehungswissenschaft, Psychologie und Soziologie integriert, verknüpft Ergebnisse der Gewaltforschung mit Anforderungen an eine moderne Präventionsarbeit im Kontext einer Schul- und Bildungsreform. Die systematische und kompakte Darstellung mit Übersichten und Wiederholungsfragen machen den Band zu einer unverzichtbaren Orientierungs- und Arbeitshilfe}], 533 ={a=Online edition}, 5831 ={a=Verfilmt, c=2011, z=Datenkonversion}, 650 4=[{a=(DE-601)104364106 / (DE-588)4076308-0 / Prävention / gbv}, {a=(DE-601)104409630 / (DE-588)4053474-1 / Schule / gbv}, {a=(DE-601)105502421 / (DE-588)4157237-3 / Gewalttätigkeit / gbv}, {a=(DE-601)126506744 / (DE-588)4314977-7 / Mobbing / gbv}], 688 =[{a=Bullying in schools / Germany / Psychological aspects}, {a=School discipline}, {a=School social work / Germany}, {a=School violence / Germany / Prevention}], 78000={t=[Früher u.d.T., Schubarth, Wilfried, Gewaltprävention in Schule und Jugendhilfe]}, 8564 ={u=http://www.content-select.com/index.php?id=bib_view&ean=9783170228702}, 912 ={a=ZDB-84-KOM}, 962 ={e=ZDB-84-KOM}, MBD ={M=49HBZ_WUP, i=990014607840206447, n=Bergische Universität Wuppertal}, MNG ={f=ILS, i=marc21, k=01, e=false, g=001460784-WUP01, j=00, a=import, b=2021-04-06 10:14:53 Europe/Berlin}, POR ={x=System, a=53152804690006447, M=49HBZ_WUP, c=static, y=2021-04-07 11:17:49 Europe/Berlin, w=2021-04-07 11:17:43 Europe/Berlin, D=https://eu04.alma.exlibrisgroup.com/view/uresolver/49HBZ_NETWORK/openurl?u.ignore_date_coverage=true&portfolio_pid=53152804690006447&Force_direct=true, v=P2E_JOB, z=2021-04-07 09:17:43, i=false, S=52152804700006447, d=https://hbz-network.userservices.exlibrisgroup.com/view/uresolver/49HBZ_NETWORK/openurl?u.ignore_date_coverage=true&rft.mms_id=990014607840206447, b=Available, f=DOCUMENT, e=http://www.content-select.com/index.php?id=bib_view%26ean=9783170228702, 8=53152804690006447}, @context=http://lobid.org/resources/context.jsonld, almaMmsId=990014607840206447, id=http://lobid.org/resources/990014607840206447#!, @isbn[]=[9783170205291], isbn[]=[9783170205291, 3170205293], @title=Gewalt und Mobbing an Schulen, title=Gewalt und Mobbing an Schulen, alternativeTitle[]=[], otherTitleInformation[]=[Möglichkeiten der Prävention und Intervention], edition[]=[], publication[]=[{test=, publicationHistory=2011, startDate=2011, type[]=[PublicationEvent], location[]=[Stuttgart], publishedBy[]=[Kohlhammer]}], describedBy={id=http://lobid.org/resources/990014607840206447, label=Webseite der hbz-Ressource 990014607840206447, type[]=[BibliographicDescription], dateCreated=20210406, inDataset={id=http://lobid.org/resources/dataset#!, label=lobid-resources – Der hbz-Verbundkatalog als Linked Open Data}, resultOf={type[]=[CreateAction], endTime=2022-09-16T15:52:42, instrument={id=https://github.com/hbz/lobid-resources, type[]=[SoftwareApplication], label=Software lobid-resources}, object={id=https://lobid.org/hbz01/990014607840206447, type[]=[DataFeedItem], label=hbz-Ressource 990014607840206447 im Exportformat MARC21 XML, inDataset={id=https://datahub.io/dataset/hbz_unioncatalog, label=hbz_unioncatalog}}}, license[]=[{id=http://creativecommons.org/publicdomain/zero/1.0, label=Creative Commons-Lizenz CC0 1.0 Universal}]}, @createTime=1, sameAs[]=[], supplement[]=[], isPartOf[]=[], containedIn[]=[], primaryForm[]=[], tableOfContents[]=[], description[]=[], seeAlso[]=[], fulltextOnline[]=[], related[]=[], inCollection[]=[{id=http://lobid.org/organisations/ZDB-84-KOM#!, label=eResource package, type[]=[Collection]}], similar[]=[], predecessor[]=[{test=, label=Früher u.d.T.}]} 2022-09-16 15:52:42 WARN [Thread-7] (Metafix.java:421) - Error while executing Fix expression (at file:~/lobid-resources-alma-fix/src/main/resources/alma/fix/relatedRessourcesAndLinks.fix, line 475): replace_all("predecessor[].*.label","Vorg. ---> ","") java.lang.IllegalArgumentException: Can't find: 2 in: null at org.metafacture.metafix.FixPath.insertInto(FixPath.java:297) at org.metafacture.metafix.FixPath.insertInto(FixPath.java:263) at org.metafacture.metafix.FixPath.lambda$null$20(FixPath.java:292) at org.metafacture.metafix.Value$TypeMatcher.match(Value.java:349) at org.metafacture.metafix.Value$TypeMatcher.ifArray(Value.java:322) at org.metafacture.metafix.FixPath.lambda$insertInto$22(FixPath.java:292) at org.metafacture.metafix.Value.extractType(Value.java:212) at org.metafacture.metafix.FixPath.insertInto(FixPath.java:291) at org.metafacture.metafix.FixPath.insertInto(FixPath.java:281) at org.metafacture.metafix.Record.lambda$transform$2(Record.java:212) at org.metafacture.metafix.Value.asList(Value.java:190) at org.metafacture.metafix.Value.asList(Value.java:184)
f)
java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469) at java.util.HashMap$KeyIterator.next(HashMap.java:1493) at org.metafacture.commons.tries.WildcardTrie.get(WildcardTrie.java:96) at org.metafacture.commons.tries.SimpleRegexTrie.get(SimpleRegexTrie.java:75) at org.metafacture.metafix.Value$Hash.findFields(Value.java:806)
I assume the exception in case b) was java.lang.IllegalStateException
and in case e) it was java.lang.IllegalArgumentException
, right?
Yep, right!
Case e) seems to be a data/Fix issue, maybe you can show the full error message (including the current record).
Everything else boils down to Value$Hash.findFields()
, which is also supported by the analysis that the only static
mutable, non-thread-safe objects are used there (PREFIX_CACHE
= HashMap
, TRIE_CACHE
= HashMap
, TRIE
= SimpleRegexTrie
). So my hypothesis would be that it's sufficient to make this one method synchronized
(individually thread-safe data structures are not sufficient as these collections depend on each other, they have to be collectively thread-safe).
[ETA: Maybe it's not sufficient to synchronize on the Hash
, but on the on the class by making the method Metafix
instance instead! Orstatic
?]
for (final String term : ALTERNATION_PATTERN.split(pattern)) {
- findFields(term, fieldSet);
+ findFields(term, fieldSet, map);
}
return fieldSet;
}
- private void findFields(final String pattern, final Set<String> fieldSet) {
+ private static synchronized void findFields(final String pattern, final Set<String> fieldSet, final Map<String, Value> map) {
Now I would definitely like to create tests for these thread-safety issues but I haven't gotten anywhere yet.[^1] Maybe you could just do some manual testing in order to verify the fix. We should also bear in mind that this method is absolutely on the critical path (cf. #207) so we definitely have to keep an eye on how it affects performance (single-threaded and multi-threaded).
[^1]: Some pointers for thread-safety testing: jcstress, ConcurrentUnit, MultithreadedTC, vmlens
Benchmark results for static synchronized
method:
Benchmark | (fixDef) | (input) | Score @ c8ba171 | Score @ static | Units | Boost |
---|---|---|---|---|---|---|
Baseline | N/A | N/A | 3031.762 ± 4.847 | 3031.196 ± 4.872 | ops/us | -0.02% |
FixParse | nothing | N/A | 34.818 ± 0.728 | 34.655 ± 0.906 | ops/s | -0.47% |
FixParse | alma | N/A | 25.210 ± 0.681 | 25.225 ± 0.732 | ops/s | +0.06% |
Metafix | nothing | empty | 2263.035 ± 60.509 | 2244.932 ± 19.264 | ops/s | -0.80% |
Metafix | nothing | alma-small | 38.668 ± 1.142 | 37.977 ± 1.920 | ops/s | -1.79% |
Metafix | alma | empty | 767.576 ± 10.739 | 764.330 ± 1.974 | ops/s | -0.42% |
Metafix | alma | alma-small | 4.042 ± 0.046 | 3.981 ± 0.037 | ops/s | -1.51% |
SlowMetafix | nothing | alma-large | 23.015 ± 0.994 | 23.037 ± 0.804 | ops/min | +0.10% |
SlowMetafix | alma | alma-large | 2.474 ± 0.036 | 2.462 ± 0.032 | ops/min | -0.49% |
Benchmark | (fixDef) | (input) | Score @ c8ba171 | Score @ static+synchronized | Units | Boost |
---|---|---|---|---|---|---|
Baseline | N/A | N/A | 3031.762 ± 4.847 | 3031.383 ± 1.551 | ops/us | -0.01% |
FixParse | nothing | N/A | 34.818 ± 0.728 | 34.834 ± 0.871 | ops/s | +0.05% |
FixParse | alma | N/A | 25.210 ± 0.681 | 25.480 ± 0.713 | ops/s | +1.07% |
Metafix | nothing | empty | 2263.035 ± 60.509 | 2256.579 ± 26.081 | ops/s | -0.29% |
Metafix | nothing | alma-small | 38.668 ± 1.142 | 38.282 ± 0.188 | ops/s | -1.00% |
Metafix | alma | empty | 767.576 ± 10.739 | 782.327 ± 8.896 | ops/s | +1.92% |
Metafix | alma | alma-small | 4.042 ± 0.046 | 3.956 ± 0.042 | ops/s | -2.13% |
SlowMetafix | nothing | alma-large | 23.015 ± 0.994 | 22.901 ± 0.322 | ops/min | -0.50% |
SlowMetafix | alma | alma-large | 2.474 ± 0.036 | 2.440 ± 0.059 | ops/min | -1.37% |
Benchmark results for configurable synchronization (intrinsic lock):
Benchmark | (fixDef) | (input) | Score @ c8ba171 | Score @ lock-disabled | Units | Boost |
---|---|---|---|---|---|---|
Baseline | N/A | N/A | 3031.762 ± 4.847 | 3031.914 ± 3.001 | ops/us | +0.01% |
FixParse | nothing | N/A | 34.818 ± 0.728 | 34.599 ± 0.754 | ops/s | -0.63% |
FixParse | alma | N/A | 25.210 ± 0.681 | 25.316 ± 0.653 | ops/s | +0.42% |
Metafix | nothing | empty | 2263.035 ± 60.509 | 2261.131 ± 28.884 | ops/s | -0.08% |
Metafix | nothing | alma-small | 38.668 ± 1.142 | 40.038 ± 0.282 | ops/s | +3.54% |
Metafix | alma | empty | 767.576 ± 10.739 | 781.751 ± 1.895 | ops/s | +1.85% |
Metafix | alma | alma-small | 4.042 ± 0.046 | 3.976 ± 0.284 | ops/s | -1.63% |
SlowMetafix | nothing | alma-large | 23.015 ± 0.994 | 22.976 ± 0.860 | ops/min | -0.17% |
SlowMetafix | alma | alma-large | 2.474 ± 0.036 | 2.490 ± 0.012 | ops/min | +0.65% |
Benchmark | (fixDef) | (input) | Score @ c8ba171 | Score @ lock-enabled | Units | Boost |
---|---|---|---|---|---|---|
Baseline | N/A | N/A | 3031.762 ± 4.847 | 3030.142 ± 7.040 | ops/us | -0.05% |
FixParse | nothing | N/A | 34.818 ± 0.728 | 34.218 ± 0.774 | ops/s | -1.72% |
FixParse | alma | N/A | 25.210 ± 0.681 | 25.353 ± 0.840 | ops/s | +0.57% |
Metafix | nothing | empty | 2263.035 ± 60.509 | 2227.026 ± 15.194 | ops/s | -1.59% |
Metafix | nothing | alma-small | 38.668 ± 1.142 | 38.913 ± 0.441 | ops/s | +0.63% |
Metafix | alma | empty | 767.576 ± 10.739 | 756.706 ± 15.609 | ops/s | -1.42% |
Metafix | alma | alma-small | 4.042 ± 0.046 | 4.063 ± 0.079 | ops/s | +0.52% |
SlowMetafix | nothing | alma-large | 23.015 ± 0.994 | 23.112 ± 0.149 | ops/min | +0.42% |
SlowMetafix | alma | alma-large | 2.474 ± 0.036 | 2.435 ± 0.007 | ops/min | -1.58% |
Updated Case e). Benchmarking looks good I think. Going to test with 20M records. Can you open a branch for convenient testing?
Can you open a branch for convenient testing?
For which implementation? They're both pretty straight forward, though, and it depends on which one you want to test.
Would want to test both. Concerning the configurable one: I opt for "threadsafe" as default and setting to "unsafe" optionally.
Pushed both branches.
I opt for "threadsafe" as default and setting to "unsafe" optionally.
Sure, we can decide it later. I've stuck to the simpler implementation for now.
Updated Case e).
That's indeed an unrelated issue. label
is an array, so replace_all()
fails (see 98bd5cd).
~~Tried to test 255-makeFindFieldsOptionallyThreadSafe
but forgot to set the option :P .
The outcome, however, resulted in zero errors. Also remarkable: all 24M records needed only 4h (!) ( executed with 6 threads on a 12 core server with near to zero load (the latter may explain the speed). Java options were -Xmx8G -Xms2G
).
Now, it's hard to reproduce unsafe thread exceptions as we all know. But I wonder if https://github.com/metafacture/metafacture-fix/pull/259 has resolved these btw resp. made these so much more unlikely to happen that they didn't happen (this time).
I am going to test your branch with option set to see the performance impact. We would then run the ETL some time without that option and see if at some point thread induced exceptions appear and heal them with the option set (if the performance test gives reasonable results).~~
Dump my last comment: configuration was wrong, it used the morph :(
Limetrans results for writing approx. 2m records to local file:
Metafix | Runtime | Boost |
---|---|---|
master (98bd5cd) |
4h59m38s | |
255-makeFindFieldsThreadSafe (78b985b) |
5h01m10s | +0.51% |
255-makeFindFieldsOptionallyThreadSafe (ad7d84e) |
4h58m42s | -0.31% |
255-makeFindFieldsOptionallyThreadSafe w/ -Dorg.metafacture.metafix.threadSafe=true |
5h04m42s | +1.69% |
Did the test with 24M docs. Results are good in the way as no more errors occurred, but disappointing as the needed time rose from ~9h to ~25h. Double checked that. I used https://github.com/metafacture/metafacture-fix/commit/ad7d84e9d0c886fedc55a311cb18634d4d69a19a.
Results are good in the way as no more errors occurred,
Great!
but disappointing as the needed time rose from ~9h to ~25h.
That's Morph vs. Fix, right? That was to be expected, wasn't it? It's not even as bad as I had anticipated ;)
That's Morph vs. Fix, right?
no, it's both using Fix. (Morph needs only 4.5h)
no, it's both using Fix.
Ah, okay. Must be thread contention then. Maybe reducing the number of threads helps? (You're probably not going to get it down to the unsafe runtime, though.)
Funny that you propose decreasing of threads while I would assume that increasing them would gain us speed. (The execution is done using 6 Threads (while it's an 12-Core CPU where we could safely use max 11 threads). Gonna test both scenarios.
(You're probably not going to get it down to the unsafe runtime,
sure, but an increase of factor 2.5 feels just bad. Also, as I have stated in https://github.com/metafacture/metafacture-fix/pull/257#issuecomment-1249233831, we may just want to go with thread-unsafe and take the loss of 100 docs or so? Ok, this maybe even feels bader ;) (and it may result in corrupted docs (I am not sure if it just bails out or corruption takes place. If it ignores the concurrent modificated docs we could just reindex these, but am I right that some of the errors indicates that corruption takes place?)
Funny that you propose decreasing of threads while I would assume that increasing them would gain us speed.
Well, if the threads are fighting over a common resource, I would assume that increasing the number of contestants can only make it worse. But you're right, data over speculation ;)
we may just want to go with thread-unsafe and take the loss of 100 docs or so? Ok, this maybe even feels bader ;)
That would be an absolute no-go for us at least. Sure, some hundred(s) of documents out of tens of millions might never be noticed by users, but I think we should aim for completeness as much as possible for the union catalogue (especially as it would even be non-deterministic).
I am not sure if it just bails out or corruption takes place. If it ignores the concurrent modificated docs we could just reindex these, but am I right that some of the errors indicates that corruption takes place?
Whether documents with exceptions are ignored depends on your strictnessHandlesProcessExceptions
and strictness
settings. With your current setup, only the particular Fix expression is skipped. So there might be a field missing or it might not be properly formatted or something like that.
But we don't know if all error cases lead to exceptions, do we? Can there be silent corruption with unsynchronized accesses?
(Can you compare your safe and unsafe indexing results to determine the potential types and extent of corruption?)
Can there be silent corruption with unsynchronized accesses
thinking this over: definitely yes (my proposal was a stupid idea ) => will not compare safe and unsafe indexing
Gonna test both scenarios.
Tests always in thread-safe mode. Values approximate. Also, theses values may vary from time to time (and I don't know why. Will eventually have more data and provide ranges instead of hard numbers):
3 threads: 100k/20m/thread => 26h/24M 6 threads: 100k/30m/thread => 20h/24M 11 threads: dumped that, CPU usage is stuck around 600%
Interestingly: 11 threads keeps the CPU usage stuck around 600% (instead of the optimal 1100%). So there is definitely a limit where threads are making sense here (in the context of this server and our fix). I.e., it is as you said, at some point adding threads only increases the overhead without using allocated CPU cycles.
Checking also atm if using https://github.com/metafacture/metafacture-fix/commit/78b985be2e3597a71001b6769a2310719994b87f makes a difference.
Maybe it makes more sense to use gnu parallel to split the BGZF and start several single threaded JVMs. (I read somewhere that starting new JVMs from within a JVM is only possible for java >=10)
An alternative approach would be to move the problematic parts into the Metafix
instance, then we wouldn't need any synchronization. But it would require extensive changes throughout the code.
78b985be2e3597a71001b6769a2310719994b87f needed 27h for the complete ETL (7 threads) :( . Gonna try again https://github.com/metafacture/metafacture-fix/commit/ad7d84e9d0c886fedc55a311cb18634d4d69a19a , which needed 25h as stated in https://github.com/metafacture/metafacture-fix/issues/255#issuecomment-1271326838, but will this time use 7 threads instead of 6. (in general speaking, both thread-safe approaches seem not to make a difference).
Another alternative might be to utilise thread-local variables (see 662a3b9).
Your ThreadLocal
approach seems very promising, as it uses 100% CPU /Thread! (just discarded the (redundant, not promising looking) test mentioned in https://github.com/metafacture/metafacture-fix/issues/255#issuecomment-1273097215).
First 100k needed only: 7 threads: 100k/20m/thread => ~12h/24M Gonna testing using 10 threads. Results tomorrow.
Benchmark results for thread-local variables:
Benchmark | (fixDef) | (input) | Score @ c8ba171 | Score @ 662a3b9 | Units | Boost |
---|---|---|---|---|---|---|
Baseline | N/A | N/A | 3031.762 ± 4.847 | 3031.059 ± 4.275 | ops/us | -0.02% |
FixParse | nothing | N/A | 34.818 ± 0.728 | 34.787 ± 0.829 | ops/s | -0.09% |
FixParse | alma | N/A | 25.210 ± 0.681 | 25.381 ± 0.591 | ops/s | +0.68% |
Metafix | nothing | empty | 2263.035 ± 60.509 | 2263.331 ± 57.177 | ops/s | +0.01% |
Metafix | nothing | alma-small | 38.668 ± 1.142 | 37.573 ± 0.949 | ops/s | -2.83% |
Metafix | alma | empty | 767.576 ± 10.739 | 785.372 ± 15.244 | ops/s | +2.32% |
Metafix | alma | alma-small | 4.042 ± 0.046 | 4.039 ± 0.037 | ops/s | -0.07% |
SlowMetafix | nothing | alma-large | 23.015 ± 0.994 | 22.666 ± 1.245 | ops/min | -1.52% |
SlowMetafix | alma | alma-large | 2.474 ± 0.036 | 2.452 ± 0.061 | ops/min | -0.89% |
ThreadLocal
completes ETL 24M Alma Fix with 10 threads in 6h10m :+1:
Suggestion: open PR 255-makeFindFieldsThreadSafeWithThreadLocal
and merge into master. Close this issue.
Excellent :) Let me just verify that there's no regression in Limetrans.
I've opened #263 which you can now review; if everything works out I'll merge it tomorrow. Okay?
I've opened https://github.com/metafacture/metafacture-fix/pull/263 which you can now review; if everything works out I'll merge it tomorrow. Okay?
ahm. Did read this too late. Already merged. I reopen this issue so we can react properly if you find a regression.
Limetrans results for writing approx. 2m records to local file:
Metafix | Runtime | Boost |
---|---|---|
master (ce9157f) |
4h36m49s | |
255-makeFindFieldsThreadSafeWithThreadLocal (0f2a0df) |
4h25m05s | -4.24% |
Came up in https://github.com/hbz/lobid-resources/issues/1409#issuecomment-1235382923. This one is not reproducible and seems to be the result of Metafix not being thread safe:
The threads are created using Metafacture's flowcontrol.ObjectPipeDecoupler.