Closed elky87 closed 7 years ago
Hi @elky87 , @akshaysonvane and I talked over this issue earlier today. He added a test to verify loading a TRiG file that is about 275M. It takes a while to run, but it does pass. I know that the server-side parsing of TriG (how do those capitals go I wonder?) is probably very memory-intensive, as it has to read the whole file in order to parse it (it can't stream it in). I'd probably look for workarounds such as using nquads.
We do want to understand the limitations of ingestion though, so the story on this issue is not over.
We will try to reproduce -- it looks like it's related to your available memory, not to file size limitations. But it could also be that this limitation is a bug, so we'll investigate further.
If you want I can share with you the file that causes the issue on our side. Maybe the file has some internals that cause the memory consumption to blow up. To whom should I send the file, to @grechaw or @akshaysonvane ?
(According to the standard it's TriG, but I have no idea what it stands for :D triples graph maybe? who knows )
If you put it on swc's wiki, I think @akshaysonvane can get access and download it. Perhaps it is already there.
I put it on the wiki. You can find it on the page "rdf4j integration"; the file is linked under "causing file".
Thanks @elky87
From internal comment --
asonvane Akshay Sonvane added a comment - 10 minutes ago
Tested with the new file (~171MB); it gets ingested without any hiccups. The problem must be with the available system memory. Will run the test in a VM with less memory.
Hi @elky87 , we're still not able to reproduce this particular issue. I'm not sure whether something has been fixed in the versions we're working with (doesn't seem likely). What version of the server are you using here?
We are using version 8.0-6.4 Enterprise Edition
Thanks, we'll check. I think that the memory profile may have improved in the versions we've been using.
Hi @elky87 , @akshaysonvane was able to reproduce with 8.0-6.4. So a fix I made for 9.0-2 and 8.0-7 happens to fix this issue as well. 8.0-7 is in the stabilization phase and will ship within the next couple of weeks.
I'll leave this issue open until you can confirm. Note I think that you'll get better performance from n-quads than from trig.
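A minimal sketch of how the N-Quads workaround could be pushed a bit further, under stated assumptions: N-Quads is line-based (one complete statement per line), so a large file can be cut into smaller files that are each valid N-Quads on their own. The class `NQuadsChunker`, the `chunk-N.nq` file names, and the chunk size are hypothetical and not part of RDF4J or the MarkLogic client; the sketch also assumes the data has already been converted from TriG to N-Quads. Whether smaller uploads avoid the memory exhaustion on 8.0-6.4 was not tested in this thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of RDF4J or the MarkLogic client.
final class NQuadsChunker {

    private NQuadsChunker() {}

    /**
     * Splits an N-Quads file into files holding at most maxStatements
     * statements each. Blank lines and comment-only lines are skipped.
     * Returns the chunk files in the order they were written.
     */
    static List<Path> split(Path source, Path targetDir, int maxStatements) throws IOException {
        if (maxStatements <= 0) {
            throw new IllegalArgumentException("maxStatements must be positive");
        }
        Files.createDirectories(targetDir);
        List<Path> chunks = new ArrayList<>();
        BufferedWriter out = null;
        int inCurrentChunk = 0;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue; // not a statement
                }
                if (out == null) {
                    // Open the next chunk lazily so no empty chunk file is created.
                    Path chunk = targetDir.resolve("chunk-" + chunks.size() + ".nq");
                    chunks.add(chunk);
                    out = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                }
                out.write(line);
                out.newLine();
                if (++inCurrentChunk == maxStatements) {
                    out.close();
                    out = null;
                    inCurrentChunk = 0;
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return chunks;
    }
}
```

Each returned file could then be loaded with the same call as in the original report, swapping the format, e.g. `connection.add(chunk.toFile(), null, RDFFormat.NQUADS)`. Note that this turns one upload into several, so a failure partway through would need to be handled by the caller (for example, by wrapping the loop in a transaction).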
Hi @grechaw and @akshaysonvane , thanks for the thorough testing to reproduce this issue. We plan to upgrade to 9.0 soon. After we've done that I will test this issue again.
Hi, took us quite some time to make the switch to 9.0 but I just tested it and it works now. Thanks for the effort!
Hi, when I try to add a larger TriG file (163MB) to a MarkLogic repository, I get an exception saying that the MarkLogic server cannot parse the file.
the method I use:
connection.add(myLargeTrigFile, null, RDFFormat.TRIG);
Exception: ERROR] MarkLogicClientImpl Local message: failed to apply resource at graphs: Bad Request. Server Message: RESTAPI-INVALIDCONTENT: (err:FOER0000) Invalid content: XDMP-DOCUNEXPECTED1.0-mlxdmp:turtle($in, map:get($om, "passthru")) -- memory exhausted at :4091:167falsexdmp:turtle($in, map:get($om, "passthru"))memory exhausted4091167/MarkLogic/semantics.xqy4091167sem:rdf-parse(document{text{" <https://test-remotestore.semantic-web.at/meshdoublesize..."}}, "trig")indocument{text{" <https://test-remotestore.semantic-web.at/meshdoublesize..."}}options"trig"ommap:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>)fmt"trig"repair()1.0-ml/MarkLogic/rest-api/models/semantics-model.xqy46813semmod:extract-triples-from-body(map:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>), (), document{text{" <https://test-remotestore.semantic-web.at/meshdoublesize..."}})headersmap:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>)repair()bodydocument{text{" <https://test-remotestore.semantic-web.at/meshdoublesize..."}}content-type"application/trig"options()1.0-ml/MarkLogic/rest-api/models/semantics-model.xqy73154semmod:graph-insert(map:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>), map:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>), document{text{" <https://test-remotestore.semantic-web.at/meshdoublesize..."}}, eput:config-callback#2, fn:true())headersmap:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>)paramsmap:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>)bodydocument{text{" 
<https://test-remotestore.semantic-web.at/meshdoublesize..."}}callbackeput:config-callback#2append-permissionsfn:true()graph()_()content-type"application/trig"_()category()param-permissions()_()role-names()role-ids()request-permissions()repair()putative-result"CONTENT_UPDATED"old-permissions()update-perms()1.0-ml/MarkLogic/rest-api/models/semantics-model.xqy6234semmod:graph-insert(map:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>), map:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>), document{text{" <https://test-remotestore.semantic-web.at/meshdoublesize..."}}, eput:config-callback#2)1.0-ml/MarkLogic/rest-api/endpoints/graphstore-update.xqy4491.0-ml
2017-08-02 10:58:05 [FATAL] SnapshotService Restore of snapshot 'Marklogic_-_MeSH_double_size~1DF1729E-F1B0-0001-5F47-14601A306B20~20170801184949324~system' FAILED! Rolling back transaction.
org.eclipse.rdf4j.rio.RDFParseException: Request to MarkLogic server failed, check file and format.
    at com.marklogic.semantics.sesame.client.MarkLogicClientImpl.performAdd(MarkLogicClientImpl.java:295)
    at com.marklogic.semantics.sesame.client.MarkLogicClient.sendAdd(MarkLogicClient.java:303)
    at com.marklogic.semantics.sesame.MarkLogicRepositoryConnection.add(MarkLogicRepositoryConnection.java:959)
From the exception, it looks like the server may be running out of memory, because of " -- memory exhausted at :4091:167".
But I also read here: https://docs.marklogic.com/guide/ingestion/formats#id_33599 that there are some file size limitations for txt files, xml files, etc.
Are there any file size limitations for this method? Or is this maybe just a configuration issue on our side?