swri-robotics / bag-database

A server that catalogs bag files and provides a web-based UI for accessing them.

scanning very slow #170

Open ptulpen opened 2 years ago

ptulpen commented 2 years ago

Hello, I have the issue that scanning is very slow; it has now been running for over a week for around 3000 files. Is there a way to restart and/or troubleshoot the scanning process? (Restarting the container and even the server does not help.) The version is 3.4.2.

danthony06 commented 2 years ago

There should be logs in /usr/local/tomcat/logs, but I'm not sure if they will have information useful for this problem. Was it previously running fast, or is this a new installation?

ptulpen commented 2 years ago

Hello, it was not this slow before. The only interesting parts in the logs are snippets like:

19-Jan-2022 20:29:41.728 INFO [MessageBroker-4] org.springframework.web.socket.config.WebSocketMessageBrokerStats.lambda$initLoggingTask$0 WebSocketSession[0 current WS(0)-HttpStream(0)-HttpPoll(0), 5 total, 0 closed abnormally (0 connect failure, 0 send limit, 0 transport error)], stompSubProtocol[processed CONNECT(0)-CONNECTED(0)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5], sockJsScheduler[pool size = 8, active threads = 1, queued tasks = 4, completed tasks = 275440]

ptulpen commented 1 year ago

One thought that came to my mind: could setting indexes or similar options in PostgreSQL help improve the performance?

pjreed commented 1 year ago

The database should already have quite a few indexes on relevant columns; you could check by connecting to the database with a tool like DBeaver. That said, if your tables were missing indexes, I would expect that to drastically slow down searching, not scanning new files. In fact, lacking indexes would actually make inserting new records faster, since the database could simply append them to the table without updating any indexes.
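
For example, you could list the indexes straight over JDBC; a minimal sketch, assuming the stock PostgreSQL driver is on the classpath (the host, database name, and credentials here are hypothetical, substitute whatever your docker-compose defines):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ListIndexes {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings; use the values from your
        // own deployment.
        String url = "jdbc:postgresql://localhost:5432/bagdatabase";
        try (Connection conn = DriverManager.getConnection(url, "bag_user", "changeme");
             Statement stmt = conn.createStatement();
             // pg_indexes is a built-in PostgreSQL view listing every
             // index and the table it belongs to.
             ResultSet rs = stmt.executeQuery(
                     "SELECT tablename, indexname FROM pg_indexes "
                     + "WHERE schemaname = 'public' ORDER BY tablename")) {
            while (rs.next()) {
                System.out.println(rs.getString("tablename")
                        + ": " + rs.getString("indexname"));
            }
        }
    }
}
```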

Scanning should mostly be limited by disk read speed, since it has to read in the entire file, and to a lesser degree by CPU speed, since it has to generate a hash to identify the bag file. This could be an issue if you're reading very large bags over a slow network connection, or potentially if you're reading large bag files from slow HDDs, especially if the bag files themselves are unindexed and there's a lot of disk thrashing going on.
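
Conceptually, the identification step has to stream every byte of the file through a digest, which is why it is bounded by I/O. A minimal sketch of that step (the digest algorithm and buffer size here are assumptions, not necessarily what BagService actually uses):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class BagHashSketch {
    public static String hash(Path bag) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5"); // assumed algorithm
        try (InputStream in = Files.newInputStream(bag)) {
            byte[] buf = new byte[1 << 20]; // 1 MiB read buffer
            int n;
            // Every byte of the bag passes through here, so the scan can
            // never run faster than the disk or network mount delivers data.
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hash(Paths.get(args[0])));
    }
}
```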

danthony06 commented 1 year ago

Are your bags large by any chance, like more than 2GB?


ptulpen commented 1 year ago

Hello,
@pjreed: the indexes look fine, and what you say about them sounds reasonable, so this probably has nothing to do with that.
@danthony06: yes, they are mostly cut at 4.1 GB. Is there a limit, or something to optimize?

danthony06 commented 1 year ago

@ptulpen There's no limit to my knowledge. I mostly wanted to make sure you weren't uploading 100GB bag files that might be causing network issues.

pjreed commented 1 year ago

How much free RAM does your server have? Is it possible that it's hitting swap space while trying to read the bags?

ptulpen commented 1 year ago

I have 32 GB RAM (and 8 CPUs), and it is not fully used. I also increased the Java memory limit with -e CATALINA_OPTS="-Xmx10g".

ptulpen commented 1 year ago

I also see some interesting errors in the scanning process, like this one:

2022-09-06 14:53:24.289 [pool-2-thread-1] ERROR c.g.s.b.s.f.FilesystemBagStorageImpl - Unexpected error updating bag file:
java.lang.IllegalArgumentException: Chunk [**description**] is not a valid entry
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:219)
        at com.google.common.base.Splitter$MapSplitter.split(Splitter.java:526)
        at com.github.swrirobotics.bags.BagService.lambda$getMetadata$6(BagService.java:1307)
        at com.github.swrirobotics.bags.reader.BagFile.forMessagesOnTopic(BagFile.java:395)
        at com.github.swrirobotics.bags.BagService.getMetadata(BagService.java:1303)
        at com.github.swrirobotics.bags.BagService.extractTagsFromBagFile(BagService.java:1352)
        at com.github.swrirobotics.bags.BagService.addTagsToBag(BagService.java:1564)
        at com.github.swrirobotics.bags.BagService.insertNewBag(BagService.java:1499)
        at com.github.swrirobotics.bags.BagService.updateBagInDatabase(BagService.java:1761)
        at com.github.swrirobotics.bags.BagService.updateBagFile(BagService.java:1689)
        at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl.lambda$updateBags$3(FilesystemBagStorageImpl.java:159)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl.updateBags(FilesystemBagStorageImpl.java:146)
        at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl$$FastClassBySpringCGLIB$$17031e11.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
        at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698)
        at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl$$EnhancerBySpringCGLIB$$953167bd.updateBags(<generated>)
        at com.github.swrirobotics.bags.storage.BagScanner.lambda$scanStorage$0(BagScanner.java:371)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
        at com.github.swrirobotics.bags.storage.BagScanner.lambda$scanStorage$2(BagScanner.java:374)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

"description" is part of the folder structure, though not even a complete folder name; the folder names follow the pattern YYYY-MM-DD_description.
So are there any "forbidden" characters or patterns?

pjreed commented 1 year ago

That's interesting, that definitely isn't a normal error...

That exception looks like it's being thrown from code that is trying to parse tags in the bag file. If you've configured metadata topics, then it expects every message on those topics to have a string field named data, and each one of those should be a newline-separated set of key:value pairs; for example:

name: John Doe
email: jdoe@example.com

I suspect you've got a bag file with metadata that is formatted in a way it doesn't expect; do you have an example of anything in your files that might be formatted differently from that?
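
For reference, the parsing in the stack trace above is Guava's MapSplitter. Here is a minimal sketch of what it does with the data field (the exact splitter configuration in BagService.getMetadata may differ):

```java
import com.google.common.base.Splitter;
import java.util.Map;

public class MetadataParseDemo {
    public static void main(String[] args) {
        Splitter.MapSplitter splitter = Splitter.on('\n')
                .omitEmptyStrings()
                .trimResults()
                .withKeyValueSeparator(":");

        // Well-formed metadata parses into a key/value map.
        Map<String, String> tags =
                splitter.split("name: John Doe\nemail: jdoe@example.com");
        System.out.println(tags);

        // A line with no ':' separator throws exactly the error from
        // the stack trace: "Chunk [mydescription] is not a valid entry".
        splitter.split("mydescription");
    }
}
```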

pjreed commented 1 year ago

I've submitted a PR at #196 that will make it handle invalid metadata more gracefully when scanning bag files. I don't know if that will fix the speed issue you're having, but it may fix some other issues people have seen with it failing to recognize certain bag files...
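
The change is essentially defensive handling around that split; a rough sketch of the idea (not the actual diff in #196):

```java
import com.google.common.base.Splitter;
import java.util.Collections;
import java.util.Map;

public class SafeMetadata {
    // Sketch: treat an unparseable "data" field as "no tags" instead of
    // letting the IllegalArgumentException abort the whole bag update.
    public static Map<String, String> parseTags(String dataField) {
        try {
            return Splitter.on('\n').omitEmptyStrings().trimResults()
                    .withKeyValueSeparator(":").split(dataField);
        } catch (IllegalArgumentException e) {
            System.err.println("Ignoring malformed metadata: " + e.getMessage());
            return Collections.emptyMap();
        }
    }
}
```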

ptulpen commented 1 year ago

Hello, yes, you are completely right. rostopic gave me:

%time,field.data
1234567894591808795,mydescription

Now I have rebuilt it to:

%time,field.data
1234567894591808795,description: mydescription

I tested it with a small subset and it looks much faster. Now I am rewriting the upload scripts and writing a "repair" script.

The more graceful metadata scanning also sounds good; issues like that could happen in other scenarios as well. I tried to test it too, but I could not build it. Can you provide the branch as a container as well? (This is how I run the current system.)

pjreed commented 1 year ago

Sure, I've pushed an image containing my build to ghcr.io/hatchbed/bag-database:v3.5.1-SNAPSHOT. Give that a try and see if it works for you.

danthony06 commented 1 year ago

v3.5.1 has been released with this fix.

ptulpen commented 1 year ago

The patch regarding the metadata is great, but a larger set of files showed that it still takes a long time (for 875 files it took a week). What also looks interesting is that when I add new files and start a scan, a lot of files get scanned, and only when the scan is finished do they all appear at once on the website. Is this intended behaviour?

EDIT: the records also only appear in the database when everything is done (at least according to grepping through a pg_dump). Maybe this is also connected with https://github.com/swri-robotics/bag-database/issues/195. My blind guess would be that there is some kind of lock and caching involved.
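
Looking at the stack trace I posted above, the whole updateBags pass seems to run inside a single TransactionTemplate.execute(...) call, which would explain this. A rough sketch of that pattern (class and method names are illustrative, not the actual BagScanner code):

```java
import java.util.List;
import org.springframework.transaction.support.TransactionTemplate;

// Illustrative sketch: when the whole scan runs in one transaction, the
// inserted rows only become visible to other connections (the web UI,
// pg_dump) at the single commit at the end of the scan.
public class ScanTransactionSketch {
    private final TransactionTemplate transactionTemplate;

    public ScanTransactionSketch(TransactionTemplate transactionTemplate) {
        this.transactionTemplate = transactionTemplate;
    }

    public void scanStorage(List<String> bagPaths) {
        transactionTemplate.execute(status -> {
            for (String path : bagPaths) {
                insertNewBag(path); // not yet visible outside the transaction
            }
            return null; // commit happens here, after every bag
        });
    }

    private void insertNewBag(String path) {
        // hypothetical placeholder for the real per-bag database work
    }
}
```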

ptulpen commented 1 year ago

Another thought regarding this: we saw that we have many images and quite big videos inside the bags. Can we maybe skip the analysis/extraction of those during the scan and focus on the text-based content?

Timple commented 10 months ago

For us, the analysis is CPU-bound. It seems to be single-threaded, which is a waste of the 16 threads available :slightly_frowning_face:
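
Something like a fixed worker pool over the bag files would keep all cores busy; a minimal sketch (illustrative only, not the project's actual scanning code):

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelScanSketch {
    public static void scan(List<Path> bags) throws InterruptedException {
        // One worker per core; each bag is hashed/parsed independently.
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (Path bag : bags) {
            pool.submit(() -> processBag(bag));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }

    private static void processBag(Path bag) {
        // hypothetical placeholder: hash the file, extract metadata, etc.
    }
}
```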