platonai / exotic-amazon

A complete solution to crawl amazon at scale completely and accurately.

unable to read task file #24

Open swlcyx opened 1 year ago

swlcyx commented 1 year ago

java.nio.file.FileSystemNotFoundException: null
    at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:169)
    at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:155)
    at java.base/java.nio.file.Path.of(Path.java:208)
    at java.base/java.nio.file.Paths.get(Paths.java:97)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.getPeriodicalSeedDirectories(AmazonGenerator.kt:61)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.generateLoadingTasks(AmazonGenerator.kt:111)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonGenerator.generateStartupTasks(AmazonGenerator.kt:85)
    at ai.platon.exotic.amazon.crawl.boot.component.AmazonCrawler.generate(AmazonCrawler.kt:53)
    at ai.platon.scent.crawl.AbstractRunnableCrawler.run0(AbstractRunnableCrawler.kt:49)
    at ai.platon.scent.crawl.AbstractRunnableCrawler.run$suspendImpl(AbstractRunnableCrawler.kt:29)
    at ai.platon.scent.crawl.AbstractRunnableCrawler.run(AbstractRunnableCrawler.kt)
    at ai.platon.scent.crawl.AbstractRunnableStreamingCrawler.run$suspendImpl(AbstractRunnableStreamingCrawler.kt:24)
    at ai.platon.scent.crawl.AbstractRunnableStreamingCrawler.run(AbstractRunnableStreamingCrawler.kt)
    at ai.platon.scent.crawl.AbstractRunnableCrawler$run$1$1.invokeSuspend(AbstractRunnableCrawler.kt:22)
    at ai.platon.scent.crawl.AbstractRunnableCrawler$run$1$1.invoke(AbstractRunnableCrawler.kt)
    at ai.platon.scent.crawl.AbstractRunnableCrawler$run$1$1.invoke(AbstractRunnableCrawler.kt)
    at kotlinx.coroutines.intrinsics.UndispatchedKt.startUndispatchedOrReturn(Undispatched.kt:89)
    at kotlinx.coroutines.SupervisorKt.supervisorScope(Supervisor.kt:61)
    at ai.platon.scent.crawl.AbstractRunnableCrawler$run$1.invokeSuspend(AbstractRunnableCrawler.kt:21)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)

platonai commented 1 year ago

Running with the standalone jar is not supported currently.

Please try running the program as described in the README.
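For readers hitting the same trace: the exception is thrown from `Paths.get(URI)` inside the zip file system provider, which is what typically happens when a resource URI has the `jar:` scheme and no zip file system has been opened for that archive. The Java sketch below reproduces that general JDK behavior in isolation. It is only an illustration under assumptions: the class name, the temporary archive, and the entry name `seeds/sample.txt` are hypothetical, and it is not the project's code or a confirmed fix for it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class JarResourcePathDemo {

    // Builds a small throwaway archive with a single entry (hypothetical name).
    static Path createSampleJar() throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("seeds/sample.txt"));
            zip.write("https://www.amazon.com/".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return jar;
    }

    // Returns true when Paths.get(uri) fails because no zip file system
    // is currently open for the archive the URI points into.
    static boolean failsWithoutFileSystem(URI uri) {
        try {
            Paths.get(uri);
            return false;
        } catch (FileSystemNotFoundException e) {
            return true;
        }
    }

    // Opens the archive as a file system first, then resolves and reads the entry.
    static String readWithFileSystem(URI uri) throws IOException {
        try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of())) {
            return Files.readString(Paths.get(uri));
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = createSampleJar();
        URI entry = URI.create("jar:" + jar.toUri() + "!/seeds/sample.txt");
        System.out.println("fails without file system: " + failsWithoutFileSystem(entry));
        System.out.println("content: " + readWithFileSystem(entry));
        Files.delete(jar);
    }
}
```

When the program is run from `target/classes`, as in an IDE or Maven run, resources resolve to plain `file:` URIs on the default file system, so this code path is not hit.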

swlcyx commented 1 year ago

After configuring the environment according to the steps, I built the jar and ran it on Ubuntu 18.04.

platonai commented 1 year ago

That's great!

swlcyx commented 1 year ago

/usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -javaagent:/home/coisin/Downloads/idea-IU-231.8109.175/lib/idea_rt.jar=46755:/home/coisin/Downloads/idea-IU-231.8109.175/bin -Dfile.encoding=UTF-8 -classpath /home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-all/1.10.11/pulsar-all-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-persist/1.10.11/pulsar-persist-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-common/1.10.11/pulsar-common-1.10.11.jar:/home/coisin/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar:/home/coisin/.m2/repository/org/apache/httpcomponents/httpclient/4.5.13/httpclient-4.5.13.jar:/home/coisin/.m2/repository/org/apache/httpcomponents/httpcore/4.4.15/httpcore-4.4.15.jar:/home/coisin/.m2/repository/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar:/home/coisin/.m2/repository/com/ibm/icu/icu4j/4.0.1/icu4j-4.0.1.jar:/home/coisin/.m2/repository/commons-io/commons-io/2.11.0/commons-io-2.11.0.jar:/home/coisin/.m2/repository/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar:/home/coisin/.m2/repository/org/apache/commons/commons-math3/3.3/commons-math3-3.3.jar:/home/coisin/.m2/repository/org/apache/commons/commons-collections4/4.4/commons-collections4-4.4.jar:/home/coisin/.m2/repository/org/codehaus/woodstox/stax2-api/4.2.1/stax2-api-4.2.1.jar:/home/coisin/.m2/repository/com/fasterxml/woodstox/woodstox-core/6.4.0/woodstox-core-6.4.0.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-serialization/1.5.32/kotlin-serialization-1.5.32.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-gradle-plugin-api/1.5.32/kotlin-gradle-plugin-api-1.5.32.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-native-utils/1.5.32/kotlin-native-utils-1.5.32.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-util-io/1.5.32/kotlin-util-io-1.5.32.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-p
roject-model/1.5.32/kotlin-project-model-1.5.32.jar:/home/coisin/.m2/repository/org/nibor/autolink/autolink/0.10.0/autolink-0.10.0.jar:/home/coisin/.m2/repository/ai/platon/pulsar/gora-shaded-mongodb/0.8/gora-shaded-mongodb-0.8.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-jsoup/1.14.3/pulsar-jsoup-1.14.3.jar:/home/coisin/.m2/repository/commons-codec/commons-codec/1.15/commons-codec-1.15.jar:/home/coisin/.m2/repository/org/apache/avro/avro/1.8.1/avro-1.8.1.jar:/home/coisin/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/home/coisin/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/home/coisin/.m2/repository/com/thoughtworks/paranamer/paranamer/2.7/paranamer-2.7.jar:/home/coisin/.m2/repository/org/xerial/snappy/snappy-java/1.1.1.3/snappy-java-1.1.1.3.jar:/home/coisin/.m2/repository/org/tukaani/xz/1.5/xz-1.5.jar:/home/coisin/.m2/repository/org/slf4j/slf4j-api/1.7.36/slf4j-api-1.7.36.jar:/home/coisin/.m2/repository/org/apache/gora/gora-core/0.8/gora-core-0.8.jar:/home/coisin/.m2/repository/org/apache/gora/gora-compiler/0.8/gora-compiler-0.8.jar:/home/coisin/.m2/repository/org/apache/avro/avro-compiler/1.8.1/avro-compiler-1.8.1.jar:/home/coisin/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar:/home/coisin/.m2/repository/org/apache/velocity/velocity/1.7/velocity-1.7.jar:/home/coisin/.m2/repository/joda-time/joda-time/2.7/joda-time-2.7.jar:/home/coisin/.m2/repository/com/google/code/gson/gson/2.8.9/gson-2.8.9.jar:/home/coisin/.m2/repository/org/jgrapht/jgrapht-core/1.0.0/jgrapht-core-1.0.0.jar:/home/coisin/.m2/repository/org/jgrapht/jgrapht-ext/1.0.0/jgrapht-ext-1.0.0.jar:/home/coisin/.m2/repository/org/tinyjee/jgraphx/jgraphx/2.0.0.1/jgraphx-2.0.0.1.jar:/home/coisin/.m2/repository/jgraph/jgraph/5.13.0.0/jgraph-5.13.0.0.jar:/home/coisin/.m2/repository/org/antlr/antlr4-runtime/4.5.3/antlr4-runtime-4.5.3.jar:/home/coisin/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/
home/coisin/.m2/repository/org/springframework/spring-context/5.3.19/spring-context-5.3.19.jar:/home/coisin/.m2/repository/org/springframework/spring-aop/5.3.19/spring-aop-5.3.19.jar:/home/coisin/.m2/repository/org/springframework/spring-beans/5.3.19/spring-beans-5.3.19.jar:/home/coisin/.m2/repository/org/springframework/spring-expression/5.3.19/spring-expression-5.3.19.jar:/home/coisin/.m2/repository/javax/xml/bind/jaxb-api/2.3.1/jaxb-api-2.3.1.jar:/home/coisin/.m2/repository/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar:/home/coisin/.m2/repository/commons-collections/commons-collections/3.2.2/commons-collections-3.2.2.jar:/home/coisin/.m2/repository/org/apache/hadoop/hadoop-common/2.7.2/hadoop-common-2.7.2.jar:/home/coisin/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-properties/2.12.6/jackson-dataformat-properties-2.12.6.jar:/home/coisin/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.12.6.1/jackson-databind-2.12.6.1.jar:/home/coisin/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.12.6/jackson-core-2.12.6.jar:/home/coisin/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jsr310/2.12.6/jackson-datatype-jsr310-2.12.6.jar:/home/coisin/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.12.6/jackson-annotations-2.12.6.jar:/home/coisin/.m2/repository/com/fasterxml/jackson/module/jackson-module-kotlin/2.12.6/jackson-module-kotlin-2.12.6.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-beans/1.10.11/pulsar-beans-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-protocol/1.10.11/pulsar-protocol-1.10.11.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-compiler/1.7.10/kotlin-compiler-1.7.10.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-script-runtime/1.7.10/kotlin-script-runtime-1.7.10.jar:/home/coisin/.m2/repository/org/jetbrains/intellij/deps/trove4j/1.0.20181211/trove4j-1.0.20181211.jar:/home/coisin/.m2/repository/ai/platon/pulsa
r/pulsar-filter/1.10.11/pulsar-filter-1.10.11.jar:/home/coisin/.m2/repository/dk/brics/automaton/automaton/1.11-8/automaton-1.11-8.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-scoring/1.10.11/pulsar-scoring-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-parse/1.10.11/pulsar-parse-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-index/1.10.11/pulsar-index-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-schedule/1.10.11/pulsar-schedule-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-resources/1.10.11/pulsar-resources-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-ql/1.10.11/pulsar-ql-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-ql-common/1.10.11/pulsar-ql-common-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-h2/1.4.196/pulsar-h2-1.4.196.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-dom/1.10.11/pulsar-dom-1.10.11.jar:/home/coisin/.m2/repository/com/udojava/EvalEx/2.0/EvalEx-2.0.jar:/home/coisin/.m2/repository/org/perf4j/perf4j/0.9.16/perf4j-0.9.16.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-boilerpipe/1.10.11/pulsar-boilerpipe-1.10.11.jar:/home/coisin/.m2/repository/net/sourceforge/nekohtml/nekohtml/1.9.22/nekohtml-1.9.22.jar:/home/coisin/.m2/repository/xerces/xercesImpl/2.12.2/xercesImpl-2.12.2.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.11/pulsar-skeleton-1.10.11.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-browser/1.10.11/pulsar-browser-1.10.11.jar:/home/coisin/.m2/repository/io/dropwizard/metrics/metrics-core/4.1.31/metrics-core-4.1.31.jar:/home/coisin/.m2/repository/javax/websocket/javax.websocket-api/1.1/javax.websocket-api-1.1.jar:/home/coisin/.m2/repository/org/glassfish/tyrus/tyrus-container-grizzly-client/1.13.1/tyrus-container-grizzly-client-1.13.1.jar:/home/coisin/.m2/repository/org/glassfish/grizzly/grizzly-framework/2.3.25/grizzly-framework-2.3.25.jar:/home/coisin/.m2/rep
ository/org/glassfish/grizzly/grizzly-http-server/2.3.25/grizzly-http-server-2.3.25.jar:/home/coisin/.m2/repository/org/glassfish/grizzly/grizzly-http/2.3.25/grizzly-http-2.3.25.jar:/home/coisin/.m2/repository/org/glassfish/tyrus/tyrus-client/1.13.1/tyrus-client-1.13.1.jar:/home/coisin/.m2/repository/org/glassfish/tyrus/tyrus-core/1.13.1/tyrus-core-1.13.1.jar:/home/coisin/.m2/repository/org/glassfish/tyrus/tyrus-spi/1.13.1/tyrus-spi-1.13.1.jar:/home/coisin/.m2/repository/com/github/kklisura/cdt/cdt-java-client/4.0.0/cdt-java-client-4.0.0.jar:/home/coisin/.m2/repository/org/javassist/javassist/3.24.1-GA/javassist-3.24.1-GA.jar:/home/coisin/.m2/repository/com/google/code/crawler-commons/crawler-commons/0.5/crawler-commons-0.5.jar:/home/coisin/.m2/repository/org/apache/tika/tika-core/2.3.0/tika-core-2.3.0.jar:/home/coisin/.m2/repository/com/beust/jcommander/1.81/jcommander-1.81.jar:/home/coisin/.m2/repository/com/github/oshi/oshi-core/5.6.1/oshi-core-5.6.1.jar:/home/coisin/.m2/repository/net/java/dev/jna/jna/5.8.0/jna-5.8.0.jar:/home/coisin/.m2/repository/net/java/dev/jna/jna-platform/5.8.0/jna-platform-5.8.0.jar:/home/coisin/.m2/repository/io/dropwizard/metrics/metrics-graphite/4.1.31/metrics-graphite-4.1.31.jar:/home/coisin/.m2/repository/com/rabbitmq/amqp-client/5.12.0/amqp-client-5.12.0.jar:/home/coisin/.m2/repository/org/jetbrains/kotlinx/kotlinx-coroutines-jdk8/1.5.2/kotlinx-coroutines-jdk8-1.5.2.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-apple-module/2.3.0/tika-parser-apple-module-2.3.0.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-zip-commons/2.3.0/tika-parser-zip-commons-2.3.0.jar:/home/coisin/.m2/repository/org/apache/commons/commons-compress/1.21/commons-compress-1.21.jar:/home/coisin/.m2/repository/com/googlecode/plist/dd-plist/1.23/dd-plist-1.23.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-audiovideo-module/2.3.0/tika-parser-audiovideo-module-2.3.0.jar:/home/coisin/.m2/repository/com/drewnoakes/metadata-extract
or/2.16.0/metadata-extractor-2.16.0.jar:/home/coisin/.m2/repository/com/adobe/xmp/xmpcore/6.1.11/xmpcore-6.1.11.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-cad-module/2.3.0/tika-parser-cad-module-2.3.0.jar:/home/coisin/.m2/repository/org/apache/poi/poi/5.2.0/poi-5.2.0.jar:/home/coisin/.m2/repository/com/zaxxer/SparseBitSet/1.2/SparseBitSet-1.2.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-code-module/2.3.0/tika-parser-code-module-2.3.0.jar:/home/coisin/.m2/repository/org/codelibs/jhighlight/1.0.3/jhighlight-1.0.3.jar:/home/coisin/.m2/repository/org/ccil/cowan/tagsoup/tagsoup/1.2.1/tagsoup-1.2.1.jar:/home/coisin/.m2/repository/org/ow2/asm/asm/9.2/asm-9.2.jar:/home/coisin/.m2/repository/com/epam/parso/2.0.14/parso-2.0.14.jar:/home/coisin/.m2/repository/org/tallison/jmatio/1.5/jmatio-1.5.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-crypto-module/2.3.0/tika-parser-crypto-module-2.3.0.jar:/home/coisin/.m2/repository/org/bouncycastle/bcmail-jdk15on/1.70/bcmail-jdk15on-1.70.jar:/home/coisin/.m2/repository/org/bouncycastle/bcutil-jdk15on/1.70/bcutil-jdk15on-1.70.jar:/home/coisin/.m2/repository/org/bouncycastle/bcpkix-jdk15on/1.70/bcpkix-jdk15on-1.70.jar:/home/coisin/.m2/repository/org/bouncycastle/bcprov-jdk15on/1.70/bcprov-jdk15on-1.70.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-digest-commons/2.3.0/tika-parser-digest-commons-2.3.0.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-font-module/2.3.0/tika-parser-font-module-2.3.0.jar:/home/coisin/.m2/repository/org/apache/pdfbox/fontbox/2.0.25/fontbox-2.0.25.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-html-module/2.3.0/tika-parser-html-module-2.3.0.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-html-commons/2.3.0/tika-parser-html-commons-2.3.0.jar:/home/coisin/.m2/repository/de/l3s/boilerpipe/boilerpipe/1.1.0/boilerpipe-1.1.0.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-image-module/2.3.0/tika-parser-image-modul
e-2.3.0.jar:/home/coisin/.m2/repository/com/github/jai-imageio/jai-imageio-core/1.4.0/jai-imageio-core-1.4.0.jar:/home/coisin/.m2/repository/org/apache/pdfbox/jbig2-imageio/3.0.3/jbig2-imageio-3.0.3.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-mail-module/2.3.0/tika-parser-mail-module-2.3.0.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-mail-commons/2.3.0/tika-parser-mail-commons-2.3.0.jar:/home/coisin/.m2/repository/org/apache/james/apache-mime4j-core/0.8.4/apache-mime4j-core-0.8.4.jar:/home/coisin/.m2/repository/org/apache/james/apache-mime4j-dom/0.8.4/apache-mime4j-dom-0.8.4.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-microsoft-module/2.3.0/tika-parser-microsoft-module-2.3.0.jar:/home/coisin/.m2/repository/com/pff/java-libpst/0.9.3/java-libpst-0.9.3.jar:/home/coisin/.m2/repository/org/apache/poi/poi-scratchpad/5.2.0/poi-scratchpad-5.2.0.jar:/home/coisin/.m2/repository/org/apache/poi/poi-ooxml/5.2.0/poi-ooxml-5.2.0.jar:/home/coisin/.m2/repository/org/apache/poi/poi-ooxml-lite/5.2.0/poi-ooxml-lite-5.2.0.jar:/home/coisin/.m2/repository/org/apache/xmlbeans/xmlbeans/5.0.3/xmlbeans-5.0.3.jar:/home/coisin/.m2/repository/com/github/virtuald/curvesapi/1.06/curvesapi-1.06.jar:/home/coisin/.m2/repository/com/healthmarketscience/jackcess/jackcess/4.0.1/jackcess-4.0.1.jar:/home/coisin/.m2/repository/com/healthmarketscience/jackcess/jackcess-encrypt/4.0.1/jackcess-encrypt-4.0.1.jar:/home/coisin/.m2/repository/org/slf4j/jcl-over-slf4j/1.7.36/jcl-over-slf4j-1.7.36.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-miscoffice-module/2.3.0/tika-parser-miscoffice-module-2.3.0.jar:/home/coisin/.m2/repository/org/glassfish/jaxb/jaxb-runtime/2.3.6/jaxb-runtime-2.3.6.jar:/home/coisin/.m2/repository/jakarta/xml/bind/jakarta.xml.bind-api/2.3.3/jakarta.xml.bind-api-2.3.3.jar:/home/coisin/.m2/repository/org/glassfish/jaxb/txw2/2.3.6/txw2-2.3.6.jar:/home/coisin/.m2/repository/com/sun/istack/istack-commons-runtime/3.0.12/istack-commons-run
time-3.0.12.jar:/home/coisin/.m2/repository/com/sun/activation/jakarta.activation/1.2.2/jakarta.activation-1.2.2.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-news-module/2.3.0/tika-parser-news-module-2.3.0.jar:/home/coisin/.m2/repository/com/rometools/rome/1.18.0/rome-1.18.0.jar:/home/coisin/.m2/repository/com/rometools/rome-utils/1.18.0/rome-utils-1.18.0.jar:/home/coisin/.m2/repository/org/jdom/jdom2/2.0.6.1/jdom2-2.0.6.1.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-ocr-module/2.3.0/tika-parser-ocr-module-2.3.0.jar:/home/coisin/.m2/repository/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-pdf-module/2.3.0/tika-parser-pdf-module-2.3.0.jar:/home/coisin/.m2/repository/org/apache/pdfbox/pdfbox/2.0.25/pdfbox-2.0.25.jar:/home/coisin/.m2/repository/org/apache/pdfbox/pdfbox-tools/2.0.25/pdfbox-tools-2.0.25.jar:/home/coisin/.m2/repository/org/apache/pdfbox/pdfbox-debugger/2.0.25/pdfbox-debugger-2.0.25.jar:/home/coisin/.m2/repository/org/apache/pdfbox/jempbox/1.8.16/jempbox-1.8.16.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-pkg-module/2.3.0/tika-parser-pkg-module-2.3.0.jar:/home/coisin/.m2/repository/org/brotli/dec/0.1.2/dec-0.1.2.jar:/home/coisin/.m2/repository/com/github/junrar/junrar/7.4.1/junrar-7.4.1.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-text-module/2.3.0/tika-parser-text-module-2.3.0.jar:/home/coisin/.m2/repository/com/googlecode/juniversalchardet/juniversalchardet/1.0.3/juniversalchardet-1.0.3.jar:/home/coisin/.m2/repository/org/apache/commons/commons-csv/1.9.0/commons-csv-1.9.0.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-xml-module/2.3.0/tika-parser-xml-module-2.3.0.jar:/home/coisin/.m2/repository/org/apache/tika/tika-parser-xmp-commons/2.3.0/tika-parser-xmp-commons-2.3.0.jar:/home/coisin/.m2/repository/org/apache/pdfbox/xmpbox/2.0.25/xmpbox-2.0.25.jar:/home/coisin/.m2/repository/org/gagravarr/vorbis-java-tika/0.8/vorbi
s-java-tika-0.8.jar:/home/coisin/.m2/repository/org/gagravarr/vorbis-java-core/0.8/vorbis-java-core-0.8.jar:/home/coisin/.m2/repository/org/springframework/spring-core/5.3.19/spring-core-5.3.19.jar:/home/coisin/.m2/repository/org/springframework/spring-jcl/5.3.19/spring-jcl-5.3.19.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot/2.5.13/spring-boot-2.5.13.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-starter/2.5.13/spring-boot-starter-2.5.13.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-autoconfigure/2.5.13/spring-boot-autoconfigure-2.5.13.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-starter-logging/2.5.13/spring-boot-starter-logging-2.5.13.jar:/home/coisin/.m2/repository/org/slf4j/jul-to-slf4j/1.7.36/jul-to-slf4j-1.7.36.jar:/home/coisin/.m2/repository/jakarta/annotation/jakarta.annotation-api/1.3.5/jakarta.annotation-api-1.3.5.jar:/home/coisin/.m2/repository/org/yaml/snakeyaml/1.28/snakeyaml-1.28.jar:/home/coisin/.m2/repository/com/google/guava/guava/30.1-jre/guava-30.1-jre.jar:/home/coisin/.m2/repository/com/google/guava/failureaccess/1.0.1/failureaccess-1.0.1.jar:/home/coisin/.m2/repository/com/google/guava/listenablefuture/9999.0-empty-to-avoid-conflict-with-guava/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:/home/coisin/.m2/repository/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.jar:/home/coisin/.m2/repository/org/checkerframework/checker-qual/3.5.0/checker-qual-3.5.0.jar:/home/coisin/.m2/repository/com/google/errorprone/error_prone_annotations/2.3.4/error_prone_annotations-2.3.4.jar:/home/coisin/.m2/repository/com/google/j2objc/j2objc-annotations/1.3/j2objc-annotations-1.3.jar:/home/coisin/.m2/repository/org/apache/lucene/lucene-core/8.1.1/lucene-core-8.1.1.jar:/home/coisin/.m2/repository/org/apache/lucene/lucene-analyzers-common/8.1.1/lucene-analyzers-common-8.1.1.jar:/home/coisin/.m2/repository/org/apache/lucene/lucene-analyzers-smartcn/8.1.1/luce
ne-analyzers-smartcn-8.1.1.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib/1.7.10/kotlin-stdlib-1.7.10.jar:/home/coisin/.m2/repository/org/jetbrains/annotations/13.0/annotations-13.0.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-common/1.7.10/kotlin-stdlib-common-1.7.10.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk7/1.7.10/kotlin-stdlib-jdk7-1.7.10.jar:/home/coisin/.m2/repository/org/jetbrains/kotlinx/kotlinx-coroutines-core/1.5.2/kotlinx-coroutines-core-1.5.2.jar:/home/coisin/.m2/repository/org/jetbrains/kotlinx/kotlinx-coroutines-core-jvm/1.5.2/kotlinx-coroutines-core-jvm-1.5.2.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-resources/1.10.4/scent-resources-1.10.4.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-beans/1.10.4/scent-beans-1.10.4.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-protocol/1.10.4/scent-protocol-1.10.4.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-common/1.10.4/scent-common-1.10.4.jar:/home/coisin/.m2/repository/io/netty/netty-all/4.1.76.Final/netty-all-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-dns/4.1.76.Final/netty-codec-dns-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-haproxy/4.1.76.Final/netty-codec-haproxy-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-http2/4.1.76.Final/netty-codec-http2-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-memcache/4.1.76.Final/netty-codec-memcache-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-mqtt/4.1.76.Final/netty-codec-mqtt-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-redis/4.1.76.Final/netty-codec-redis-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-smtp/4.1.76.Final/netty-codec-smtp-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-stomp/4.1.76.Final/netty-codec-stomp-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-xml/4.1.76.Final/netty
-codec-xml-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-resolver-dns/4.1.76.Final/netty-resolver-dns-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-transport-rxtx/4.1.76.Final/netty-transport-rxtx-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-transport-sctp/4.1.76.Final/netty-transport-sctp-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-transport-udt/4.1.76.Final/netty-transport-udt-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-transport-classes-epoll/4.1.76.Final/netty-transport-classes-epoll-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-transport-native-unix-common/4.1.76.Final/netty-transport-native-unix-common-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-transport-classes-kqueue/4.1.76.Final/netty-transport-classes-kqueue-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-resolver-dns-classes-macos/4.1.76.Final/netty-resolver-dns-classes-macos-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-transport-native-epoll/4.1.76.Final/netty-transport-native-epoll-4.1.76.Final-linux-x86_64.jar:/home/coisin/.m2/repository/io/netty/netty-transport-native-epoll/4.1.76.Final/netty-transport-native-epoll-4.1.76.Final-linux-aarch_64.jar:/home/coisin/.m2/repository/io/netty/netty-transport-native-kqueue/4.1.76.Final/netty-transport-native-kqueue-4.1.76.Final-osx-x86_64.jar:/home/coisin/.m2/repository/io/netty/netty-transport-native-kqueue/4.1.76.Final/netty-transport-native-kqueue-4.1.76.Final-osx-aarch_64.jar:/home/coisin/.m2/repository/io/netty/netty-resolver-dns-native-macos/4.1.76.Final/netty-resolver-dns-native-macos-4.1.76.Final-osx-x86_64.jar:/home/coisin/.m2/repository/io/netty/netty-resolver-dns-native-macos/4.1.76.Final/netty-resolver-dns-native-macos-4.1.76.Final-osx-aarch_64.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-extract/1.10.4/scent-extract-1.10.4.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-build/1.10.4/scent-build-1.10.4.ja
r:/home/coisin/.m2/repository/ai/platon/scent/scent-proxy/1.10.4/scent-proxy-1.10.4.jar:/home/coisin/.m2/repository/io/netty/netty-buffer/4.1.76.Final/netty-buffer-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-common/4.1.76.Final/netty-common-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec/4.1.76.Final/netty-codec-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-transport/4.1.76.Final/netty-transport-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-http/4.1.76.Final/netty-codec-http-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-handler/4.1.76.Final/netty-handler-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-resolver/4.1.76.Final/netty-resolver-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-handler-proxy/4.1.76.Final/netty-handler-proxy-4.1.76.Final.jar:/home/coisin/.m2/repository/io/netty/netty-codec-socks/4.1.76.Final/netty-codec-socks-4.1.76.Final.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-engine/1.10.4/scent-engine-1.10.4.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-core-common/1.10.4/scent-core-common-1.10.4.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-dom/1.10.4/scent-dom-1.10.4.jar:/home/coisin/.m2/repository/org/apache/commons/commons-text/1.9/commons-text-1.9.jar:/home/coisin/.m2/repository/org/nield/kotlin-statistics/1.2.1/kotlin-statistics-1.2.1.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-auto-mining/1.10.4/scent-auto-mining-1.10.4.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-persist/1.10.4/scent-persist-1.10.4.jar:/home/coisin/.m2/repository/com/sun/xml/bind/jaxb-impl/4.0.1/jaxb-impl-4.0.1.jar:/home/coisin/.m2/repository/com/sun/xml/bind/jaxb-core/4.0.1/jaxb-core-4.0.1.jar:/home/coisin/.m2/repository/org/eclipse/angus/angus-activation/1.0.0/angus-activation-1.0.0.jar:/home/coisin/.m2/repository/jakarta/activation/jakarta.activation-api/1.2.2/jakarta.activation-api-1.2.2.jar:/home/coisin/.m2/repository/com/git
hub/haifengl/smile-math/1.5.2/smile-math-1.5.2.jar:/home/coisin/.m2/repository/com/github/haifengl/smile-core/1.5.2/smile-core-1.5.2.jar:/home/coisin/.m2/repository/com/github/haifengl/smile-data/1.5.2/smile-data-1.5.2.jar:/home/coisin/.m2/repository/com/github/haifengl/smile-graph/1.5.2/smile-graph-1.5.2.jar:/home/coisin/.m2/repository/org/jpmml/pmml-agent/1.6.4/pmml-agent-1.6.4.jar:/home/coisin/.m2/repository/org/jpmml/pmml-model-jackson/1.6.4/pmml-model-jackson-1.6.4.jar:/home/coisin/.m2/repository/org/jpmml/pmml-model/1.6.4/pmml-model-1.6.4.jar:/home/coisin/.m2/repository/org/jpmml/pmml-evaluator-metro/1.6.4/pmml-evaluator-metro-1.6.4.jar:/home/coisin/.m2/repository/org/jpmml/pmml-evaluator/1.6.4/pmml-evaluator-1.6.4.jar:/home/coisin/.m2/repository/org/jpmml/pmml-model-metro/1.6.4/pmml-model-metro-1.6.4.jar:/home/coisin/.m2/repository/org/jpmml/pmml-evaluator-testing/1.6.4/pmml-evaluator-testing-1.6.4.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-parse/1.10.4/scent-parse-1.10.4.jar:/home/coisin/.m2/repository/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar:/home/coisin/.m2/repository/ai/platon/scent/scent-boot/1.10.4/scent-boot-1.10.4.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-starter-data-mongodb/2.5.13/spring-boot-starter-data-mongodb-2.5.13.jar:/home/coisin/.m2/repository/org/mongodb/mongodb-driver-sync/4.2.3/mongodb-driver-sync-4.2.3.jar:/home/coisin/.m2/repository/org/mongodb/bson/4.2.3/bson-4.2.3.jar:/home/coisin/.m2/repository/org/mongodb/mongodb-driver-core/4.2.3/mongodb-driver-core-4.2.3.jar:/home/coisin/.m2/repository/org/springframework/data/spring-data-mongodb/3.2.11/spring-data-mongodb-3.2.11.jar:/home/coisin/.m2/repository/org/springframework/spring-tx/5.3.19/spring-tx-5.3.19.jar:/home/coisin/.m2/repository/org/springframework/data/spring-data-commons/2.5.11/spring-data-commons-2.5.11.jar:/home/coisin/.m2/repository/org/springframework/spring-web/5.3.19/spring-web-5.3.19.jar:/home/coisin/.m2/repos
itory/ai/platon/scent/scent-rest/1.10.4/scent-rest-1.10.4.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-rest/1.10.9/pulsar-rest-1.10.9.jar:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-boot/1.10.9/pulsar-boot-1.10.9.jar:/home/coisin/.m2/repository/javax/persistence/javax.persistence-api/2.2/javax.persistence-api-2.2.jar:/home/coisin/.m2/repository/org/hibernate/hibernate-core/5.4.33/hibernate-core-5.4.33.jar:/home/coisin/.m2/repository/org/jboss/logging/jboss-logging/3.4.3.Final/jboss-logging-3.4.3.Final.jar:/home/coisin/.m2/repository/antlr/antlr/2.7.7/antlr-2.7.7.jar:/home/coisin/.m2/repository/org/jboss/spec/javax/transaction/jboss-transaction-api_1.2_spec/1.1.1.Final/jboss-transaction-api_1.2_spec-1.1.1.Final.jar:/home/coisin/.m2/repository/org/jboss/jandex/2.2.3.Final/jandex-2.2.3.Final.jar:/home/coisin/.m2/repository/com/fasterxml/classmate/1.5.1/classmate-1.5.1.jar:/home/coisin/.m2/repository/org/dom4j/dom4j/2.1.3/dom4j-2.1.3.jar:/home/coisin/.m2/repository/org/hibernate/common/hibernate-commons-annotations/5.1.2.Final/hibernate-commons-annotations-5.1.2.Final.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-starter-web/2.5.13/spring-boot-starter-web-2.5.13.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-starter-json/2.5.13/spring-boot-starter-json-2.5.13.jar:/home/coisin/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jdk8/2.12.6/jackson-datatype-jdk8-2.12.6.jar:/home/coisin/.m2/repository/com/fasterxml/jackson/module/jackson-module-parameter-names/2.12.6/jackson-module-parameter-names-2.12.6.jar:/home/coisin/.m2/repository/org/springframework/spring-webmvc/5.3.19/spring-webmvc-5.3.19.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-starter-tomcat/2.5.13/spring-boot-starter-tomcat-2.5.13.jar:/home/coisin/.m2/repository/org/apache/tomcat/embed/tomcat-embed-core/9.0.62/tomcat-embed-core-9.0.62.jar:/home/coisin/.m2/repository/org/apache/tomcat/embed/tomcat-embed-el/9.0.6
2/tomcat-embed-el-9.0.62.jar:/home/coisin/.m2/repository/org/apache/tomcat/embed/tomcat-embed-websocket/9.0.62/tomcat-embed-websocket-9.0.62.jar:/home/coisin/.m2/repository/io/dropwizard/metrics/metrics-json/4.1.31/metrics-json-4.1.31.jar:/home/coisin/.m2/repository/ai/platon/commons/distributed-lock-core/1.4.3/distributed-lock-core-1.4.3.jar:/home/coisin/.m2/repository/ai/platon/commons/distributed-lock-api/1.4.3/distributed-lock-api-1.4.3.jar:/home/coisin/.m2/repository/org/springframework/spring-aspects/5.3.19/spring-aspects-5.3.19.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-starter-aop/2.5.13/spring-boot-starter-aop-2.5.13.jar:/home/coisin/.m2/repository/org/aspectj/aspectjweaver/1.9.7/aspectjweaver-1.9.7.jar:/home/coisin/.m2/repository/org/springframework/retry/spring-retry/1.3.3/spring-retry-1.3.3.jar:/home/coisin/.m2/repository/ai/platon/commons/distributed-lock-mongo/1.4.3/distributed-lock-mongo-1.4.3.jar:/home/coisin/.m2/repository/com/aspose/aspose-cells/22.9/aspose-cells-22.9.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-devtools/2.5.13/spring-boot-devtools-2.5.13.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-starter-actuator/2.5.13/spring-boot-starter-actuator-2.5.13.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-actuator-autoconfigure/2.5.13/spring-boot-actuator-autoconfigure-2.5.13.jar:/home/coisin/.m2/repository/org/springframework/boot/spring-boot-actuator/2.5.13/spring-boot-actuator-2.5.13.jar:/home/coisin/.m2/repository/io/micrometer/micrometer-core/1.7.11/micrometer-core-1.7.11.jar:/home/coisin/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.12/HdrHistogram-2.1.12.jar:/home/coisin/.m2/repository/org/latencyutils/LatencyUtils/2.0.3/LatencyUtils-2.0.3.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk8/1.7.10/kotlin-stdlib-jdk8-1.7.10.jar:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-reflect/1.7.10/kotlin-reflect-1.7.10.jar:/ho
me/coisin/.m2/repository/ch/qos/logback/logback-core/1.2.11/logback-core-1.2.11.jar:/home/coisin/.m2/repository/ch/qos/logback/logback-classic/1.2.11/logback-classic-1.2.11.jar:/home/coisin/.m2/repository/net/bytebuddy/byte-buddy/1.10.22/byte-buddy-1.10.22.jar ai.platon.exotic.amazon.starter.CrawlStarterKt 01:10:10,561 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml] 01:10:10,564 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/logback.xml] 01:10:10,573 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath. 01:10:10,573 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [file:/home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/logback.xml] 01:10:10,573 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-resources/1.10.11/pulsar-resources-1.10.11.jar!/logback.xml] 01:10:10,573 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/coisin/.m2/repository/ai/platon/scent/scent-resources/1.10.4/scent-resources-1.10.4.jar!/logback.xml] 01:10:11,367 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set 01:10:11,381 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender] 01:10:11,386 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [Console] 01:10:11,557 |-WARN in ch.qos.logback.core.ConsoleAppender[Console] - This appender no longer admits a layout as a sub-component, set an encoder instead. 
01:10:11,557 |-WARN in ch.qos.logback.core.ConsoleAppender[Console] - To ensure compatibility, wrapping your layout in LayoutWrappingEncoder. 01:10:11,557 |-WARN in ch.qos.logback.core.ConsoleAppender[Console] - See also http://logback.qos.ch/codes.html#layoutInsteadOfEncoder for details 01:10:11,559 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.rolling.RollingFileAppender] 01:10:11,571 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [DRFA] 01:10:11,581 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property 01:10:11,601 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy@1021436681 - No compression will be used 01:10:11,603 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy@1021436681 - Will use the pattern logs/pulsar.log.%d{.yyyy-MM-dd} for the active file 01:10:11,628 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - The date pattern is '.yyyy-MM-dd' from file name pattern 'logs/pulsar.log.%d{.yyyy-MM-dd}'. 01:10:11,628 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - Roll-over at midnight. 
01:10:11,630 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - Setting initial period to Tue Apr 04 01:09:27 PDT 2023 01:10:11,652 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[DRFA] - Active log file name: logs/pulsar.log 01:10:11,652 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[DRFA] - File property is set to [logs/pulsar.log] 01:10:11,654 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.FileAppender] 01:10:11,654 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [METRICS] 01:10:11,660 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property 01:10:11,661 |-INFO in ch.qos.logback.core.FileAppender[METRICS] - File property is set to [logs/pulsar.m.log] 01:10:11,662 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.FileAppender] 01:10:11,662 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [COUNTERS] 01:10:11,663 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property 01:10:11,665 |-INFO in ch.qos.logback.core.FileAppender[COUNTERS] - File property is set to [logs/pulsar.c.log] 01:10:11,665 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.FileAppender] 01:10:11,665 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [SQL] 01:10:11,665 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property 01:10:11,666 |-INFO in ch.qos.logback.core.FileAppender[SQL] - File property is set to [logs/pulsar.sql.log] 
01:10:11,666 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.FileAppender] 01:10:11,666 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [JSON] 01:10:11,666 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property 01:10:11,671 |-INFO in ch.qos.logback.core.FileAppender[JSON] - File property is set to [logs/pulsar.json.log] 01:10:11,672 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.FileAppender] 01:10:11,672 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [COLLECT] 01:10:11,672 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property 01:10:11,675 |-INFO in ch.qos.logback.core.FileAppender[COLLECT] - File property is set to [logs/pulsar.dc.log] 01:10:11,675 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO 01:10:11,675 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [DRFA] to Logger[ROOT] 01:10:11,676 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [Console] to Logger[ROOT] 01:10:11,677 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.exotic.amazon.crawl.boot.component.json] to INFO 01:10:11,677 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [ai.platon.exotic.amazon.crawl.boot.component.json] to false 01:10:11,677 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [JSON] to Logger[ai.platon.exotic.amazon.crawl.boot.component.json] 01:10:11,677 |-INFO in 
ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent] to INFO 01:10:11,677 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent.analysis] to DEBUG 01:10:11,684 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent.rest.api] to INFO 01:10:11,684 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent.proxy] to INFO 01:10:11,684 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent.crawl.parser] to INFO 01:10:11,684 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent.rest.api.schedule] to DEBUG 01:10:11,684 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent.rest.api.service.scrape] to INFO 01:10:11,684 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent.amazon] to INFO 01:10:11,685 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent.amazon.boot.schedule] to DEBUG 01:10:11,685 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.scent.examples] to INFO 01:10:11,685 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.common.metrics.AppMetrics] to INFO 01:10:11,685 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [ai.platon.pulsar.common.metrics.AppMetrics] to false 01:10:11,685 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [METRICS] to Logger[ai.platon.pulsar.common.metrics.AppMetrics] 01:10:11,685 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.common.CounterReporter] to INFO 01:10:11,685 |-INFO in 
ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [ai.platon.pulsar.common.CounterReporter] to false 01:10:11,685 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [COUNTERS] to Logger[ai.platon.pulsar.common.CounterReporter] 01:10:11,685 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.ql.h2.utils.Jdbc] to INFO 01:10:11,685 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [ai.platon.pulsar.ql.h2.utils.Jdbc] to false 01:10:11,685 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [SQL] to Logger[ai.platon.pulsar.ql.h2.utils.Jdbc] 01:10:11,685 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.common.collect] to INFO 01:10:11,685 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting additivity of logger [ai.platon.pulsar.common.collect] to false 01:10:11,685 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [COLLECT] to Logger[ai.platon.pulsar.common.collect] 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar] to INFO 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.ql] to INFO 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.net] to INFO 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.browser] to WARN 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.parse.html] to INFO 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.crawl.parse] to INFO 01:10:11,686 |-INFO in 
ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.crawl] to INFO 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.crawl.protocol] to INFO 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.protocol.browser] to INFO 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.protocol.browser.driver] to INFO 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.crawl.component] to INFO 01:10:11,686 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.protocol.browser.emulator.BrowserEmulateEventHandler] to INFO 01:10:11,687 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.protocol.browser.emulator.context] to INFO 01:10:11,689 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [ai.platon.pulsar.common.distributed.lock] to INFO 01:10:11,689 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.springframework.core] to INFO 01:10:11,689 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.springframework.core.env] to INFO 01:10:11,689 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.springframework.orm.jpa] to INFO 01:10:11,689 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.springframework.transaction] to INFO 01:10:11,689 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.apache.hadoop.conf.Configuration] to WARN 01:10:11,690 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.apache.gora] to ERROR 01:10:11,690 
|-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [org.apache.hadoop] to WARN 01:10:11,690 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration. 01:10:11,692 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6aba2b86 - Registering current configuration as safe fallback point

:: Spring Boot :: (v2.5.13)

01:10:13.987 [tartedMain] INFO ai.platon.scent.ScentEnvironment - Registered features(69): top, left, width, height, char, txt_nd, img, a, sibling, child, dep, seq, txt_dns, pid, tag, nd_id, nd_cs, ft_sz, color, b_bolor, rtop, rleft, rrow, rcol, dist, simg, mimg, limg, aimg, saimg, maimg, laimg, char_max, char_ave, own_char, own_txt_nd, grant_child, descend, sep, rseq, txt_nd_c, vcc, vcv, avcc, avcv, hcc, hcv, ahcc, ahcv, txt_df, cap_df, tn_max_w, tn_ave_w, tn_max_h, tn_ave_h, a_max_w, a_ave_w, a_max_h, a_ave_h, img_max_w, img_ave_w, img_max_h, img_ave_h, tn_total_w, tn_total_h, a_total_w, a_total_h, img_total_w, img_total_h 01:10:14.002 [tartedMain] INFO ai.platon.scent.ScentEnvironment - Initializing feature calculator, use ChainedFeatureCalculator with Level2FeatureCalculator appended 01:10:14.113 [tartedMain] INFO ai.platon.scent.ScentEnvironment - Scent environment is initialized 01:10:14.449 [tartedMain] INFO a.p.e.a.c.boot.CrawlerInitializer - Initializing feature calculator, append AmazonFeatureCalculator 01:10:14.482 [tartedMain] INFO a.p.e.amazon.starter.CrawlStarterKt - Starting CrawlStarterKt using Java 11.0.18 on ubuntu with PID 11331 (/home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes started by coisin in /home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main) 01:10:14.484 [tartedMain] INFO a.p.e.amazon.starter.CrawlStarterKt - The following 2 profiles are active: "rest", "crawler" 01:10:14.727 [tartedMain] INFO o.s.b.d.restart.ChangeableUrls - The Class-Path manifest attribute in /home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-compiler/1.7.10/kotlin-compiler-1.7.10.jar referenced one or more files that do not exist: 
file:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-compiler/1.7.10/annotations-13.0.jar,file:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-compiler/1.7.10/kotlin-stdlib.jar,file:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-compiler/1.7.10/kotlin-reflect.jar,file:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-compiler/1.7.10/kotlin-script-runtime.jar,file:/home/coisin/.m2/repository/org/jetbrains/kotlin/kotlin-compiler/1.7.10/trove4j.jar 01:10:14.728 [tartedMain] INFO o.s.b.d.restart.ChangeableUrls - The Class-Path manifest attribute in /home/coisin/.m2/repository/com/sun/xml/bind/jaxb-impl/4.0.1/jaxb-impl-4.0.1.jar referenced one or more files that do not exist: file:/home/coisin/.m2/repository/com/sun/xml/bind/jaxb-impl/4.0.1/jaxb-core.jar,file:/home/coisin/.m2/repository/com/sun/xml/bind/jaxb-impl/4.0.1/angus-activation.jar 01:10:14.728 [tartedMain] INFO o.s.b.d.restart.ChangeableUrls - The Class-Path manifest attribute in /home/coisin/.m2/repository/com/sun/xml/bind/jaxb-core/4.0.1/jaxb-core-4.0.1.jar referenced one or more files that do not exist: file:/home/coisin/.m2/repository/com/sun/xml/bind/jaxb-core/4.0.1/jakarta.activation-api.jar,file:/home/coisin/.m2/repository/com/sun/xml/bind/jaxb-core/4.0.1/jakarta.xml.bind-api.jar 01:10:14.729 [tartedMain] INFO o.s.b.d.e.DevToolsPropertyDefaultsPostProcessor - Devtools property defaults active! Set 'spring.devtools.add-properties' to 'false' to disable 01:10:14.729 [tartedMain] INFO o.s.b.d.e.DevToolsPropertyDefaultsPostProcessor - For additional web related logging consider setting the 'logging.level.web' property to 'DEBUG' 01:10:19.077 [tartedMain] INFO o.s.d.r.c.RepositoryConfigurationDelegate - Bootstrapping Spring Data MongoDB repositories in DEFAULT mode. 01:10:19.430 [tartedMain] INFO o.s.d.r.c.RepositoryConfigurationDelegate - Finished Spring Data repository scanning in 344 ms. Found 20 MongoDB repository interfaces. 
01:10:21.352 [tartedMain] INFO o.s.c.s.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'ai.platon.commons.distributed.lock.configuration.DistributedLockConfiguration$Companion' of type [ai.platon.commons.distributed.lock.configuration.DistributedLockConfiguration$Companion] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying) 01:10:23.563 [tartedMain] INFO o.s.b.w.e.tomcat.TomcatWebServer - Tomcat initialized with port(s): 8182 (http) 01:10:23.630 [tartedMain] INFO o.a.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-8182"] 01:10:23.631 [tartedMain] INFO o.a.catalina.core.StandardService - Starting service [Tomcat] 01:10:23.631 [tartedMain] INFO o.a.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.62] 01:10:24.107 [tartedMain] INFO o.a.c.c.C.[.[localhost].[/api] - Initializing Spring embedded WebApplicationContext 01:10:24.113 [tartedMain] INFO o.s.b.w.s.c.ServletWebServerApplicationContext - Root WebApplicationContext: initialization completed in 9383 ms 01:10:25.781 [tartedMain] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms'} 01:10:26.229 [host:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:2, serverValue:43}] to localhost:27017 01:10:26.241 [host:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:44}] to localhost:27017 01:10:26.251 [host:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=STANDALONE, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=6, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=147607474} 01:10:26.495 [tartedMain] WARN o.s.data.convert.CustomConversions - Registering 
converter from class java.time.LocalDateTime to class org.joda.time.LocalDateTime as reading converter although it doesn't convert from a store-supported type! You might want to check your annotation setup at the converter implementation. 01:10:26.605 [tartedMain] WARN o.s.data.convert.CustomConversions - Registering converter from class java.time.LocalDateTime to class org.joda.time.LocalDateTime as reading converter although it doesn't convert from a store-supported type! You might want to check your annotation setup at the converter implementation. 01:10:27.346 [tartedMain] INFO o.s.b.d.a.OptionalLiveReloadServer - LiveReload server is running on port 35729 01:10:28.403 [tartedMain] INFO a.p.p.c.config.AbstractConfiguration - Find legacy resource: jar:file:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.11/pulsar-skeleton-1.10.11.jar!/config/legacy/pulsar-default.xml 01:10:28.406 [tartedMain] INFO a.p.p.c.config.AbstractConfiguration - Find legacy resource: jar:file:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.11/pulsar-skeleton-1.10.11.jar!/config/legacy/pulsar-site.xml 01:10:28.409 [tartedMain] INFO a.p.p.c.config.AbstractConfiguration - Find legacy resource: jar:file:/home/coisin/.m2/repository/ai/platon/scent/scent-resources/1.10.4/scent-resources-1.10.4.jar!/config/legacy/default/pulsar-task.xml 01:10:28.498 [tartedMain] INFO a.p.p.c.config.AbstractConfiguration - profile: | [pulsar-default.xml, pulsar-site.xml, pulsar-task.xml] 01:10:28.503 [tartedMain] INFO a.p.p.c.config.AbstractConfiguration - Find legacy resource: jar:file:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.11/pulsar-skeleton-1.10.11.jar!/config/legacy/pulsar-default.xml 01:10:28.509 [tartedMain] INFO a.p.p.c.config.AbstractConfiguration - Find legacy resource: jar:file:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.11/pulsar-skeleton-1.10.11.jar!/config/legacy/pulsar-site.xml 01:10:28.512 [tartedMain] INFO 
a.p.p.c.config.AbstractConfiguration - Find legacy resource: jar:file:/home/coisin/.m2/repository/ai/platon/scent/scent-resources/1.10.4/scent-resources-1.10.4.jar!/config/legacy/default/pulsar-task.xml 01:10:28.521 [tartedMain] INFO a.p.p.c.config.AbstractConfiguration - profile: | [pulsar-default.xml, pulsar-site.xml, pulsar-task.xml] 01:10:29.170 [tartedMain] INFO a.p.p.crawl.impl.StreamingCrawlLoop - Crawl loop is created | @1915152077 01:10:29.179 [tartedMain] INFO a.p.s.b.a.component.ScentCrawlLoop - Streaming crawl loop is created | ai.platon.scent.boot.autoconfigure.component.ScentCrawlLoop@1915152077 01:10:29.189 [tartedMain] INFO a.p.p.crawl.impl.StreamingCrawlLoop - Crawl loop is created | @689036982 01:10:29.417 [tartedMain] INFO ai.platon.pulsar.crawl.CoreMetrics - maxUrlLength: 1024 timeoutUrls: 0 failedUrls: 0 deadUrls: 0 01:10:29.979 [tartedMain] INFO ai.platon.scent.ScentEnvironment - Registered features(69): top, left, width, height, char, txt_nd, img, a, sibling, child, dep, seq, txt_dns, pid, tag, nd_id, nd_cs, ft_sz, color, b_bolor, rtop, rleft, rrow, rcol, dist, simg, mimg, limg, aimg, saimg, maimg, laimg, char_max, char_ave, own_char, own_txt_nd, grant_child, descend, sep, rseq, txt_nd_c, vcc, vcv, avcc, avcv, hcc, hcv, ahcc, ahcv, txt_df, cap_df, tn_max_w, tn_ave_w, tn_max_h, tn_ave_h, a_max_w, a_ave_w, a_max_h, a_ave_h, img_max_w, img_ave_w, img_max_h, img_ave_h, tn_total_w, tn_total_h, a_total_w, a_total_h, img_total_w, img_total_h 01:10:29.980 [tartedMain] INFO ai.platon.scent.ScentEnvironment - Initializing feature calculator, use ChainedFeatureCalculator with Level2FeatureCalculator appended 01:10:29.980 [tartedMain] INFO ai.platon.scent.ScentEnvironment - Scent environment is initialized 01:10:29.985 [tartedMain] INFO a.p.s.q.h.c.s.AbstractScentSQLContext - SQLContext is created | 1/0 | ScentH2Context 01:10:29.993 [tartedMain] INFO a.p.pulsar.context.PulsarContexts - Active context | 
ai.platon.scent.ql.h2.context.support.ScentH2Context#1 01:10:30.550 [tartedMain] INFO a.p.p.c.c.GracefulScheduledExecutor - Scheduled monitor is started | EnumCounterReporter 01:10:32.372 [tartedMain] INFO a.p.p.c.c.GracefulScheduledExecutor - Scheduled monitor is started | WebDriverPoolMonitor 01:10:32.384 [tartedMain] INFO a.p.p.c.c.GracefulScheduledExecutor - Scheduled monitor is started | BrowserMonitor 01:10:32.391 [tartedMain] INFO a.p.p.c.c.GracefulScheduledExecutor - Scheduled monitor is started | BrowserPrivacyContextMonitor 01:10:33.367 [tartedMain] INFO a.p.p.crawl.impl.StreamingCrawlLoop - Registered 15 link collectors | loop#1 @1915152077 01:10:33.443 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - Starting StreamingCrawler #1 ... 01:10:33.782 [r-worker-1] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:3, serverValue:45}] to localhost:27017 01:10:34.395 [tartedMain] INFO a.p.e.a.c.b.CrawlerConfiguration$logger$1 - profile: | [pulsar-default.xml, pulsar-site.xml, pulsar-task.xml] 01:10:34.900 [tartedMain] INFO a.p.e.a.c.b.CrawlerConfiguration$logger$1 - Configuration: core-default.xml, core-site.xml, jar:file:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.11/pulsar-skeleton-1.10.11.jar!/config/legacy/pulsar-default.xml, jar:file:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.11/pulsar-skeleton-1.10.11.jar!/config/legacy/pulsar-site.xml, jar:file:/home/coisin/.m2/repository/ai/platon/scent/scent-resources/1.10.4/scent-resources-1.10.4.jar!/config/legacy/default/pulsar-task.xml 01:10:34.935 [tartedMain] INFO a.p.e.a.c.c.h.p.WebDataExtractorInstaller - Initializing extractors, create extractors from config file | sites/amazon/crawl/parse/extract-config.json 01:10:36.415 [tartedMain] INFO a.p.e.a.c.c.h.p.WebDataExtractorInstaller - Installed SQL extractors: id: 1, pid: 0, pattern: .+/dp/.+, sql: x-asin.sql, sink: asin_sync, children: 7 id: 8, pid: 1, pattern: .+/dp/.+, sql: 
x-asin-sims-consolidated-1.sql, sink: asin_ad_sync, children: 0 id: 9, pid: 1, pattern: .+/dp/.+, sql: x-asin-sims-consolidated-2.sql, sink: asin_ad_sync, children: 0 id: 10, pid: 1, pattern: .+/dp/.+, sql: x-asin-sims-consolidated-3.sql, sink: asin_ad_sync, children: 0 id: 11, pid: 1, pattern: .+/dp/.+, sql: x-asin-sims-consider.sql, sink: asin_ad_sync, children: 0 id: 12, pid: 1, pattern: .+/dp/.+, sql: x-similar-items.sql, sink: asin_ad_sync, children: 0 id: 13, pid: 1, pattern: .+/dp/.+, sql: x-asin-top-reviews.sql, sink: asin_review_sync, children: 0 id: 14, pid: 1, pattern: .+/dp/.+, sql: x-asin-buy-choice.sql, sink: asin_buy_choice_sync, children: 0 id: 2, pid: 0, pattern: .+/seller/.+, sql: x-sellers.sql, sink: seller_sync, children: 0 id: 3, pid: 0, pattern: .+/product-reviews/.+, sql: x-asin-reviews.sql, sink: asin_review_sync, children: 0 id: 4, pid: 0, pattern: .+/zgbs/.+, sql: x-asin-best-sellers.sql, sink: asin_best_sellers_sync, children: 0 id: 5, pid: 0, pattern: .+/new-releases/.+, sql: x-asin-new-releases.sql, sink: asin_new_releases_sync, children: 0 id: 6, pid: 0, pattern: .+/movers-and-shakers/.+, sql: x-asin-movers-and-shakers.sql, sink: asin_movers_and_shakers_sync, children: 0 id: 7, pid: 0, pattern: .+/most-wished-for/.+, sql: x-asin-most-wished-for.sql, sink: asin_most_wished_for_sync, children: 0 id: 15, pid: 0, pattern: .+/dp/.+, sql: x-asin-customer-hui.sql, sink: asin_customer_hui_sync, children: 0 id: 16, pid: 0, pattern: .+.fr/.+/bestsellers/.+, sql: x-asin-best-sellers.sql, sink: asin_best_sellers_sync, children: 0 id: 17, pid: 0, pattern: .+.de/.+/bestsellers/.+, sql: x-asin-best-sellers.sql, sink: asin_best_sellers_sync, children: 0

01:10:37.080 [r-worker-1] INFO a.p.s.r.a.service.v1.ScrapeServiceV1 - Recovered 0 unfinished tasks in PT0.165463S
01:10:37.080 [r-worker-1] INFO a.p.s.crawl.AbstractRunnableCrawler - Injecting seeds ...
01:10:37.081 [r-worker-1] INFO a.p.s.crawl.AbstractRunnableCrawler - Generating fetch urls ...
01:10:37.093 [r-worker-1] INFO a.p.e.a.c.b.c.AmazonGenerator - Generating startup tasks ...
01:10:40.757 [tartedMain] INFO o.s.b.a.e.web.EndpointLinksResolver - Exposing 1 endpoint(s) beneath base path '/actuator'
01:10:41.501 [tartedMain] INFO o.a.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-8182"]
01:10:41.711 [tartedMain] INFO o.s.b.w.e.tomcat.TomcatWebServer - Tomcat started on port(s): 8182 (http) with context path '/api'
01:10:41.908 [tartedMain] INFO a.p.e.amazon.starter.CrawlStarterKt - Started CrawlStarterKt in 29.249 seconds (JVM running for 32.861)
01:10:59.979 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Loaded 31205 links | /home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/sites/amazon/crawl/generate/periodical/pt24h/best-sellers.txt
01:11:02.370 [em-0] INFO a.p.p.p.b.d.WebDriverPoolManager - Maintaining service is started
01:11:05.009 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Loaded 22428 links | /home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/sites/amazon/crawl/generate/periodical/pt24h/new-releases.txt
01:11:10.989 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Loaded 24424 links | /home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/sites/amazon/crawl/generate/periodical/pt24h/most-wished-for.txt
01:11:11.015 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Loaded 39 links | /home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/sites/amazon/crawl/generate/periodical/pt1h/movers-and-shakers.txt
01:11:11.022 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Collected tasks: BEST_SELLERS, NEW_RELEASES, MOST_WISHED_FOR, MOVERS_AND_SHAKERS
01:11:11.032 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Creating collector for BEST_SELLERS
01:11:11.652 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 21951 links for task in database
01:11:11.653 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 21951 links for task in database
01:11:12.252 [r-worker-2] INFO a.p.p.crawl.impl.StreamingCrawler - 1. numRunning: 0, availableMemory: 64.38 MiB, memoryToReserve: 1.00 GiB, shortage: -1006239744 B
01:11:26.956 [r-worker-2] INFO a.p.p.crawl.impl.StreamingCrawler - 21. numRunning: 0, availableMemory: 62.92 MiB, memoryToReserve: 1.00 GiB, shortage: -1007763456 B
01:11:37.193 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Generated 200/21750 BEST_SELLERS tasks with collector BEST_SELLERS in PT0S, with 0 ones removed(fetched)
01:11:41.918 [cheduler-1] INFO a.p.s.r.a.s.s.TaskMultiHistoryService - Deleted 0 records before 2023-04-02T08:11:41.899476Z
01:11:41.920 [cheduler-1] INFO a.p.s.r.a.s.s.TaskMonthlyHistoryService - Deleted 0 records before 2023-03-04T08:11:41.918632Z from monthly history in PT0.002253S
01:11:41.921 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ...
01:11:42.632 [r-worker-2] INFO a.p.p.crawl.impl.StreamingCrawler - 41. numRunning: 0, availableMemory: 70.14 MiB, memoryToReserve: 1.00 GiB, shortage: -1000194048 B
01:11:52.196 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Creating collector for NEW_RELEASES
01:11:52.744 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 16640 links for task in database
01:11:52.745 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 16640 links for task in database
01:11:58.026 [r-worker-2] INFO a.p.p.crawl.impl.StreamingCrawler - 61. numRunning: 0, availableMemory: 67.51 MiB, memoryToReserve: 1.00 GiB, shortage: -1002954752 B
01:12:04.989 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Generated 200/16440 NEW_RELEASES tasks with collector NEW_RELEASES in PT0S, with 0 ones removed(fetched)
01:12:12.250 [r-worker-2] INFO a.p.p.crawl.impl.StreamingCrawler - 81. numRunning: 0, availableMemory: 82.42 MiB, memoryToReserve: 1.00 GiB, shortage: -987316224 B
01:12:19.990 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Creating collector for MOST_WISHED_FOR
01:12:20.417 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 18020 links for task in database
01:12:20.418 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 18020 links for task in database
01:12:26.792 [r-worker-2] INFO a.p.p.crawl.impl.StreamingCrawler - 101. numRunning: 0, availableMemory: 78.90 MiB, memoryToReserve: 1.00 GiB, shortage: -991006720 B
01:12:34.701 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Generated 200/17820 MOST_WISHED_FOR tasks with collector MOST_WISHED_FOR in PT0S, with 0 ones removed(fetched)
01:12:42.881 [r-worker-2] INFO a.p.p.crawl.impl.StreamingCrawler - 121. numRunning: 0, availableMemory: 97.64 MiB, memoryToReserve: 1.00 GiB, shortage: -971358208 B
01:12:44.235 [cheduler-1] INFO a.p.s.r.a.schedule.ScentRestMonitor - Total collected 1/0.01/1/0.01 in 2m31s, remaining 600/56610, collect time: 2023-04-04T08:11:12.003768Z -> 2023-04-04T08:11:12.027555Z
01:12:44.239 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ...
01:12:49.704 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Creating collector for MOVERS_AND_SHAKERS 01:12:50.061 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 39 links for task in database 01:12:50.062 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 39 links for task in database 01:12:50.113 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Generated 39/0 MOVERS_AND_SHAKERS tasks with collector MOVERS_AND_SHAKERS in PT0S, with 0 ones removed(fetched) 01:12:57.114 [r-worker-2] INFO a.p.p.crawl.impl.StreamingCrawler - 141. numRunning: 0, availableMemory: 94.05 MiB, memoryToReserve: 1.00 GiB, shortage: -975118336 B 01:13:07.347 [r-worker-1] INFO a.p.e.a.c.b.component.AmazonCrawler - Registered collectors: name | priority | pName | collected | cd/s | collect | c/s | time | size | estSize | firstCollect | lastCollect | labels FCC#RealTime | -214748364 | HIGHEST | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.4 | -214748364 | HIGHEST | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | DelayCC#Delay | -5000 | HIGHER5 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.5 | -5000 | HIGHER5 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.6 | -4000 | HIGHER4 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.7 | -3000 | HIGHER3 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | MOVERS_AND_SHAKERS | -3000 | HIGHER3 | 0 | 0.00 | 0 | 0.00 | 0s | 39 | 39 | 31 16:00:00 | 31 16:00:00 | MOS FCC.8 | -2000 | HIGHER2 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | SCRAPE | -2000 | HIGHER2 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | SCE FCC.9 | -1000 | HIGHER | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.10 | 0 | NORMAL | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | BEST_SELLERS | 0 | NORMAL | 1 | 1.00 | 1 | 1.00 | 0s | 200 | 21950 | 04 01:11:12 | 04 
01:11:12 | BES, zgbs NEW_RELEASES | 0 | NORMAL | 0 | 0.00 | 0 | 0.00 | 0s | 200 | 16640 | 31 16:00:00 | 31 16:00:00 | NES MOST_WISHED_FOR | 0 | NORMAL | 0 | 0.00 | 0 | 0.00 | 0s | 200 | 18020 | 31 16:00:00 | 31 16:00:00 | MO*R FCC.11 | 1000 | LOWER | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.12 | 2000 | LOWER2 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.13 | 3000 | LOWER3 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.14 | 4000 | LOWER4 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.15 | 5000 | LOWER5 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.16 | 214748364 | LOWEST | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 |

01:13:07.348 [r-worker-1] INFO a.p.s.crawl.AbstractRunnableCrawler - Ready for fetching ... 01:13:12.623 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 161. numRunning: 0, availableMemory: 91.15 MiB, memoryToReserve: 1.00 GiB, shortage: -978161664 B 01:13:28.733 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 181. numRunning: 0, availableMemory: 81.59 MiB, memoryToReserve: 1.00 GiB, shortage: -988184576 B 01:13:30.549 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - Counter reporter is started [ 2023-04-04T01:13:30.547572 ] [ UNNAMED JOB ] 01:13:30.554 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - All registered counters : 01:13:43.333 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 201. numRunning: 0, availableMemory: 86.98 MiB, memoryToReserve: 1.00 GiB, shortage: -982532096 B 01:13:44.247 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ... 01:13:58.430 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 221. numRunning: 0, availableMemory: 84.61 MiB, memoryToReserve: 1.00 GiB, shortage: -985018368 B 01:14:13.322 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 241. numRunning: 0, availableMemory: 73.25 MiB, memoryToReserve: 1.00 GiB, shortage: -996937728 B 01:14:28.828 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 261. numRunning: 0, availableMemory: 59.61 MiB, memoryToReserve: 1.00 GiB, shortage: -1011232768 B 01:14:44.036 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 281. numRunning: 0, availableMemory: 83.97 MiB, memoryToReserve: 1.00 GiB, shortage: -985690112 B 01:14:46.717 [cheduler-1] INFO a.p.s.r.a.schedule.ScentRestMonitor - Total collected 1/0.00/1/0.00 in 4m34s, remaining 639/56649, collect time: 2023-04-04T08:11:12.003768Z -> 2023-04-04T08:11:12.027555Z 01:14:46.722 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ... 01:14:59.097 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 301. 
numRunning: 0, availableMemory: 83.75 MiB, memoryToReserve: 1.00 GiB, shortage: -985919488 B 01:15:14.239 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 321. numRunning: 0, availableMemory: 93.46 MiB, memoryToReserve: 1.00 GiB, shortage: -975736832 B

platonai commented 1 year ago

The log says it very clearly: not enough memory.

01:15:14.239 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 321. numRunning: 0, availableMemory: 93.46 MiB, memoryToReserve: 1.00 GiB, shortage: -975736832 B
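For anyone reading a status line like the one above, here is a minimal Python sketch that pulls out the memory fields and checks how they relate. The function name `parse_memory_status` is made up for illustration, and the relation `shortage = availableMemory - memoryToReserve` is inferred from the numbers in the log, not taken from the crawler's source code.

```python
import re

# Multipliers for the binary size suffixes that appear in the log.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}

# Matches the memory fields of a StreamingCrawler status line.
MEMORY_RE = re.compile(
    r"availableMemory: ([\d.]+) (\w+), "
    r"memoryToReserve: ([\d.]+) (\w+), "
    r"shortage: (-?\d+) B"
)


def parse_memory_status(line):
    """Return available, reserve and logged shortage in bytes, or None."""
    m = MEMORY_RE.search(line)
    if m is None:
        return None
    return {
        "available": float(m.group(1)) * UNITS[m.group(2)],
        "reserve": float(m.group(3)) * UNITS[m.group(4)],
        "shortage": int(m.group(5)),
    }


line = (
    "01:15:14.239 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 321. "
    "numRunning: 0, availableMemory: 93.46 MiB, memoryToReserve: 1.00 GiB, "
    "shortage: -975736832 B"
)
status = parse_memory_status(line)

# Assumed relation: a negative value means less memory is free than the reserve.
computed = status["available"] - status["reserve"]
print(f"logged shortage:     {status['shortage']} B")
print(f"available - reserve: {computed:.0f} B")
```

The two printed values differ only by the rounding of `93.46 MiB` to two decimals, which is consistent with reading the negative `shortage` as "about 930 MiB short of the 1 GiB reserve".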

swlcyx commented 1 year ago

How can I deal with it?

swlcyx commented 1 year ago

Where can I add more memory?

platonai commented 1 year ago

Exotic-amazon is not a toy program but a real-world solution for crawling one of the biggest websites in the world, completely and accurately, so you'd better use a more powerful computer to run the program.

If your computer has more than 4 GB of memory, you can close all other programs on your computer and try the program again.
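On Linux, one rough way to see whether closing other programs freed enough memory is to read `MemAvailable` from `/proc/meminfo` before starting the crawler. The sketch below is only an illustration: the helper names are made up, the 1 GiB threshold is the `memoryToReserve` value from the log, and it is an assumption that the crawler's `availableMemory` corresponds to what the kernel reports here.

```python
RESERVE_BYTES = 1024 ** 3  # 1.00 GiB, the memoryToReserve value seen in the log


def mem_available_bytes(meminfo_text):
    """Extract MemAvailable (reported in kB) from /proc/meminfo content."""
    for row in meminfo_text.splitlines():
        if row.startswith("MemAvailable:"):
            return int(row.split()[1]) * 1024
    return None


def has_headroom(meminfo_text, reserve=RESERVE_BYTES):
    """True if the available memory exceeds the reserve."""
    available = mem_available_bytes(meminfo_text)
    return available is not None and available > reserve


# Made-up sample content, only to show the expected input format.
sample = "MemTotal:       32000000 kB\nMemAvailable:    2097152 kB\n"
print("enough memory:", has_headroom(sample))

# On a real machine, pass the actual file content instead:
#   with open("/proc/meminfo") as f:
#       print("enough memory:", has_headroom(f.read()))
```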

swlcyx commented 1 year ago

I'm using 32 GB now and it runs, but there are still some errors. Could you explain the meaning of this output?

swlcyx commented 1 year ago

02:39:02.910 [tartedMain] INFO a.p.p.crawl.impl.StreamingCrawlLoop - Crawl loop is created | @529190799 02:39:03.192 [tartedMain] INFO ai.platon.pulsar.crawl.CoreMetrics - maxUrlLength: 1024 timeoutUrls: 0 failedUrls: 0 deadUrls: 0 02:39:03.497 [tartedMain] INFO ai.platon.scent.ScentEnvironment - Registered features(69): top, left, width, height, char, txt_nd, img, a, sibling, child, dep, seq, txt_dns, pid, tag, nd_id, nd_cs, ft_sz, color, b_bolor, rtop, rleft, rrow, rcol, dist, simg, mimg, limg, aimg, saimg, maimg, laimg, char_max, char_ave, own_char, own_txt_nd, grant_child, descend, sep, rseq, txt_nd_c, vcc, vcv, avcc, avcv, hcc, hcv, ahcc, ahcv, txt_df, cap_df, tn_max_w, tn_ave_w, tn_max_h, tn_ave_h, a_max_w, a_ave_w, a_max_h, a_ave_h, img_max_w, img_ave_w, img_max_h, img_ave_h, tn_total_w, tn_total_h, a_total_w, a_total_h, img_total_w, img_total_h 02:39:03.498 [tartedMain] INFO ai.platon.scent.ScentEnvironment - Initializing feature calculator, use ChainedFeatureCalculator with Level2FeatureCalculator appended 02:39:03.498 [tartedMain] INFO ai.platon.scent.ScentEnvironment - Scent environment is initialized 02:39:03.504 [tartedMain] INFO a.p.s.q.h.c.s.AbstractScentSQLContext - SQLContext is created | 1/0 | ScentH2Context 02:39:03.509 [tartedMain] INFO a.p.pulsar.context.PulsarContexts - Active context | ai.platon.scent.ql.h2.context.support.ScentH2Context#1 02:39:03.898 [tartedMain] INFO a.p.p.c.c.GracefulScheduledExecutor - Scheduled monitor is started | EnumCounterReporter 02:39:04.982 [tartedMain] INFO a.p.p.c.c.GracefulScheduledExecutor - Scheduled monitor is started | WebDriverPoolMonitor 02:39:04.989 [tartedMain] INFO a.p.p.c.c.GracefulScheduledExecutor - Scheduled monitor is started | BrowserMonitor 02:39:04.999 [tartedMain] INFO a.p.p.c.c.GracefulScheduledExecutor - Scheduled monitor is started | BrowserPrivacyContextMonitor 02:39:05.638 [tartedMain] INFO a.p.p.crawl.impl.StreamingCrawlLoop - Registered 15 link collectors | loop#1 @1770617350 
02:39:05.716 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - Starting StreamingCrawler #1 ... 02:39:05.961 [r-worker-1] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:3, serverValue:14}] to localhost:27017 02:39:06.078 [tartedMain] INFO a.p.e.a.c.b.CrawlerConfiguration$logger$1 - profile: | [pulsar-default.xml, pulsar-site.xml, pulsar-task.xml] 02:39:06.283 [tartedMain] INFO a.p.e.a.c.b.CrawlerConfiguration$logger$1 - Configuration: core-default.xml, core-site.xml, jar:file:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.11/pulsar-skeleton-1.10.11.jar!/config/legacy/pulsar-default.xml, jar:file:/home/coisin/.m2/repository/ai/platon/pulsar/pulsar-skeleton/1.10.11/pulsar-skeleton-1.10.11.jar!/config/legacy/pulsar-site.xml, jar:file:/home/coisin/.m2/repository/ai/platon/scent/scent-resources/1.10.4/scent-resources-1.10.4.jar!/config/legacy/default/pulsar-task.xml 02:39:06.289 [tartedMain] INFO a.p.e.a.c.c.h.p.WebDataExtractorInstaller - Initializing extractors, create extractors from config file | sites/amazon/crawl/parse/extract-config.json 02:39:06.711 [tartedMain] INFO a.p.e.a.c.c.h.p.WebDataExtractorInstaller - Installed SQL extractors: id: 1, pid: 0, pattern: .+/dp/.+, sql: x-asin.sql, sink: asin_sync, children: 7 id: 8, pid: 1, pattern: .+/dp/.+, sql: x-asin-sims-consolidated-1.sql, sink: asin_ad_sync, children: 0 id: 9, pid: 1, pattern: .+/dp/.+, sql: x-asin-sims-consolidated-2.sql, sink: asin_ad_sync, children: 0 id: 10, pid: 1, pattern: .+/dp/.+, sql: x-asin-sims-consolidated-3.sql, sink: asin_ad_sync, children: 0 id: 11, pid: 1, pattern: .+/dp/.+, sql: x-asin-sims-consider.sql, sink: asin_ad_sync, children: 0 id: 12, pid: 1, pattern: .+/dp/.+, sql: x-similar-items.sql, sink: asin_ad_sync, children: 0 id: 13, pid: 1, pattern: .+/dp/.+, sql: x-asin-top-reviews.sql, sink: asin_review_sync, children: 0 id: 14, pid: 1, pattern: .+/dp/.+, sql: x-asin-buy-choice.sql, sink: asin_buy_choice_sync, children: 0 
id: 2, pid: 0, pattern: .+/seller/.+, sql: x-sellers.sql, sink: seller_sync, children: 0 id: 3, pid: 0, pattern: .+/product-reviews/.+, sql: x-asin-reviews.sql, sink: asin_review_sync, children: 0 id: 4, pid: 0, pattern: .+/zgbs/.+, sql: x-asin-best-sellers.sql, sink: asin_best_sellers_sync, children: 0 id: 5, pid: 0, pattern: .+/new-releases/.+, sql: x-asin-new-releases.sql, sink: asin_new_releases_sync, children: 0 id: 6, pid: 0, pattern: .+/movers-and-shakers/.+, sql: x-asin-movers-and-shakers.sql, sink: asin_movers_and_shakers_sync, children: 0 id: 7, pid: 0, pattern: .+/most-wished-for/.+, sql: x-asin-most-wished-for.sql, sink: asin_most_wished_for_sync, children: 0 id: 15, pid: 0, pattern: .+/dp/.+, sql: x-asin-customer-hui.sql, sink: asin_customer_hui_sync, children: 0 id: 16, pid: 0, pattern: .+.fr/.+/bestsellers/.+, sql: x-asin-best-sellers.sql, sink: asin_best_sellers_sync, children: 0 id: 17, pid: 0, pattern: .+.de/.+/bestsellers/.+, sql: x-asin-best-sellers.sql, sink: asin_best_sellers_sync, children: 0

02:39:06.799 [tartedMain] INFO a.p.e.a.starter.CrawlApplication - Submitted 25(25 & 1) bestseller urls at startup | sites/amazon/crawl/generate/periodical/p7d/com/best-sellers.txt, sites/amazon/crawl/generate/periodical/p7d/com/best-sellers.txt 02:39:06.892 [r-worker-1] INFO a.p.s.r.a.service.v1.ScrapeServiceV1 - Recovered 0 unfinished tasks in PT0.088915S 02:39:06.892 [r-worker-1] INFO a.p.s.crawl.AbstractRunnableCrawler - Injecting seeds ... 02:39:06.892 [r-worker-1] INFO a.p.s.crawl.AbstractRunnableCrawler - Generating fetch urls ... 02:39:06.904 [r-worker-1] INFO a.p.e.a.c.b.c.AmazonGenerator - Generating startup tasks ... 02:39:10.100 [r-worker-2] INFO a.p.p.crawl.protocol.ProtocolFactory - Supported protocols: crowd, browser 02:39:10.569 [r-worker-2] INFO a.p.p.c.parse.html.PrimerHtmlParser - className: PrimerHtmlParser defaultCharEncoding: utf-8 parseFilters: AmazonJdbcSinkSQLExtractor, AmazonJdbcSinkSQLExtractor, AmazonJdbcSinkSQLExtractor, AmazonJdbcSinkSQLExtractor, AmazonJdbcSinkSQLExtractor, AmazonJdbcSinkSQLExtractor, AmazonJdbcSinkSQLExtractor, AmazonJdbcSinkSQLExtractor, AmazonJdbcSinkSQLExtractor, AmazonJdbcSinkSQLExtractor 02:39:10.680 [r-worker-2] INFO a.p.p.c.parse.html.PrimerHtmlParser - className: PrimerHtmlParser defaultCharEncoding: utf-8 02:39:10.681 [r-worker-2] INFO a.p.pulsar.crawl.parse.ParserFactory - Active parsers: ----------Params Table----------
Name Value
text/html: ai.platon.pulsar.crawl.parse.html.PrimerHtmlParser application/x-javascript: text/xml: ai.platon.pulsar.parse.tika.TikaParser text/aspdotnet: ai.platon.pulsar.crawl.parse.html.PrimerHtmlParser application/rss+xml: ai.platon.pulsar.parse.tika.TikaParser *: ai.platon.pulsar.parse.tika.TikaParser application/xhtml+xml: ai.platon.pulsar.crawl.parse.html.PrimerHtmlParser

02:39:10.693 [r-worker-2] INFO a.p.pulsar.crawl.parse.PageParser - maxParseTime: PT1M maxParsedLinks: 200 groupMode: BY_HOST ignoreExternalLinks: false maxUrlLength: 1024 defaultAnchorLenMin: 2 defaultAnchorLenMax: 200 02:39:11.390 [tartedMain] INFO o.s.b.a.e.web.EndpointLinksResolver - Exposing 1 endpoint(s) beneath base path '/actuator' 02:39:11.656 [tartedMain] INFO o.a.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-8182"] 02:39:11.772 [tartedMain] INFO o.s.b.w.e.tomcat.TomcatWebServer - Tomcat started on port(s): 8182 (http) with context path '/api' 02:39:11.829 [tartedMain] INFO a.p.e.amazon.starter.CrawlStarterKt - Started CrawlStarterKt in 23.65 seconds (JVM running for 26.667) 02:39:11.983 [r-worker-2] INFO a.p.s.org.mongodb.driver.cluster - Cluster created with settings {hosts=[127.0.0.1:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500} 02:39:11.988 [r-worker-2] INFO a.p.s.org.mongodb.driver.cluster - Adding discovered server 127.0.0.1:27017 to client view of cluster 02:39:12.218 [r-worker-2] INFO a.p.pulsar.persist.gora.GoraStorage - Backend data store: MongoStore realSchema: amazon_tmp_webpage110 02:39:12.218 [r-worker-2] INFO a.p.p.p.AutoDetectStorageProvider - Storage is created: class org.apache.gora.mongodb.store.MongoStore realSchema: amazon_tmp_webpage110 02:39:12.242 [.0.1:27017] INFO a.p.s.org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:15}] to 127.0.0.1:27017 02:39:12.255 [r-worker-2] INFO a.p.s.org.mongodb.driver.cluster - Cluster description not yet available. 
Waiting for 30000 ms before timing out 02:39:12.259 [.0.1:27017] INFO a.p.s.org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=127.0.0.1:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 6, 3]}, minWireVersion=0, maxWireVersion=6, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=14326030} 02:39:12.281 [.0.1:27017] INFO a.p.s.org.mongodb.driver.cluster - Discovered cluster type of STANDALONE 02:39:12.324 [r-worker-2] INFO a.p.s.org.mongodb.driver.connection - Opened connection [connectionId{localValue:2, serverValue:16}] to 127.0.0.1:27017 02:39:12.493 [r-worker-2] INFO a.p.p.c.f.p.PrivacyContextIdGeneratorFactory - Using id generator f.p.SequentialPrivacyContextIdGenerator 02:39:12.573 [r-worker-2] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - Privacy context is created #1D1eda2, active: 0, allowed: 2 02:39:12.622 [r-worker-2] INFO a.p.pulsar.common.proxy.ProxyLoader - Reload from file, last modified: 1970-01-01T00:00:00Z, elapsed: 466833h38m3s | /home/coisin/.pulsar/proxy/providers-enabled/proxy.providers.txt 02:39:12.794 [r-worker-2] INFO a.p.e.common.proxy.ProxyVendorLoader - Fetching proxies from provider | http://webapi.http.zhimacangku.com/getip?num=1&type=2&pro=&city=0&yys=0&port=11&pack=245503&ts=1&ys=1&cs=1&lb=1&sb=0&pb=4&mr=1&regions= 02:39:12.905 [r-worker-2] INFO a.p.e.common.proxy.ProxyVendorLoader - Testing proxies, vendor: zm, format: json | file:///home/coisin/.pulsar/proxy/proxies-archived/proxies.zhimacangku-com-022923478eac104f8fbdff5b63563d64.zm.json 02:39:13.080 [r-worker-2] INFO a.p.e.common.proxy.ProxyVendorLoader - Test passed: 121.234.164.120:4236 => 121.234.164.120[ready] 02:39:13.087 [r-worker-2] INFO a.p.e.common.proxy.ProxyVendorLoader - Loaded 1/1 proxies 02:39:20.739 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Loaded 100 links (dev mode) | 
/home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/sites/amazon/crawl/generate/periodical/pt24h/best-sellers.txt 02:39:24.970 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Loaded 100 links (dev mode) | /home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/sites/amazon/crawl/generate/periodical/pt24h/new-releases.txt 02:39:30.399 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Loaded 100 links (dev mode) | /home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/sites/amazon/crawl/generate/periodical/pt24h/most-wished-for.txt 02:39:30.411 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Loaded 39 links (dev mode) | /home/coisin/Desktop/exotic-amazon-main/exotic-amazon-main/target/classes/sites/amazon/crawl/generate/periodical/pt1h/movers-and-shakers.txt 02:39:30.414 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Collected tasks: BEST_SELLERS, NEW_RELEASES, MOST_WISHED_FOR, MOVERS_AND_SHAKERS 02:39:30.416 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Creating collector for BEST_SELLERS 02:39:30.524 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 100 links for task in database 02:39:30.525 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 100 links for task in database 02:39:30.617 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Generated 100/0 BEST_SELLERS tasks with collector BEST_SELLERS in PT0S, with 0 ones removed(fetched) 02:39:34.984 [em-0] INFO a.p.p.p.b.d.WebDriverPoolManager - Maintaining service is started 02:39:45.621 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Creating collector for NEW_RELEASES 02:39:45.681 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 100 links for task in database 02:39:45.682 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 100 links for task in database 02:39:45.761 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Generated 
100/0 NEW_RELEASES tasks with collector NEW_RELEASES in PT0S, with 0 ones removed(fetched) 02:40:00.762 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Creating collector for MOST_WISHED_FOR 02:40:00.792 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 100 links for task in database 02:40:00.793 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 100 links for task in database 02:40:00.847 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Generated 100/0 MOST_WISHED_FOR tasks with collector MOST_WISHED_FOR in PT0S, with 0 ones removed(fetched) 02:40:11.836 [cheduler-1] INFO a.p.s.r.a.s.s.TaskMultiHistoryService - Deleted 0 records before 2023-04-02T09:40:11.827732Z 02:40:11.842 [cheduler-1] INFO a.p.s.r.a.s.s.TaskMonthlyHistoryService - Deleted 0 records before 2023-03-04T09:40:11.837398Z from monthly history in PT0.005047S 02:40:11.843 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ... 02:40:13.117 [r-worker-2] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0 02:40:13.120 [r-worker-2] WARN a.p.p.p.b.e.context.WebDriverContext - 3. Retry task 1 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed)) 02:40:13.160 [3-thread-1] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - Maintaining service is started 02:40:13.334 [r-worker-2] INFO a.p.p.c.component.LoadComponent.Task - 3. 
💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 14m12s ago, fc:4/4 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/14e38572283c15ea22d4f2668ecba1a5.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Medical-Exam-Gloves/zgbs/industrial/8404645011/ref=zg_bs_nav_industrial_4_10871413011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S 02:40:13.335 [r-worker-2] INFO a.p.p.c.component.LoadComponent.Task - Log explanation: https://github.com/platonai/pulsarr/blob/master/docs/log-format.adoc 02:40:13.532 [r-worker-2] INFO a.p.p.c.impl.StreamingCrawler.Task - 3. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 14m13s ago, fc:4/4 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/14e38572283c15ea22d4f2668ecba1a5.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Medical-Exam-Gloves/zgbs/industrial/8404645011/ref=zg_bs_nav_industrial_4_10871413011 02:40:13.693 [r-worker-2] INFO a.p.p.c.component.LoadComponent.Task - 5. 
💯 💿 I got 200 430.40 KiB [💿499.96 KiB] in 4m5.078s, last fetched 26h55m53s ago, fc:2 | 59/319/140/204/7858 | nf:14/14/14 | bestsellers | https://www.amazon.com/Best-Sellers-Pet-Supplies-Dog-Bed-Blankets/zgbs/pet-supplies/3024177011/ref=zg_bs_nav_pet-supplies_3_2975326011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S H2 Engine implementation is: class ai.platon.scent.ql.h2.H2SessionFactory 02:40:15.849 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Creating collector for MOVERS_AND_SHAKERS 02:40:15.905 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 39 links for task in database 02:40:15.906 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Checking 39 links for task in database 02:40:15.946 [r-worker-1] INFO a.p.e.a.c.g.PeriodicalSeedsGenerator - Generated 39/0 MOVERS_AND_SHAKERS tasks with collector MOVERS_AND_SHAKERS in PT0S, with 0 ones removed(fetched) 02:40:23.334 [r-worker-2] INFO a.p.scent.ql.h2.H2SessionFactory - ScentSQLSession #2 is created for h2session <#2 (user: SA)>, connection: 02:40:23.890 [r-worker-2] INFO a.p.s.p.h.A.Task - 5. 
Parsed in 545ms/0.55s 545.00ms/p | 699/700/700 fields in 50/50 records | pending 0/1 results | bestsellers | best-sellers -> asin_best_sellers_sync 02:40:23.976 [r-worker-2] INFO a.p.e.a.c.b.c.AmazonLinkCollector - Collected 50 asin links from bestseller | https://www.amazon.com/Best-Sellers-Pet-Supplies-Dog-Bed-Blankets/zgbs/pet-supplies/3024177011/ref=zg_bs_nav_pet-supplies_3_2975326011 02:40:24.160 [r-worker-2] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - Privacy context is created #2icQMk2, active: 1, allowed: 2 02:40:24.175 [r-worker-2] INFO a.p.e.common.proxy.ProxyVendorLoader - Fetching proxies from provider | http://webapi.http.zhimacangku.com/getip?num=1&type=2&pro=&city=0&yys=0&port=11&pack=245503&ts=1&ys=1&cs=1&lb=1&sb=0&pb=4&mr=1&regions= 02:40:24.251 [r-worker-2] INFO a.p.e.common.proxy.ProxyVendorLoader - Testing proxies, vendor: zm, format: json | file:///home/coisin/.pulsar/proxy/proxies-archived/proxies.zhimacangku-com-022923478eac104f8fbdff5b63563d64.zm.json 02:40:24.254 [r-worker-2] INFO a.p.e.common.proxy.ProxyVendorLoader - Test passed: 114.102.181.167:4213 => 114.102.181.167[ready] 02:40:24.255 [r-worker-2] INFO a.p.e.common.proxy.ProxyVendorLoader - Loaded 1/2 proxies 02:40:24.258 [r-worker-2] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 117.04 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver 02:40:31.204 [r-worker-1] INFO a.p.e.a.c.b.component.AmazonCrawler - Registered collectors: name | priority | pName | collected | cd/s | collect | c/s | time | size | estSize | firstCollect | lastCollect | labels FCC#RealTime | -214748364 | HIGHEST | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.4 | -214748364 | HIGHEST | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | DelayCC#Delay | -5000 | HIGHER5 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.5 | -5000 | HIGHER5 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.6 | -4000 | HIGHER4 | 0 | 0.00 | 0 
| 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.7 | -3000 | HIGHER3 | 0 | 0.00 | 0 | 0.00 | 0s | 1 | 1 | 31 16:00:00 | 31 16:00:00 | MOVERS_AND_SHAKERS | -3000 | HIGHER3 | 0 | 0.00 | 0 | 0.00 | 0s | 39 | 39 | 31 16:00:00 | 31 16:00:00 | MOS FCC.8 | -2000 | HIGHER2 | 0 | 0.00 | 0 | 0.00 | 0s | 50 | 50 | 31 16:00:00 | 31 16:00:00 | SCRAPE | -2000 | HIGHER2 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | SCE FCC.9 | -1000 | HIGHER | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.10 | 0 | NORMAL | 17 | 17.00 | 17 | 17.00 | 0s | 8 | 8 | 04 02:39:07 | 04 02:39:07 | BEST_SELLERS | 0 | NORMAL | 0 | 0.00 | 0 | 0.00 | 0s | 100 | 100 | 31 16:00:00 | 31 16:00:00 | BES NEW_RELEASES | 0 | NORMAL | 0 | 0.00 | 0 | 0.00 | 0s | 100 | 100 | 31 16:00:00 | 31 16:00:00 | NES MOST_WISHED_FOR | 0 | NORMAL | 0 | 0.00 | 0 | 0.00 | 0s | 100 | 100 | 31 16:00:00 | 31 16:00:00 | MO*R FCC.11 | 1000 | LOWER | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.12 | 2000 | LOWER2 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.13 | 3000 | LOWER3 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.14 | 4000 | LOWER4 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.15 | 5000 | LOWER5 | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 | FCC.16 | 214748364 | LOWEST | 0 | 0.00 | 0 | 0.00 | 0s | 0 | 0 | 31 16:00:00 | 31 16:00:00 |

02:40:31.205 [r-worker-1] INFO a.p.s.crawl.AbstractRunnableCrawler - Ready for fetching ... 02:40:31.221 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 1. numRunning: 14, availableMemory: 114.46 MiB, memoryToReserve: 1.00 GiB, shortage: -953720832 B 02:40:31.539 [r-worker-1] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 105.28 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver 02:41:12.131 [cheduler-1] INFO a.p.s.r.a.schedule.ScentRestMonitor - Total collected 17/0.12/17/0.12 in 2m24s, remaining 398/398, collect time: 2023-04-04T09:39:07.287918Z -> 2023-04-04T09:39:07.672023Z 02:41:12.144 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ... 02:41:24.261 [r-worker-2] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0 02:41:24.266 [r-worker-2] WARN a.p.p.p.b.e.context.WebDriverContext - 8. Retry task 2 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed)) 02:41:24.288 [r-worker-2] INFO a.p.p.c.component.LoadComponent.Task - 8. 💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 13m58s ago, fc:5/5 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/5c74ac7ee28e704afbd147c4a6f9e588.htm | https://www.amazon.com/Best-Sellers-Pet-Supplies-Dog-Carrier-Backpacks/zgbs/pet-supplies/2975334011/ref=zg_bs_nav_pet-supplies_3_2975333011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S 02:41:24.671 [r-worker-2] INFO a.p.p.c.impl.StreamingCrawler.Task - 8. 
Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 13m58s ago, fc:5/5 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/5c74ac7ee28e704afbd147c4a6f9e588.htm | https://www.amazon.com/Best-Sellers-Pet-Supplies-Dog-Carrier-Backpacks/zgbs/pet-supplies/2975334011/ref=zg_bs_nav_pet-supplies_3_2975333011 02:41:24.732 [r-worker-2] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 95.45 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver 02:41:31.540 [r-worker-1] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0 02:41:31.540 [r-worker-1] WARN a.p.p.p.b.e.context.WebDriverContext - 10. Retry task 3 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed)) 02:41:31.545 [r-worker-1] INFO a.p.p.c.component.LoadComponent.Task - 10. 💔 Updated 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:5/5 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/8d7cda9e20623362dd8d1e590ae30dc0.htm | https://www.amazon.com/Best-Sellers-Pet-Supplies-Cat-Toy-Balls/zgbs/pet-supplies/2975304011/ref=zg_bs_nav_pet-supplies_3_2975303011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S 02:41:31.559 [r-worker-1] INFO a.p.p.c.impl.StreamingCrawler.Task - 10. 
Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:5/5 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/8d7cda9e20623362dd8d1e590ae30dc0.htm | https://www.amazon.com/Best-Sellers-Pet-Supplies-Cat-Toy-Balls/zgbs/pet-supplies/2975304011/ref=zg_bs_nav_pet-supplies_3_2975303011 02:41:31.586 [r-worker-1] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 94.86 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver 02:42:03.896 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - Counter reporter is started [ 2023-04-04T02:42:03.895277 ] [ UNNAMED JOB ] 02:42:03.904 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - All registered counters : c.c.UpdateComponent_C_Counter, c.p.PageParser_Counter, p.h.ExtractCounter 02:42:03.936 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - 3'xCheck:41, 3'xFitPages:1 02:42:12.163 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ... 02:42:24.735 [r-worker-2] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0 02:42:24.736 [r-worker-2] WARN a.p.p.p.b.e.context.WebDriverContext - 12. Retry task 4 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed)) 02:42:24.752 [r-worker-2] INFO a.p.p.c.component.LoadComponent.Task - 12. 
💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m54s ago, fc:5/5 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/3581da599e95b57aec6735db1f41343d.htm | https://www.amazon.com/Best-Sellers-Pet-Supplies-Dry-Cat-Food/zgbs/pet-supplies/2975266011/ref=zg_bs_nav_pet-supplies_3_2975265011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S 02:42:24.782 [r-worker-2] INFO a.p.p.c.impl.StreamingCrawler.Task - 12. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m54s ago, fc:5/5 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/3581da599e95b57aec6735db1f41343d.htm | https://www.amazon.com/Best-Sellers-Pet-Supplies-Dry-Cat-Food/zgbs/pet-supplies/2975266011/ref=zg_bs_nav_pet-supplies_3_2975265011 02:42:24.818 [r-worker-2] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 94.04 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver 02:42:31.589 [r-worker-1] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0 02:42:31.589 [r-worker-1] WARN a.p.p.p.b.e.context.WebDriverContext - 14. Retry task 5 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed)) 02:42:31.592 [r-worker-1] INFO a.p.p.c.component.LoadComponent.Task - 14. 
💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:5/5 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/603831facc27efa63186275bb81e2cac.htm | https://www.amazon.com/Best-Sellers-Pet-Supplies-Horse-Care-Blankets/zgbs/pet-supplies/3085138011/ref=zg_bs_nav_pet-supplies_3_3407521 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S 02:42:31.610 [r-worker-1] INFO a.p.p.c.impl.StreamingCrawler.Task - 14. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:5/5 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/603831facc27efa63186275bb81e2cac.htm | https://www.amazon.com/Best-Sellers-Pet-Supplies-Horse-Care-Blankets/zgbs/pet-supplies/3085138011/ref=zg_bs_nav_pet-supplies_3_3407521 02:42:31.649 [r-worker-1] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 93.78 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver 02:42:33.900 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - 3'xCheck:61, 3'xFitPages:1 02:43:12.299 [cheduler-1] INFO a.p.s.r.a.schedule.ScentRestMonitor - Total collected 17/0.06/17/0.06 in 4m24s, remaining 398/398, collect time: 2023-04-04T09:39:07.287918Z -> 2023-04-04T09:39:07.672023Z 02:43:12.309 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ... 02:43:24.820 [r-worker-2] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0 02:43:24.820 [r-worker-2] WARN a.p.p.p.b.e.context.WebDriverContext - 16. Retry task 6 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed)) 02:43:24.826 [r-worker-2] INFO a.p.p.c.component.LoadComponent.Task - 16. 
💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m54s ago, fc:4/4 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/a7a939bf529600f09d5e66c559c32505.htm | https://www.amazon.com/Best-Sellers-Patio-Lawn-Garden-Outdoor-Generators/zgbs/lawn-garden/348967011/ref=zg_bs_nav_lawn-garden_2_552808 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S 02:43:24.851 [r-worker-2] INFO a.p.p.c.impl.StreamingCrawler.Task - 16. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m54s ago, fc:4/4 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/a7a939bf529600f09d5e66c559c32505.htm | https://www.amazon.com/Best-Sellers-Patio-Lawn-Garden-Outdoor-Generators/zgbs/lawn-garden/348967011/ref=zg_bs_nav_lawn-garden_2_552808 02:43:24.882 [r-worker-2] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 95.26 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver 02:43:31.652 [r-worker-1] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0 02:43:31.653 [r-worker-1] WARN a.p.p.p.b.e.context.WebDriverContext - 18. Retry task 7 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed)) 02:43:31.655 [r-worker-1] INFO a.p.p.c.component.LoadComponent.Task - 18. 
💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:4/4 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/3b3b20eed3ea368a5a3e836b0f025d97.htm | https://www.amazon.com/Best-Sellers-Patio-Lawn-Garden-Solar-Panels/zgbs/lawn-garden/2236628011/ref=zg_bs_nav_lawn-garden_3_3236381 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S 02:43:31.670 [r-worker-1] INFO a.p.p.c.impl.StreamingCrawler.Task - 18. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:4/4 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/3b3b20eed3ea368a5a3e836b0f025d97.htm | https://www.amazon.com/Best-Sellers-Patio-Lawn-Garden-Solar-Panels/zgbs/lawn-garden/2236628011/ref=zg_bs_nav_lawn-garden_3_3236381 02:43:31.687 [r-worker-1] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 94.63 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver 02:43:33.896 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - 3'xCheck:81, 3'xFitPages:1

swlcyx commented 1 year ago

I use the zm proxy and the tests passed, but the crawl status is not successful.

swlcyx commented 1 year ago

Is this the cause? 10. Retry task 3 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed))
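The counters in that message can be paired with the names the log prints next to them. A minimal sketch in Python, assuming nothing beyond the message format quoted above; `parse_pool_counters` is a hypothetical helper, not part of the project:

```python
import re

# Log excerpt copied from the message above.
LINE = (
    "Retry task 3 in crawl scope | cause by: Driver pool is exhausted "
    "([Lack of resource] | 0/0/0/0/8/0/0 "
    "(active/standby/waiting/working/slots/retired/closed))"
)

def parse_pool_counters(text):
    """Pair each slash-separated counter with the name listed after it."""
    match = re.search(r"([\d/]+) \(([A-Za-z/]+)\)", text)
    if match is None:
        return {}
    values = [int(v) for v in match.group(1).split("/")]
    names = match.group(2).split("/")
    return dict(zip(names, values))

counters = parse_pool_counters(LINE)
print(counters["slots"], counters["active"])  # 8 0
```

Read this way, the pool has 8 slots but no active, standby, or working driver. That fits the neighbouring "Critical memory ... will not create new driver" lines, which suggest the pool is empty because no driver is being created.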

swlcyx commented 1 year ago

02:45:31.751 [r-worker-1] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0
02:45:31.752 [r-worker-1] WARN a.p.p.p.b.e.context.WebDriverContext - 26. Retry task 11 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed))
02:45:31.753 [r-worker-1] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - ⚠ Privacy leak warning 1/8 | 1#1D1eda2 | 26. Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL
02:45:31.755 [r-worker-1] INFO a.p.p.c.component.LoadComponent.Task - 26. 💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:3/3 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/1dece74ce0a0606fcaf3488bd812be10.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Clinical-Diagnostic-Test-Kits/zgbs/industrial/393454011/ref=zg_bs_nav_industrial_4_11312340011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S
02:45:31.769 [r-worker-1] INFO a.p.p.c.impl.StreamingCrawler.Task - 26. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:3/3 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/1dece74ce0a0606fcaf3488bd812be10.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Clinical-Diagnostic-Test-Kits/zgbs/industrial/393454011/ref=zg_bs_nav_industrial_4_11312340011
02:45:31.790 [r-worker-1] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 102.85 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver
02:45:33.895 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - 3'xCheck:121, 3'xFitPages:1
02:46:12.461 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ...
02:46:24.996 [r-worker-2] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0
02:46:24.996 [r-worker-2] WARN a.p.p.p.b.e.context.WebDriverContext - 28. Retry task 12 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed))
02:46:24.997 [r-worker-2] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - ⚠ Privacy leak warning 1/8 | 2#2icQMk2 | 28. Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL
02:46:25.001 [r-worker-2] INFO a.p.p.c.component.LoadComponent.Task - 28. 💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m54s ago, fc:3/3 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/950518ba0ca5311e78cc208b8633560e.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Medical-Exam-Table-Paper/zgbs/industrial/8297420011/ref=zg_bs_nav_industrial_3_8297374011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S
02:46:25.029 [r-worker-2] INFO a.p.p.c.impl.StreamingCrawler.Task - 28. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m54s ago, fc:3/3 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/950518ba0ca5311e78cc208b8633560e.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Medical-Exam-Table-Paper/zgbs/industrial/8297420011/ref=zg_bs_nav_industrial_3_8297374011
02:46:25.048 [r-worker-2] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 100.35 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver
02:46:31.792 [r-worker-1] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0
02:46:31.792 [r-worker-1] WARN a.p.p.p.b.e.context.WebDriverContext - 30. Retry task 13 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed))
02:46:31.794 [r-worker-1] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - ⚠ Privacy leak warning 1/8 | 1#1D1eda2 | 30. Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL
02:46:31.798 [r-worker-1] INFO a.p.p.c.component.LoadComponent.Task - 30. 💔 Updated 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:3/3 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/098318ac07c9bea5107d76fcf354b98a.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Medical-Procedure-Masks/zgbs/industrial/16035056011/ref=zg_bs_nav_industrial_5_21265059011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S
02:46:31.812 [r-worker-1] INFO a.p.p.c.impl.StreamingCrawler.Task - 30. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:3/3 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/098318ac07c9bea5107d76fcf354b98a.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Medical-Procedure-Masks/zgbs/industrial/16035056011/ref=zg_bs_nav_industrial_5_21265059011
02:46:31.847 [r-worker-1] INFO a.p.p.p.b.d.LoadingWebDriverPool - Critical memory: 100.42 MiB, 8/0/0 (slots/activeP/activeB), will not create new driver
02:46:33.895 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - 3'xCheck:141, 3'xFitPages:1
02:47:12.622 [cheduler-1] INFO a.p.s.r.a.schedule.ScentRestMonitor - Total collected 17/0.03/17/0.03 in 8m24s, remaining 398/398, collect time: 2023-04-04T09:39:07.287918Z -> 2023-04-04T09:39:07.672023Z
02:47:12.627 [cheduler-1] DEBUG a.p.s.r.a.schedule.ScentRestMonitor - Try executing top N tasks ...
02:47:25.050 [r-worker-2] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0
02:47:25.051 [r-worker-2] WARN a.p.p.p.b.e.context.WebDriverContext - 32. Retry task 14 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed))
02:47:25.052 [r-worker-2] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - ⚠ Privacy leak warning 1/8 | 2#2icQMk2 | 32. Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL
02:47:25.059 [r-worker-2] INFO a.p.p.c.component.LoadComponent.Task - 32. 💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m54s ago, fc:3/3 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/8faaf459c154a1783ea07d28dc3395b9.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Medical-Respirator-Masks/zgbs/industrial/21265060011/ref=zg_bs_nav_industrial_5_16035056011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S
02:47:25.074 [r-worker-2] INFO a.p.p.c.impl.StreamingCrawler.Task - 32. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m54s ago, fc:3/3 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers | file:///tmp/ln/8faaf459c154a1783ea07d28dc3395b9.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Medical-Respirator-Masks/zgbs/industrial/21265060011/ref=zg_bs_nav_industrial_5_16035056011
02:47:31.849 [r-worker-1] WARN a.p.p.p.b.d.LoadingWebDriverPool - Driver pool is exhausted | [Lack of resource] | active: 0, standby: 0, waiting: 0, working: 0, slots: 8, retired: 0, closed: 0
02:47:31.850 [r-worker-1] WARN a.p.p.p.b.e.context.WebDriverContext - 34. Retry task 15 in crawl scope | cause by: Driver pool is exhausted ([Lack of resource] | 0/0/0/0/8/0/0 (active/standby/waiting/working/slots/retired/closed))
02:47:31.851 [r-worker-1] INFO a.p.p.p.b.e.c.MultiPrivacyContextManager - ⚠ Privacy leak warning 1/8 | 1#1D1eda2 | 34. Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL
02:47:31.854 [r-worker-1] INFO a.p.p.c.component.LoadComponent.Task - 34. 💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:4/4 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/65e85d285c4bcdeb2e1886d7b7143367.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Surgical-Masks/zgbs/industrial/11312315011/ref=zg_bs_nav_industrial_5_21265060011 -expires PT2400H -ignoreFailure -label bestsellers -parse -requireImages 50 -requireSize 300000 -scrollCount 15 -scrollInterval PT2S
02:47:31.872 [r-worker-1] INFO a.p.p.c.impl.StreamingCrawler.Task - 34. Gone (unexpected) U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:4/4 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 1D1eda2 | bestsellers | file:///tmp/ln/65e85d285c4bcdeb2e1886d7b7143367.htm | https://www.amazon.com/Best-Sellers-Industrial-Scientific-Surgical-Masks/zgbs/industrial/11312315011/ref=zg_bs_nav_industrial_5_21265060011
02:47:33.895 [em-0] INFO a.p.p.c.metrics.EnumCounterReporter - 3'xCheck:161, 3'xFitPages:1

platonai commented 1 year ago

It is very strange to see the following log if your computer really has 32 GB of memory and it is not being used by other programs.

02:40:31.221 [r-worker-1] INFO a.p.p.crawl.impl.StreamingCrawler - 1. numRunning: 14, availableMemory: 114.46 MiB, memoryToReserve: 1.00 GiB, shortage: -953720832 B
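The three figures in that log line are consistent with each other if `shortage` is simply `availableMemory - memoryToReserve`. That relation is an assumption inferred from the values, not something the source confirms. A minimal Python sketch of the arithmetic:

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Values copied from the log line above.
memory_to_reserve = 1 * GIB      # memoryToReserve: 1.00 GiB
logged_shortage = -953_720_832   # shortage: -953720832 B

# Assumed relation: shortage = available - reserve,
# so the exact available memory can be recovered from the log.
available = memory_to_reserve + logged_shortage

print(available)                     # 120020992
print(f"{available / MIB:.2f} MiB")  # 114.46 MiB
```

Under that assumption the process saw about 114 MiB available, roughly 910 MiB short of the 1 GiB it wants to reserve, which is why this value looks wrong for a machine with 32 GB.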

platonai commented 1 year ago

What about cleaning the project and building it again?

swlcyx commented 1 year ago

I will try it today with no other programs running. Thanks for the explanation.

swlcyx commented 1 year ago

I tried again, but I got this error: INFO a.p.p.c.component.LoadComponent.Task - 8. 💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 13m58s ago, fc:5/5 Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL | 2icQMk2 | bestsellers |

platonai commented 1 year ago

It's OK to have a certain rate of fetch failures. Only when all fetch tasks fail does it indicate a program bug.
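One way to tell "some failures" from "all failures" is to count how many task lines in a log report a retry. The sketch below is in Python and rests on two assumptions that the source does not confirm: that `LoadComponent.Task` lines mark one fetch attempt each, and that `Retry(` marks a failed one. `failure_rate` is a hypothetical helper, and the sample lines are truncated copies of lines posted in this thread.

```python
def failure_rate(log_lines):
    """Fraction of LoadComponent.Task lines that report a retry."""
    tasks = [line for line in log_lines if "LoadComponent.Task" in line]
    if not tasks:
        return None  # no task lines, so no rate can be computed
    failed = sum(1 for line in tasks if "Retry(" in line)
    return failed / len(tasks)

# Truncated after "rsp: CRAWL"; the originals continue with the URL and options.
SAMPLE = [
    "02:45:31.755 [r-worker-1] INFO a.p.p.c.component.LoadComponent.Task - 26. "
    "💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m1s ago, fc:3/3 "
    "Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL",
    "02:46:25.001 [r-worker-2] INFO a.p.p.c.component.LoadComponent.Task - 28. "
    "💔 🔃 U for RT got 1601 0 <- 0 in 0s, last fetched 26h45m54s ago, fc:3/3 "
    "Retry(1601) rs: WebDriverUnavailableException, rsp: CRAWL",
]

print(failure_rate(SAMPLE))  # 1.0
```

A rate of 1.0 over a long run, as in the logs posted above, would be the "all fetch tasks fail" case, while a rate well below 1.0 would be ordinary fetch failure.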