medialab / hyphe

Websites crawler with built-in exploration and control web interface
http://hyphe.medialab.sciences-po.fr/demo/
GNU Affero General Public License v3.0

The database emptied by itself #411

Closed: edouardschuppert closed this issue 3 years ago

edouardschuppert commented 3 years ago

Hello,

On my Ubuntu 18.04 server, I installed Hyphe with the Docker setup described in the documentation, and I configured the data to be saved in an external folder. It ran for about 12 hours on the 70 domains I specified. After that, I went back to the application, and it's like I emptied the database. Do you know what could be causing this? Is there anything special that needs to be configured, in particular regarding resource allocation?

The server has 32 GB of RAM and 4 cores at 3.5 GHz.

boogheta commented 3 years ago

Hello, this is quite surprising: I haven't heard of such problems so far, and it's hard to help with so little information. Can you copy-paste your config files, the commands you ran, etc.?

edouardschuppert commented 3 years ago

I cloned the repo into my home directory (/home/edouardschuppert/) and configured the .env as follows:

TAG=prod
PUBLIC_PORT=5905
DATA_PATH=/home/edouardschuppert/hyphe_bdd/
RESTART_POLICY=unless-stopped

I modified the end of the docker-compose file to expose the MongoDB port:

ports:
  - "27017:27017"

Then I started it with:

docker-compose pull
docker-compose build
docker-compose up -d
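The `"27017:27017"` mapping above gives no host IP, and with Docker's default settings the port is then published on all of the host's network interfaces (0.0.0.0), so MongoDB becomes reachable by anyone who can reach the server. Below is a minimal Python sketch for checking such mappings, assuming the short Compose syntax `[HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTOCOL]` with IPv4 addresses only; the function names are hypothetical and not part of Hyphe or Docker Compose.

```python
import ipaddress


def published_host_ip(mapping: str) -> str:
    """Host address that a short-syntax Compose port mapping is bound to.

    Assumes "[HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTOCOL]" with an IPv4 HOST_IP.
    When HOST_IP is omitted, Docker (by default) publishes on all interfaces, i.e. 0.0.0.0.
    """
    spec = mapping.split("/", 1)[0]  # drop an optional "/tcp" or "/udp" suffix
    parts = spec.split(":")
    if len(parts) == 3:
        return parts[0]
    if len(parts) in (1, 2):  # "CONTAINER_PORT" or "HOST_PORT:CONTAINER_PORT"
        return "0.0.0.0"
    raise ValueError(f"unsupported port mapping: {mapping!r}")


def reachable_beyond_localhost(mapping: str) -> bool:
    """True unless the mapping is bound to a loopback address such as 127.0.0.1."""
    return not ipaddress.ip_address(published_host_ip(mapping)).is_loopback
```

With a loopback binding such as `"127.0.0.1:27017:27017"`, Mongo stays reachable from the server itself (for example through an SSH tunnel for Robo 3T) without being published to other machines.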

boogheta commented 3 years ago

Apart from the fact that you ran both pull and build, which is redundant (you only need to build if you edit the code), everything looks normal to me and I don't see why data would have disappeared. Can you describe what you mean when you say "it's like I emptied the database"? What did you do and see that led you to this conclusion?

edouardschuppert commented 3 years ago

I started the crawl and everything was working fine. I went back to Hyphe a few hours later, and obviously my corpus had disappeared: I could only add a new one. I connected to MongoDB, and apparently the data was gone.

boogheta commented 3 years ago

So by "obviously my corpus had disappeared", do you mean the web interface was not listing any corpus? Can you share the logs from the hyphe_backend Docker container? Also, how did you connect to Mongo and check that the data was gone?

edouardschuppert commented 3 years ago

Indeed, the web interface does not list any corpus. As for the database, I looked a little closer, and the data is in fact still there, in particular in the "pages" collection. Here is a copy of the end of the hyphe_backend logs:

2021-06-03 02:12:38+0000 [INFO - mee] Indexing 34 pages from job ffb232a4-c3b1-11eb-8f29-0242ac130004...
2021-06-03 02:12:38+0000 [INFO - mee] ...batch of 34 crawled pages with 5405 links prepared...
2021-06-03 02:12:43+0000 [INFO - mee] ...1672 unique pages indexed in traph in 4.96810507774s...
2021-06-03 02:12:43+0000 [INFO - mee] ...0 new WEs created in traph in 5.48362731934e-05s
2021-06-03 02:12:43+0000 [INFO - mee] Indexing 47 pages from job 8a2854a0-c3b1-11eb-8f29-0242ac130004...
2021-06-03 02:12:43+0000 [INFO - mee] ...batch of 47 crawled pages with 3248 links prepared...
2021-06-03 02:13:00+0000 [INFO - mee] ...550 unique pages indexed in traph in 16.9876480103s...
2021-06-03 02:13:00+0000 [INFO - mee] ...17 new WEs created in traph in 0.003741979599s
2021-06-03 02:13:00+0000 [INFO - mee] Indexing 8 pages from job ffaa5746-c3b1-11eb-8f29-0242ac130004...
2021-06-03 02:13:00+0000 [INFO - mee] ...batch of 8 crawled pages with 1608 links prepared...
2021-06-03 02:13:04+0000 [INFO - mee] ...8 unique pages indexed in traph in 3.87622094154s...
2021-06-03 02:13:04+0000 [INFO - mee] ...0 new WEs created in traph in 6.69956207275e-05s
2021-06-03 02:13:04+0000 [INFO - mee] Indexing 39 pages from job ffad9d7a-c3b1-11eb-8f29-0242ac130004...
2021-06-03 02:13:04+0000 [INFO - mee] ...batch of 39 crawled pages with 2452 links prepared...
2021-06-03 02:13:10+0000 [INFO - mee] ...366 unique pages indexed in traph in 5.78412985802s...
2021-06-03 02:13:10+0000 [INFO - mee] ...0 new WEs created in traph in 5.29289245605e-05s
2021-06-03 02:13:11+0000 [INFO - mee] Indexing 44 pages from job 8a2750aa-c3b1-11eb-8f29-0242ac130004...
2021-06-03 02:13:11+0000 [INFO - mee] ...batch of 44 crawled pages with 11512 links prepared...
2021-06-03 02:13:33+0000 [INFO - mee] ...648 unique pages indexed in traph in 21.237981081s...
2021-06-03 02:13:45+0000 [INFO - mee] ...3 new WEs created in traph in 12.8026208878s
2021-06-03 02:13:58+0000 [INFO - mee] Indexing 65 pages from job ffb232a4-c3b1-11eb-8f29-0242ac130004...
2021-06-03 02:13:58+0000 [INFO - mee] ...batch of 65 crawled pages with 11221 links prepared...
2021-06-03 02:14:05+0000 [INFO - mee] ...3984 unique pages indexed in traph in 7.40759086609s...
2021-06-03 02:14:05+0000 [INFO - mee] ...0 new WEs created in traph in 6.48498535156e-05s
2021-06-03 02:15:12+0000 [INFO - mee] Indexing 145 pages from job 8a2854a0-c3b1-11eb-8f29-0242ac130004...
2021-06-03 02:15:12+0000 [INFO - mee] ...batch of 145 crawled pages with 17822 links prepared...
2021-06-03 02:17:23+0000 [INFO - mee] ...1144 unique pages indexed in traph in 130.911110878s...
2021-06-03 02:19:17+0000 [INFO - mee] ...11 new WEs created in traph in 114.425863028s
2021-06-03 02:19:22+0000 [INFO - mee] Dropping cleared traph queued queries: 0 calls & 0 iterative calls
2021-06-03 02:52:10+0000 [INFO - mee] Stopping after 1967s of inactivity
2021-06-03 02:52:10+0000 [INFO - mee] Traph stopped
2021-06-03 02:52:12+0000 [INFO - mee] WARNING: Force killing residual process 70
2021-06-03 02:52:13+0000 [-] Stopping factory <hyphe_backend.traph.client.TraphClientFactory instance at 0x7f0154353820>
2021-06-03 02:52:13+0000 [INFO - mee] Traph process exited cleanly
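To put numbers on the slowdown visible in these logs (1672 pages indexed in about 5 s at 02:12:43, versus 1144 pages in about 131 s at 02:17:23), a small Python sketch can extract the traph indexing timings. It assumes the `YYYY-MM-DD HH:MM:SS+0000 [INFO - corpus]` line format shown above, one entry per line; the regex and function name are illustrative, not part of Hyphe.

```python
import re

# Matches backend lines such as
# "2021-06-03 02:17:23+0000 [INFO - mee] ...1144 unique pages indexed in traph in 130.911110878s..."
INDEXED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\+0000 \[INFO - (?P<corpus>[^\]]+)\] "
    r"\.\.\.(?P<pages>\d+) unique pages indexed in traph in (?P<secs>[0-9.e+-]+)s"
)


def indexing_rates(lines):
    """Return (timestamp, corpus, pages, seconds, pages_per_second) per traph indexing entry."""
    rates = []
    for line in lines:
        m = INDEXED.match(line)
        if m:
            pages, secs = int(m["pages"]), float(m["secs"])
            rate = pages / secs if secs else float("inf")  # guard against a 0s duration
            rates.append((m["ts"], m["corpus"], pages, secs, rate))
    return rates
```

This is only a reading aid: a drop in throughput shows when the backend was waiting, not why.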

boogheta commented 3 years ago

Hello, this is really strange. The line at 02:19:22 comes from a function that can only be run by API calls explicitly asking for the corpus to be destroyed, and that should have logged more things before and after it. So the only way this line could have been run is an edge case that should normally never happen: there are still pages to index in the queue, but the database says no crawl was ever run. Is it possible that you modified the content of the Mongo database manually? Is the jobs collection in your Mongo database empty?

edouardschuppert commented 3 years ago

I did not change anything manually in the Mongo database. I don't have a jobs collection. Maybe you mean the queue collection? If so, it is not empty. Screenshot taken with Robo 3T below: [screenshot: photo_2021-06-03 17 50 13]

boogheta commented 3 years ago

Well, one problem is that this jobs collection suddenly disappeared for some reason. Another is that you should also have a database named hyphe containing a collection called "corpus", which also seems to be missing. Given Hyphe's earlier logs, these two must have existed at the time, so something apparently happened on your machine at 2:19 that interfered with your Mongo database. Isn't it possible that Robo 3T was open and you (or your cat, baby, ...) inadvertently pressed keys which deleted some things? Apart from that, I have no explanation other than something mysterious happening on your machine, sorry :( Maybe we could find more hints in the mongo container logs from the same period?
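Following up on the suggestion to check the mongo container logs, the sketch below pairs each `dropDatabase ... starting` entry with the client address recorded when its connection was accepted, and flags whether that address is private. It assumes the plain-text mongod log format of the logs in the next comment, one entry per line (MongoDB 4.4 and later write JSON logs instead); the function name is hypothetical. Connection numbers restart when mongod restarts, so the excerpt should cover a single server run.

```python
import ipaddress
import re

# "... connection accepted from 185.220.100.243:23864 #430 (79 connections now open)"
ACCEPTED = re.compile(r"connection accepted from (?P<ip>\d+\.\d+\.\d+\.\d+):\d+ #(?P<conn>\d+)")
# "2021-06-03T02:16:54.071+0000 I COMMAND [conn430] dropDatabase hyphe_mee starting"
DROP = re.compile(
    r"^(?P<ts>\S+) I COMMAND \[conn(?P<conn>\d+)\] dropDatabase (?P<db>\S+) starting"
)


def find_database_drops(lines):
    """Return (timestamp, database, client_ip, ip_is_private) for each dropDatabase.

    client_ip and ip_is_private are None when the matching "connection accepted"
    line is not part of the excerpt.
    """
    client_of = {}  # connection number -> client IPv4 address
    drops = []
    for line in lines:
        accepted = ACCEPTED.search(line)
        if accepted:
            client_of[accepted["conn"]] = accepted["ip"]
            continue
        drop = DROP.match(line)
        if drop:
            ip = client_of.get(drop["conn"])
            private = ipaddress.ip_address(ip).is_private if ip else None
            drops.append((drop["ts"], drop["db"], ip, private))
    return drops
```

Addresses on the Docker bridge network (inside 172.16.0.0/12, such as 172.19.0.1) count as private here, so a drop issued from a public address stands out.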

edouardschuppert commented 3 years ago

Here is an extract of the mongo container logs from the same period. I have truncated it to keep only the time when there seems to have been a problem.

2021-06-03T02:16:41.414+0000 I NETWORK [initandlisten] connection accepted from 185.220.100.243:1904 #425 (74 connections now open)
2021-06-03T02:16:41.534+0000 I NETWORK [initandlisten] connection accepted from 185.220.100.243:29072 #426 (75 connections now open)
2021-06-03T02:16:43.992+0000 I NETWORK [initandlisten] connection accepted from 185.220.100.243:15738 #427 (76 connections now open)
2021-06-03T02:16:46.137+0000 I NETWORK [initandlisten] connection accepted from 185.220.100.243:21260 #428 (77 connections now open)
2021-06-03T02:16:48.283+0000 I NETWORK [initandlisten] connection accepted from 185.220.100.243:7682 #429 (78 connections now open)
2021-06-03T02:16:50.434+0000 I NETWORK [initandlisten] connection accepted from 185.220.100.243:23864 #430 (79 connections now open)
2021-06-03T02:16:52.624+0000 I NETWORK [initandlisten] connection accepted from 185.220.100.243:24154 #431 (80 connections now open)
2021-06-03T02:16:54.069+0000 I COMMAND [conn76] insert hyphe_mee.queue ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 2 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 14172999 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 14397ms
2021-06-03T02:16:54.069+0000 I COMMAND [conn8] command hyphe_mee.$cmd command: count { count: "queue", query: { _job: "4e61589ac3cf11ebb5910242ac130003" } } planSummary: COUNT_SCAN { _job: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:44 locks:{ Global: { acquireCount: { r: 2 } }, MMAPV1Journal: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { R: 1 }, acquireWaitCount: { R: 1 }, timeAcquiringMicros: { R: 14392076 } } } 14392ms
2021-06-03T02:16:54.071+0000 I COMMAND [conn426] command hyphe_mee.$cmd command: listCollections { listCollections: 1, cursor: {}, nameOnly: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:483 locks:{ Global: { acquireCount: { r: 2 } }, MMAPV1Journal: { acquireCount: { r: 2 } }, Database: { acquireCount: { R: 1 }, acquireWaitCount: { R: 1 }, timeAcquiringMicros: { R: 12169389 } } } 12170ms
2021-06-03T02:16:54.071+0000 I NETWORK [conn426] end connection 185.220.100.243:29072 (79 connections now open)
2021-06-03T02:16:54.071+0000 I COMMAND [conn430] dropDatabase hyphe_mee starting
2021-06-03T02:16:54.898+0000 I NETWORK [conn425] end connection 185.220.100.243:1904 (78 connections now open)
2021-06-03T02:17:35.053+0000 I STORAGE [DataFileSync] flushing mmaps took 25068ms for 16 files
2021-06-03T02:17:35.538+0000 I JOURNAL [conn430] journalCleanup...
2021-06-03T02:17:35.538+0000 I JOURNAL [conn430] removeJournalFiles
2021-06-03T02:17:45.275+0000 I NETWORK [conn422] end connection 172.19.0.1:43602 (77 connections now open)
2021-06-03T02:17:46.497+0000 I JOURNAL [conn430] journalCleanup...
2021-06-03T02:17:46.497+0000 I JOURNAL [conn430] removeJournalFiles
2021-06-03T02:18:05.149+0000 I COMMAND [conn430] dropDatabase hyphe_mee finished
2021-06-03T02:18:05.149+0000 I COMMAND [conn430] command hyphe_mee.$cmd command: dropDatabase { dropDatabase: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:60 locks:{ Global: { acquireCount: { r: 2, w: 1, W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 3534316 } }, MMAPV1Journal: { acquireCount: { w: 4 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 60 } }, Database: { acquireCount: { W: 1 } } } 74612ms
2021-06-03T02:18:05.149+0000 I COMMAND [conn429] dropDatabase local starting
2021-06-03T02:18:05.150+0000 I NETWORK [conn430] end connection 185.220.100.243:23864 (76 connections now open)
2021-06-03T02:18:07.830+0000 I JOURNAL [conn429] journalCleanup...
2021-06-03T02:18:07.830+0000 I JOURNAL [conn429] removeJournalFiles
2021-06-03T02:18:16.290+0000 I JOURNAL [conn429] journalCleanup...
2021-06-03T02:18:16.290+0000 I JOURNAL [conn429] removeJournalFiles
2021-06-03T02:18:19.944+0000 I COMMAND [conn429] dropDatabase local finished
2021-06-03T02:18:19.944+0000 I COMMAND [conn428] dropDatabase hyphe starting
2021-06-03T02:18:19.944+0000 I COMMAND [conn429] command local.$cmd command: dropDatabase { dropDatabase: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:56 locks:{ Global: { acquireCount: { r: 2, w: 1, W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 76758280 } }, MMAPV1Journal: { acquireCount: { w: 4 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 115 } }, Database: { acquireCount: { W: 1 } } } 91552ms
2021-06-03T02:18:19.944+0000 I NETWORK [conn429] end connection 185.220.100.243:7682 (75 connections now open)
2021-06-03T02:18:20.287+0000 I JOURNAL [conn428] journalCleanup...
2021-06-03T02:18:20.287+0000 I JOURNAL [conn428] removeJournalFiles
2021-06-03T02:18:20.303+0000 I JOURNAL [conn428] journalCleanup...
2021-06-03T02:18:20.303+0000 I JOURNAL [conn428] removeJournalFiles
2021-06-03T02:18:20.336+0000 I COMMAND [conn428] dropDatabase hyphe finished
2021-06-03T02:18:20.336+0000 I COMMAND [conn428] command hyphe.$cmd command: dropDatabase { dropDatabase: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:56 locks:{ Global: { acquireCount: { r: 2, w: 1, W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 93706969 } }, MMAPV1Journal: { acquireCount: { w: 4 } }, Database: { acquireCount: { W: 1 } } } 94099ms
2021-06-03T02:18:20.336+0000 I COMMAND [conn427] dropDatabase hyphe--test-corpus-- starting
2021-06-03T02:18:20.336+0000 I NETWORK [conn428] end connection 185.220.100.243:21260 (74 connections now open)
2021-06-03T02:18:20.337+0000 I JOURNAL [conn427] journalCleanup...
2021-06-03T02:18:20.337+0000 I JOURNAL [conn427] removeJournalFiles
2021-06-03T02:18:20.347+0000 I JOURNAL [conn427] journalCleanup...
2021-06-03T02:18:20.347+0000 I JOURNAL [conn427] removeJournalFiles
2021-06-03T02:18:20.374+0000 I COMMAND [conn427] dropDatabase hyphe--test-corpus-- finished
2021-06-03T02:18:20.375+0000 I COMMAND [conn427] command hyphe--test-corpus--.$cmd command: dropDatabase { dropDatabase: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:72 locks:{ Global: { acquireCount: { r: 2, w: 1, W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 95762435 } }, MMAPV1Journal: { acquireCount: { w: 4 } }, Database: { acquireCount: { W: 1 } } } 95800ms
2021-06-03T02:18:20.375+0000 I INDEX [conn431] allocating new ns file /data/db/README_TO_RECOVER_YOUR_DATA.ns, filling with zeroes...
2021-06-03T02:18:20.375+0000 I NETWORK [conn427] end connection 185.220.100.243:15738 (73 connections now open)
2021-06-03T02:18:20.375+0000 I INDEX [conn302] allocating new ns file /data/db/hyphe_mee.ns, filling with zeroes...
2021-06-03T02:18:51.245+0000 I STORAGE [FileAllocator] allocating new datafile /data/db/hyphe_mee.0, filling with zeroes...
2021-06-03T02:19:01.834+0000 I STORAGE [FileAllocator] done allocating datafile /data/db/hyphe_mee.0, size: 64MB, took 10.589 secs
2021-06-03T02:19:01.834+0000 I STORAGE [FileAllocator] allocating new datafile /data/db/READ__ME_TO_RECOVER_YOUR_DATA.0, filling with zeroes...
2021-06-03T02:19:01.835+0000 I STORAGE [conn302] MmapV1ExtentManager took 10 seconds to open: /data/db/hyphe_mee.0
2021-06-03T02:19:08.246+0000 I STORAGE [FileAllocator] done allocating datafile /data/db/README_TO_RECOVER_YOUR_DATA.0, size: 64MB, took 6.408 secs
2021-06-03T02:19:08.247+0000 I STORAGE [conn431] MmapV1ExtentManager took 17 seconds to open: /data/db/README_TO_RECOVER_YOUR_DATA.0
2021-06-03T02:19:17.499+0000 I WRITE [conn431] insert README_TO_RECOVER_YOUR_DATA.README query: { content: "All your data is a backed up. You must pay 0.02 BTC to 16jE6yZiZnX8enpnsdwUcxuV4NDY1NAUtL 48 hours for recover it. After 48 hours expiration we will l...", _id: ObjectId('60b83b947d0b9acf1ac1f266') } ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2, w: 2 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 87498797 } }, MMAPV1Journal: { acquireCount: { w: 8 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 171 } }, Database: { acquireCount: { w: 1, W: 1 } }, Collection: { acquireCount: { W: 1 } }, Metadata: { acquireCount: { W: 4 } } } 144623ms
2021-06-03T02:19:17.499+0000 I WRITE [conn9] update hyphe_mee.jobs query: { crawljob_id: "4e61589ac3cf11ebb5910242ac130003" } update: { $set: { nb_crawled_pages: 24479, nb_unindexed_pages: 124 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 86304255 } }, MMAPV1Journal: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 57124425 } }, Collection: { acquireCount: { W: 1 } } } 143429ms
2021-06-03T02:19:17.500+0000 I WRITE [conn12] insert hyphe_mee.webentities query: { _id: 34592, prefixes: [ "s:http|h:fr|h:cop1|h:www|", "s:http|h:fr|h:cop1|", "s:https|h:fr|h:cop1|h:www|", "s:https|h:fr|h:cop1|" ], status: "DISCOVERED", lastModificationDate: 1622686643096, creationDate: 1622686643096, name: "Cop1.fr", tags: {}, crawled: false, homepage: null, startpages: [] } ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2, w: 2 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 57277789 } }, MMAPV1Journal: { acquireCount: { w: 6 } }, Database: { acquireCount: { w: 1, W: 1 }, acquireWaitCount: { w: 1, W: 1 }, timeAcquiringMicros: { w: 57124355, W: 723 } }, Collection: { acquireCount: { W: 2 } }, Metadata: { acquireCount: { W: 2 } } } 114403ms
2021-06-03T02:19:17.508+0000 I COMMAND [conn431] command READ__ME_TO_RECOVER_YOUR_DATA.$cmd command: insert { insert: "README", ordered: true, documents: [ { content: "All your data is a backed up. You must pay 0.02 BTC to 16jE6yZiZnX8enpnsdwUcxuV4NDY1NAUtL 48 hours for recover it. After 48 hours expiration we will l...", _id: ObjectId('60b83b947d0b9acf1ac1f266') } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 locks:{ Global: { acquireCount: { r: 2, w: 2 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 87498797 } }, MMAPV1Journal: { acquireCount: { w: 8 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 171 } }, Database: { acquireCount: { w: 1, W: 1 } }, Collection: { acquireCount: { W: 1 } }, Metadata: { acquireCount: { W: 4 } } } 144632ms
2021-06-03T02:19:17.508+0000 I COMMAND [conn9] command hyphe_mee.$cmd command: update { update: "jobs", updates: [ { q: { crawljob_id: "4e61589ac3cf11ebb5910242ac130003" }, u: { $set: { nb_crawled_pages: 24479, nb_unindexed_pages: 124 } }, upsert: false, multi: true } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 86304255 } }, MMAPV1Journal: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 57124425 } }, Collection: { acquireCount: { W: 1 } } } 143438ms
2021-06-03T02:19:17.508+0000 I NETWORK [conn431] end connection 185.220.100.243:24154 (72 connections now open)
2021-06-03T02:19:17.509+0000 I WRITE [conn11] insert hyphe_mee.stats query: { _id: ObjectId('60b83bec6459096cc0662549'), timestamp: 1622686700375, discovered: 34538, undecided: 0, in: 43, in_uncrawled: 0, in_untagged: 43, total: 34581, out: 0 } ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, MMAPV1Journal: { acquireCount: { w: 6 } }, Database: { acquireCount: { w: 1, W: 1 }, acquireWaitCount: { w: 1, W: 1 }, timeAcquiringMicros: { w: 57123173, W: 9188 } }, Collection: { acquireCount: { W: 2 } }, Metadata: { acquireCount: { W: 2 } } } 57132ms
2021-06-03T02:19:17.509+0000 I COMMAND [conn11] command hyphe_mee.$cmd command: insert { insert: "stats", documents: [ { _id: ObjectId('60b83bec6459096cc0662549'), timestamp: 1622686700375, discovered: 34538, undecided: 0, in: 43, in_uncrawled: 0, in_untagged: 43, total: 34581, out: 0 } ], ordered: true, writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, MMAPV1Journal: { acquireCount: { w: 6 } }, Database: { acquireCount: { w: 1, W: 1 }, acquireWaitCount: { w: 1, W: 1 }, timeAcquiringMicros: { w: 57123173, W: 9188 } }, Collection: { acquireCount: { W: 2 } }, Metadata: { acquireCount: { W: 2 } } } 57132ms
2021-06-03T02:19:17.510+0000 I COMMAND [conn12] command hyphe_mee.$cmd command: insert { insert: "webentities", ordered: true, writeConcern: {}, documents: 11 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 locks:{ Global: { acquireCount: { r: 3, w: 3 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 57277789 } }, MMAPV1Journal: { acquireCount: { w: 18 } }, Database: { acquireCount: { w: 2, W: 1 }, acquireWaitCount: { w: 2, W: 1 }, timeAcquiringMicros: { w: 57125517, W: 723 } }, Collection: { acquireCount: { W: 3 } }, Metadata: { acquireCount: { W: 2 } } } 114412ms
2021-06-03T02:19:17.822+0000 I COMMAND [conn10] CMD: drop hyphe_mee.queue
2021-06-03T02:19:20.275+0000 I COMMAND [conn12] CMD: drop hyphe_mee.queue
2021-06-03T02:19:22.685+0000 I COMMAND [conn10] command hyphe_mee.$cmd command: drop { drop: "queue" } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:78 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 3 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 4862556 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 293 } }, Metadata: { acquireCount: { W: 2 } } } 4863ms
2021-06-03T02:19:22.685+0000 I COMMAND [conn12] command hyphe_mee.$cmd command: drop { drop: "queue" } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:62 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 2 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 2408823 } }, Database: { acquireCount: { W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 855 } } } 2409ms
2021-06-03T02:20:03.899+0000 I WRITE [conn18] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 481, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 139, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686800279 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 3620061 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 3620ms
2021-06-03T02:20:03.899+0000 I COMMAND [conn18] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 481, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 139, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686800279 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 3620061 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 3620ms
2021-06-03T02:20:20.414+0000 I STORAGE [DataFileSync] flushing mmaps took 10429ms for 4 files
2021-06-03T02:20:23.943+0000 I WRITE [conn22] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 547, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 205, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686820279 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 3664072 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 3664ms
2021-06-03T02:20:24.052+0000 I COMMAND [conn22] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 547, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 205, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686820279 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 3664072 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 3772ms
2021-06-03T02:20:53.423+0000 I WRITE [conn26] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 656, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 314, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686850279 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 3143181 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 3143ms 2021-06-03T02:20:53.423+0000 I COMMAND [conn26] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, 
webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 656, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 314, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686850279 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 3143181 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 3143ms 2021-06-03T02:21:01.168+0000 I WRITE [conn25] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 
1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 667, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 325, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686860280 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 887505 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 887ms 2021-06-03T02:21:01.168+0000 I COMMAND [conn25] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, 
webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 667, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 325, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686860280 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 887505 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 887ms 2021-06-03T02:21:10.822+0000 I WRITE [conn29] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 725, 
webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 383, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686870278 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 543464 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 543ms 2021-06-03T02:21:10.823+0000 I COMMAND [conn29] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 725, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 383, webentities_undecided: 0, options: { phantom: { autoretry: false, 
idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686870278 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 543464 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 543ms 2021-06-03T02:21:46.936+0000 I WRITE [conn6] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 839, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 497, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", 
follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686900279 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 6656508 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 6656ms 2021-06-03T02:21:46.936+0000 I COMMAND [conn6] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 839, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 497, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", 
"lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686900279 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 6656508 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 6656ms 2021-06-03T02:22:33.001+0000 I WRITE [conn11] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1014, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 672, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", 
"owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686950278 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 2722163 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 2722ms 2021-06-03T02:22:33.001+0000 I COMMAND [conn11] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1014, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 672, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", 
"fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686950278 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 2722163 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 2722ms 2021-06-03T02:22:41.401+0000 I WRITE [conn13] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1027, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 685, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", 
"io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686960278 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 1122111 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 1122ms 2021-06-03T02:22:41.401+0000 I COMMAND [conn13] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1027, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 685, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", 
"trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686960278 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 1122111 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 1122ms 2021-06-03T02:22:50.736+0000 I WRITE [conn14] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1075, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 733, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", 
port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686970280 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 455764 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 455ms 2021-06-03T02:22:50.736+0000 I COMMAND [conn14] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1075, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 733, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686970280 
} }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 455764 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 455ms 2021-06-03T02:23:12.111+0000 I WRITE [conn16] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1148, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 806, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686990279 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, 
MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 1831318 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 1831ms 2021-06-03T02:23:12.111+0000 I COMMAND [conn16] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1148, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 806, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622686990279 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { 
w: 1 }, timeAcquiringMicros: { w: 1831318 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 1831ms 2021-06-03T02:23:23.873+0000 I WRITE [conn18] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1190, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 848, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622687000280 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 3592539 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 3592ms 2021-06-03T02:23:23.873+0000 I COMMAND 
[conn18] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1190, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 848, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622687000280 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 3592539 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 3592ms 2021-06-03T02:23:24.074+0000 I STORAGE [DataFileSync] flushing mmaps took 14089ms for 4 files 
2021-06-03T02:23:32.906+0000 I WRITE [conn19] update hyphe.corpus query: { _id: "mee" } update: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1208, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 866, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622687010287 } } keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 2617495 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 2617ms
2021-06-03T02:23:32.906+0000 I COMMAND [conn19] command hyphe.$cmd command: update { update: "corpus", updates: [ { q: { _id: "mee" }, u: { $set: { recent_changes: true, webentities_in_uncrawled: 0, total_webentities: 11, last_links_loop: 1622686016.647607, last_index_loop: 1622686757530, total_links_found: 0, crawls_pending: 0, total_crawls: 0, total_pages: 0, webentities_out: 0, webentities_in: 0, webentities_discovered: 11, links_duration: 304.4950218200684, total_pages_crawled: 1208, webentities_in_untagged: 0, crawls_running: 5, total_pages_queued: 866, webentities_undecided: 0, options: { phantom: { autoretry: false, idle_timeout: 20, whitelist_domains: [], ajax_timeout: 15, timeout: 600 }, defaultCreationRule: "domain", follow_redirects: [ "fb.me", "l.facebook.com", "facebook.com/l.php", "www.facebook.com/l.php", "goo.gl", "feedproxy.google.com", "t.co", "lnkd.in", "youtu.be", "bit.ly", "bitly.com", "tinyurl.com", "buff.ly", "dlvr.it", "is.gd", "j.mp", "owl.li", "ow.ly", "po.st", "wp.me", "shar.es", "tmblr.co", "adec.co", "amn.st", "bddy.me", "crwd.fr", "disq.us", "ebx.sh", "ed.gr", "fal.cn", "flip.it", "frama.link", "fw.to", "gerd.fm", "go.shr.lc", "ht.ly", "hubs.ly", "ift.tt", "io.webhelp.com", "lc.cx", "loom.ly", "mon.actu.io", "msft.social", "mtr.cool", "non.li", "sco.lt", "soc.fm", "spr.ly", "swll.to", "trib.al", "twib.in", "u.afp.com", "urlz.fr", "wrld.bg", "xfru.it", "zpr.io" ], proxy: { host: "", port: 3128 }, defaultStartpagesMode: [ "homepage", "prefixes", "pages-5" ], max_depth: 3, keepalive: 1800 }, last_activity: 1622687010287 } }, upsert: false, multi: false } ], writeConcern: {} } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 2617495 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } 2617ms
2021-06-03T02:24:19.370+0000 I STORAGE [FileAllocator] allocating new datafile /data/db/hyphe_mee.1, filling with zeroes...
2021-06-03T02:24:19.380+0000 I STORAGE [FileAllocator] done allocating datafile /data/db/hyphe_mee.1, size: 128MB, took 0.009 secs
2021-06-03T02:36:56.645+0000 I STORAGE [FileAllocator] allocating new datafile /data/db/hyphe_mee.2, filling with zeroes...
2021-06-03T02:36:56.658+0000 I STORAGE [FileAllocator] done allocating datafile /data/db/hyphe_mee.2, size: 256MB, took 0.012 secs
2021-06-03T02:46:43.964+0000 I NETWORK [conn320] end connection 172.19.0.3:42212 (71 connections now open)
2021-06-03T02:46:43.964+0000 I NETWORK [conn319] end connection 172.19.0.3:42210 (70 connections now open)
2021-06-03T02:46:43.964+0000 I NETWORK [conn318] end connection 172.19.0.3:42176 (69 connections now open)
2021-06-03T02:46:43.966+0000 I NETWORK [conn317] end connection 172.19.0.3:42164 (68 connections now open)
2021-06-03T02:46:46.251+0000 I NETWORK [initandlisten] connection accepted from 172.19.0.3:33726 #432 (69 connections now open)
2021-06-03T02:46:47.027+0000 I NETWORK [initandlisten] connection accepted from 172.19.0.3:33728 #433 (70 connections now open)
2021-06-03T02:46:51.416+0000 I NETWORK [initandlisten] connection accepted from 172.19.0.3:33732 #434 (71 connections now open)
2021-06-03T02:46:51.417+0000 I NETWORK [initandlisten] connection accepted from 172.19.0.3:33734 #435 (72 connections now open)
2021-06-03T02:58:55.450+0000 I STORAGE [FileAllocator] allocating new datafile /data/db/hyphe_mee.3, filling with zeroes...
2021-06-03T02:58:55.471+0000 I STORAGE [FileAllocator] done allocating datafile /data/db/hyphe_mee.3, size: 512MB, took 0.02 secs

boogheta commented 3 years ago

Wow! Well, mystery solved, but that is a first and you're quite unlucky :(

Apparently your machine exposes all of its ports (or at least 27017) publicly, so by adding ports: "27017:27017" to your config you made your mongo accessible not only locally to Robo3T but also publicly on the internet. It then looks like, at 2:18, a ransomware bot that automatically scans IP addresses for publicly exposed mongo databases found your machine and attacked it. The logs are quite explicit: the attack comes from IP 185.220.100.243, which deleted each of the databases in the mongo one after the other, then inserted a new database named READ__ME_TO_RECOVER_YOUR_DATA asking for 0.02 BTC to recover the data (although nothing in the logs suggests an actual copy was made beforehand...):

2021-06-03T02:19:17.499+0000 I WRITE [conn431] insert READ__ME_TO_RECOVER_YOUR_DATA.README query: { content: "All your data is a backed up. You must pay 0.02 BTC to 16jE6yZiZnX8enpnsdwUcxuV4NDY1NAUtL 48 hours for recover it. After 48 hours expiration we will l...", _id: ObjectId('60b83b947d0b9acf1ac1f266') } ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2, w: 2 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 87498797 } }, MMAPV1Journal: { acquireCount: { w: 8 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 171 } }, Database: { acquireCount: { w: 1, W: 1 } }, Collection: { acquireCount: { W: 1 } }, Metadata: { acquireCount: { W: 4 } } } 144623ms
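To check a mongod log for the same kind of intrusion, one can list the connections accepted from addresses outside private ranges: in the log excerpts posted earlier in this thread, Hyphe's own containers connect from the Docker bridge address 172.19.0.3, so a public address stands out immediately. Below is a minimal Python sketch, assuming the MongoDB 3.x plain-text log format seen in this thread; the `external_connections` helper and its regex are hypothetical illustrations, not part of Hyphe or MongoDB.

```python
import ipaddress
import re

# Matches "connection accepted" lines in the MongoDB 3.x text log format, e.g.
# "... I NETWORK [initandlisten] connection accepted from 172.19.0.3:33726 #432 (...)"
# \s+ tolerates the extra padding spaces mongod may put between fields.
ACCEPT_RE = re.compile(
    r"^(?P<ts>\S+)\s+I\s+NETWORK\s+\[initandlisten\]\s+connection accepted from\s+"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)\s+#(?P<conn>\d+)"
)


def external_connections(lines):
    """Return (timestamp, ip, connection id) for each connection from a non-private IPv4 address."""
    hits = []
    for line in lines:
        match = ACCEPT_RE.match(line.strip())
        if match is None:
            continue  # not a "connection accepted" line
        ip = ipaddress.ip_address(match.group("ip"))
        # Docker bridge networks (172.16.0.0/12), loopback and other private
        # ranges count as internal; anything else came from outside.
        if not ip.is_private:
            hits.append((match.group("ts"), str(ip), int(match.group("conn"))))
    return hits
```

Fed the `connection accepted from 172.19.0.3` lines above, it returns an empty list; a line in the same format from 185.220.100.243 would be reported, and its connection id could then be matched against the `[connNNN]` tags of later entries to see what that client did.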

I'm very sorry for you, but unless you want to try paying about 600€ with no guarantee of getting anything back, I guess you should rather restart your corpus from scratch (and probably destroy and reinstall your Hyphe) after securing port 27017 on your machine so that it is no longer exposed publicly.
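One way to secure the port, assuming mongo only needs to be reached from the server itself (e.g. by Robo3T through an SSH tunnel), is to publish it on the loopback interface only by writing `"127.0.0.1:27017:27017"` in the docker-compose `ports:` entry instead of `"27017:27017"`. Whatever fix is applied, it is worth confirming from a machine outside the server that 27017 no longer answers. The following is a minimal Python sketch of such a check using only the standard library; `port_is_reachable` is a hypothetical helper, and it only tests TCP reachability, not MongoDB authentication.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out (e.g. dropped by a firewall) or
        # host unreachable: from this vantage point the port is closed.
        return False
```

Run from another machine, `port_is_reachable("your-server.example.org", 27017)` (hypothetical host name) should return `False` once the port is no longer exposed.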

edouardschuppert commented 3 years ago

Well, that was pretty unexpected, and pretty silly of me. Anyway, I cleaned it up, and it seems to be working fine now. I really thank you for your time and insights on this issue!

boogheta commented 3 years ago

No problem, sorry you lost your first corpus. I hope Hyphe will be useful to you!