webrecorder / webrecorder-player

Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)
Apache License 2.0

"Indexing has stalled" #78

Open nar001 opened 5 years ago

nar001 commented 5 years ago

I'm trying to load a WARC file. Indexing goes to 100% and then it tells me it has stalled. Navigating to the interface shows "Almost Done!" but never gets any further. A long time ago I was able to browse this file, but now it doesn't work and I'm not sure why. Thanks!

nienkedekker commented 4 years ago

I'm running into the same issue on the latest release on macOS. It indexes and then stalls, eating up CPU like crazy. This is the additional information available:

[Screenshot, 2019-10-11 22:47:21]

Opening http://localhost:54292 just returns this JSON:

{"/live": {"modes": ["list_sources", "index", "resource"]}, "/live/postreq": {"modes": ["list_sources", "index", "resource"]}, "/extract": {"modes": ["list_sources", "index", "resource"]}, "/extract/postreq": {"modes": ["list_sources", "index", "resource"]}, "/replay": {"modes": ["list_sources", "index", "resource"]}, "/replay/postreq": {"modes": ["list_sources", "index", "resource"]}, "/replay-coll": {"modes": ["list_sources", "index", "resource"]}, "/replay-coll/postreq": {"modes": ["list_sources", "index", "resource"]}, "/patch": {"modes": ["list_sources", "index", "resource"]}, "/patch/postreq": {"modes": ["list_sources", "index", "resource"]}}

The WARC file I'm trying to open is 5.01 GB. Please let me know if you need any additional information :)

ikreymer commented 4 years ago

Please try the 1.8.0 release. We've made some improvements to large WARC indexing, and it should work much better.

alvar-freude commented 4 years ago

I have the same issue with small HAR files on the 1.8.0 release (macOS).

The progress bar is at 100%.

The Extra Debug Info is:

Created user local with the email test@localhost and the role: 'public-archivist'
ERROR PARSING: /path/to/file.har
'pages'
WARCSERVER_HOST=http://localhost:52971

skip {'name': 'Admin', 'description': 'Admin API'}
skip {'name': 'Stats', 'description': 'Stats API'}
skip {'name': 'Automation', 'description': 'Automation API'}
APP_HOST=http://localhost:52972

The page at http://localhost:52972 shows the message "Almost Done!" and a progress bar at 100%. It seems that everything has finished, but something else fails.

alvar-freude commented 4 years ago

I ran some more tests. HAR files exported from the Safari developer tools seem to work fine: a simple website (like a "hello world" page without any other files) is OK, and so is this GitHub page.

But a HAR file saved with the Firefox developer tools has the problem described above. Even the very simple HAR fails.
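The `'pages'` line printed right after `ERROR PARSING` looks like a Python `KeyError` message, which might mean the parser expects a `pages` key that the Firefox export lacks or structures differently. That is a guess, not something confirmed in this thread. The HAR 1.2 format treats `log.pages` as optional, so one way to compare the Safari and Firefox exports is to check that field directly. The script below is a minimal sketch under that assumption; `har_pages_status` and the file name `check_har.py` are hypothetical and not part of Webrecorder Player.

```python
import json
import sys


def har_pages_status(har: dict) -> str:
    """Describe the log.pages field of a parsed HAR document.

    Assumption: the 'pages' error comes from a parser that reads
    har["log"]["pages"] directly, which would raise KeyError('pages')
    when the optional field is absent.
    """
    log = har.get("log")
    if not isinstance(log, dict):
        return "no 'log' object"
    pages = log.get("pages")
    if pages is None:
        return "missing 'pages'"
    if not pages:
        return "empty 'pages'"
    return f"{len(pages)} page(s)"


def main(paths: list) -> None:
    for path in paths:
        # HAR files are JSON; utf-8-sig also tolerates a leading BOM.
        with open(path, encoding="utf-8-sig") as f:
            har = json.load(f)
        print(f"{path}: {har_pages_status(har)}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running `python check_har.py safari.har firefox.har` on one export from each browser would show whether the failing file differs in this field. If both report pages, the `'pages'` error likely comes from somewhere else.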