Closed TriAttack238 closed 1 year ago
You can find issues getting data for chapters in Benny-Scraper.BusinessLogic.Scrapers.Strategy.ScraperStrategy.cs, in the FetchContentByAttribute() method; that is where all scrapers go to get things such as the authors, genres, and descriptions. In this case it would be called from the MangaKakalotStrategy.cs FetchNovelContentAsync() method.
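For context, that flow can be sketched roughly like this. This is a hypothetical illustration using HtmlAgilityPack (which the project uses for plain-HTTP scraping); the class name, helper names, and XPath selectors below are placeholders, not the project's actual code or the real site markup:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sketch of FetchContentByAttribute-style parsing.
// The selectors are placeholders, not the real site's markup.
public static class SelectorSketch
{
    public static string? GetAuthor(HtmlDocument doc)
    {
        // SelectSingleNode returns null when nothing matches, so the
        // caller must null-check before reading InnerText.
        var node = doc.DocumentNode.SelectSingleNode("//span[@class='author']");
        return node?.InnerText.Trim();
    }

    public static List<string> GetChapterLinks(HtmlDocument doc)
    {
        // SelectNodes also returns null (not an empty collection) on no
        // match -- the failure mode described below for the chapter list.
        var nodes = doc.DocumentNode.SelectNodes("//ul[@class='chapter-list']//a[@href]");
        return nodes?.Select(n => n.GetAttributeValue("href", string.Empty)).ToList()
               ?? new List<string>();
    }
}
```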
The issue is that the selector used to get all the chapter links to navigate to was returning null, which meant things couldn't proceed properly. I checked the innerHtml of the documentNode and found that the element is no longer being loaded through HTTP calls, so I will need to either call the JS that is hiding it or just use Selenium, as I do for mangas, to load the page.
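That Selenium fallback can be sketched like this (a minimal, hypothetical helper using the Selenium 4 .NET API, not the project's actual GetChapterDataAsync code): render the page in headless Chrome so the chapter-injecting JS runs, then hand the rendered HTML to the existing parser.

```csharp
using OpenQA.Selenium.Chrome;

// Hypothetical sketch: when a plain HTTP fetch no longer contains the
// chapter list (it is injected by JavaScript), let headless Chrome run
// the scripts and read the rendered DOM instead.
public static class RenderSketch
{
    public static string GetRenderedHtml(string url)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless=new"); // no visible browser window
        options.AddArgument("--disable-gpu");

        // "using" quits and disposes the driver even if navigation throws.
        using var driver = new ChromeDriver(options);
        driver.Navigate().GoToUrl(url);
        return driver.PageSource; // full HTML after JS has executed
    }
}
```

The returned PageSource can be loaded into the same HtmlDocument the HTTP path already parses, so only the fetch step changes.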
Here is a quick video of the debug steps.
For now, please use https://mangakatana.com, e.g. https://mangakatana.com/manga/undead-unluck.24191. If you get stuck seeing "Using Selenium to get chapters data", just hit Enter once.
As for the location of the EPUB and PDF output: it is stored by default in Documents/BennyScrapedNovels/{Novel Name}. This can be found and edited in the Benny-Scraper.BusinessLogic project, in the NovelProcessor.cs GetDocumentsFolder() method. I will go ahead and make the file path something the user can set on start.
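Assuming it resolves the path the way the default above describes (the actual body lives in GetDocumentsFolder(); the class and method names here are placeholders), the output folder can be computed with Environment.GetFolderPath:

```csharp
using System;
using System.IO;

// Hypothetical sketch of a GetDocumentsFolder()-style helper: resolves
// Documents/BennyScrapedNovels/{Novel Name} for the current user, per
// the default described above.
public static class OutputPathSketch
{
    public static string GetNovelOutputFolder(string novelName)
    {
        string documents = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
        return Path.Combine(documents, "BennyScrapedNovels", novelName);
    }
}
```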
Please reply if this helps solve your problem. I will push up a few changes that will do a better job of logging where things went wrong.
Thank you, I think it's working now! So was it an issue with how the program currently scrapes the page, due to mangakakalot, or something else?
Regarding logging, it could be good to dump the appropriate logs in a human-readable format to a text file that can be preserved.
For the first question, I would think that mangakakalot made some changes to how they render things on the page. When you say it's working now, do you mean mangakakalot is, or mangakatana?
All the logs are being written to AppData/Roaming/BennyScraper/logs. That parent folder also contains the database, if you want to view it in SQLiteStudio.
Well, it started working, but after letting it run all the way through, I got a new error.
15:44:47 Error Error while getting chapters data. OpenQA.Selenium.WebDriverException: disconnected: Unable to receive message from renderer
(failed to check if window was closed: disconnected: not connected to DevTools)
(Session info: headless chrome=116.0.5845.140)
at OpenQA.Selenium.WebDriver.UnpackAndThrowOnError(Response errorResponse, String commandToExecute)
at OpenQA.Selenium.WebDriver.Execute(String driverCommandToExecute, Dictionary`2 parameters)
at OpenQA.Selenium.WebDriver.set_Url(String value)
at OpenQA.Selenium.Navigator.GoToUrl(String url)
at Benny_Scraper.BusinessLogic.Scrapers.Strategy.ScraperStrategy.GetChapterDataAsync(IWebDriver driver, String urls, String tempImageDirectory) in C:\Users\Sean Vo\Github_Repos_Default\Benny-Scraper\Benny-Scraper.BusinessLogic\Scrapers\Strategy\ScraperStrategy.cs:line 682
at Benny_Scraper.BusinessLogic.Scrapers.Strategy.ScraperStrategy.GetChaptersDataAsync(List`1 chapterUrls) in C:\Users\Sean Vo\Github_Repos_Default\Benny-Scraper\Benny-Scraper.BusinessLogic\Scrapers\Strategy\ScraperStrategy.cs:line 494
15:44:47 Error Exception when trying to process novel. System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\Users\Sean Vo\AppData\Local\Temp\2a3b33e4-4367-4c47-94f1-a46203a530b9'.
at System.IO.FileSystem.GetFindData(String fullPath, Boolean isDirectory, Boolean ignoreAccessDenied, WIN32_FIND_DATA& findData)
at System.IO.FileSystem.RemoveDirectory(String fullPath, Boolean recursive)
at Benny_Scraper.BusinessLogic.Scrapers.Strategy.ScraperStrategy.GetChaptersDataAsync(List`1 chapterUrls) in C:\Users\Sean Vo\Github_Repos_Default\Benny-Scraper\Benny-Scraper.BusinessLogic\Scrapers\Strategy\ScraperStrategy.cs:line 542
at Benny_Scraper.BusinessLogic.NovelProcessor.AddNewNovelAsync(Uri novelTableOfContentsUri, ScraperStrategy scraperStrategy) in C:\Users\Sean Vo\Github_Repos_Default\Benny-Scraper\Benny-Scraper.BusinessLogic\NovelProcessor.cs:line 91
at Benny_Scraper.BusinessLogic.NovelProcessor.ProcessNovelAsync(Uri novelTableOfContentsUri) in C:\Users\Sean Vo\Github_Repos_Default\Benny-Scraper\Benny-Scraper.BusinessLogic\NovelProcessor.cs:line 62
at Benny_Scraper.Program.RunAsync() in C:\Users\Sean Vo\Github_Repos_Default\Benny-Scraper\Benny-Scraper\Program.cs:line 105
This is harder to debug, as I am not able to reproduce it; it's especially weird since there were no errors with Selenium two days ago when you originally opened this issue. I've made a few changes to how I dispose of the drivers. Would you be able to try again? Please clean and rebuild as well; I am hoping it may be due to an outdated driver.
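The second error in the trace (the DirectoryNotFoundException while removing the temp image folder) can be guarded with an existence check before deletion. A small sketch with a hypothetical helper name, not the project's actual cleanup code:

```csharp
using System;
using System.IO;

public static class TempCleanupSketch
{
    // Sketch: tolerate a temp image directory that was never created or
    // was already removed, instead of letting Directory.Delete throw the
    // DirectoryNotFoundException seen in the trace above.
    public static void CleanUpTempImages(string tempImageDirectory)
    {
        if (Directory.Exists(tempImageDirectory))
            Directory.Delete(tempImageDirectory, recursive: true);
    }
}
```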
If the problem persists, please send me an email with your logs so I can get a better idea of at which step things went wrong.
I will mark this issue as closed, since the original problem has been addressed. Feel free to open a new issue if the problem still persists.
Hello, I am on a Windows 11 x86-64 machine. I cloned the repo for this project and then built it into an executable as specified in the README. I then tried to have the program scrape this manga using this URL: https://mangakakalot.to/undead-unluck-7025
However, the extraction did not complete, and the program printed this stack trace:
As far as I can tell, the scraper thinks that the chapter list is empty or something, but I'm not sure. Any tips to get this working?
P.S. When the scraper actually makes an EPUB or PDF, where does it go? Can I change the output format manually?