Closed laughingclouds closed 2 years ago
Alright, it seems we won't need to open the accordion. Once we find the div elements that have the chapter links, we can pass a list of them to this function:
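from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement

def get_hrefList(divList: list[WebElement]) -> list[str]:
    # Collect the href of every <a> element found inside the given div elements.
    hrefList = []
    for divElement in divList:
        aList: list[WebElement] = divElement.find_elements(By.TAG_NAME, "a")
        hrefList.extend([a.get_attribute("href") for a in aList])
    return hrefList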
Also, we need to switch the sort order to "Oldest", because then the links will be saved at the correct index, i.e., prologue/chapter 0 at index 0, chapter 1 at index 1, and so on. [Too much trouble, we can simply reverse the list]
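A minimal sketch of the "simply reverse the list" idea, assuming the links come back newest first when the sort order is left untouched. The helper name order_oldest_first and the example.com URLs are hypothetical and only there for illustration.

```python
def order_oldest_first(href_list: list[str]) -> list[str]:
    """Return a reversed copy so that index 0 holds the oldest chapter."""
    return href_list[::-1]


# Hypothetical links, in the order they might be scraped (newest first).
newest_first = [
    "https://example.com/novel/chapter-2",
    "https://example.com/novel/chapter-1",
    "https://example.com/novel/prologue",
]

# After reversing, the list index lines up with the chapter number.
oldest_first = order_oldest_first(newest_first)
```

Returning a copy rather than calling list.reverse() in place keeps the scraped order available in case it is needed later.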
Done https://github.com/laughingclouds/Scrapia-World/issues/4#issuecomment-1030679387
In the end, I had to open the accordions for the script to work. But yes, it's done. At least the part which creates the profile is done. There's more work to do with the profiler.
Commit where it's done 4c06df835ca5e424ae1bbcc2ab67e80f6a22c778.
Rather than going to the novel page and then searching for the chapter to click every time the script is run, we can save the links to every chapter of the novel.
We can go to the required chapter using that link after logging in.
From then on, we can simply click the "next" button and keep track of the current_chapter.
For this, we need to first create a "profile" of the novel to scrape.
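One way such a profile could look is a small JSON file that stores the chapter links in order together with the current_chapter counter. This is only a sketch: the file layout, the key names, and the helpers save_profile, load_profile and advance_chapter are assumptions, not what the repository actually does.

```python
import json
from pathlib import Path


def save_profile(path: Path, novel_name: str, chapter_links: list[str],
                 current_chapter: int = 0) -> None:
    """Write the novel profile (hypothetical layout) to a JSON file."""
    profile = {
        "novel": novel_name,
        "current_chapter": current_chapter,
        # Index in this list is meant to match the chapter number.
        "chapter_links": chapter_links,
    }
    path.write_text(json.dumps(profile, indent=2), encoding="utf-8")


def load_profile(path: Path) -> dict:
    """Read the profile back as a plain dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))


def advance_chapter(path: Path) -> int:
    """Increment current_chapter, e.g. after clicking "next", and persist it."""
    profile = load_profile(path)
    profile["current_chapter"] += 1
    path.write_text(json.dumps(profile, indent=2), encoding="utf-8")
    return profile["current_chapter"]
```

With a profile like this, the script could look up chapter_links[current_chapter] after logging in and open that link directly instead of searching the novel page.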
First, we need to open the accordion for every chapter.
This can be done by finding all accordion div elements.
find using
for a
element
We might need to open the accordions as well
Each of these elements holds hrefs to the chapters of the novel. Store them in order, so that each link's index matches its chapter number.
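The steps above could be sketched roughly as follows. The function collect_chapter_links and its accordion_selector parameter are hypothetical, and the sketch assumes that clicking the accordion div itself is enough to expand it, which may not hold for the real page. The driver is duck-typed: anything with Selenium's find_elements(by, value) method works, and the two locator strings are the values of Selenium's By.CSS_SELECTOR and By.TAG_NAME constants.

```python
from typing import Any

# Locator strategy strings; these equal Selenium's By.CSS_SELECTOR
# and By.TAG_NAME, so a real WebDriver accepts them directly.
CSS_SELECTOR = "css selector"
TAG_NAME = "tag name"


def collect_chapter_links(driver: Any, accordion_selector: str) -> list[str]:
    """Open every accordion and gather the chapter hrefs in page order."""
    links: list[str] = []
    for accordion in driver.find_elements(CSS_SELECTOR, accordion_selector):
        # Assumption: a click on the div expands it and exposes its anchors.
        accordion.click()
        for anchor in accordion.find_elements(TAG_NAME, "a"):
            href = anchor.get_attribute("href")
            if href:  # skip anchors that carry no href
                links.append(href)
    return links
```

The list index of each returned link then serves as its position on the page; if the page lists the newest chapters first, the list would still need to be reversed before the indices match the chapter numbers.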