macr0dev / Audiobooks.bundle

Plex metadata scraper for Audiobooks

No Search results from Audible.com #22

Closed macr0dev closed 4 years ago

macr0dev commented 6 years ago

Looks like Audible changed the markup on their search results page, but only for audible.com. I'm going to have to rework the parsing for that page before it can return results again.

Other language sites (the few I checked) don't seem to be affected.

macr0dev commented 6 years ago

Found the right divs to break down the new results. Working through the different items to scrape for scoring.

macr0dev commented 6 years ago

Code changes have been made to account for the new search and results pages. Looks like this is more of the new audible.com rollout.

@dethrophes - heads up on this one.

I'm sure there is a more elegant solution to some of these lines of code, but I slapped enough together to get audible.com working again without blowing up the international support. Hopefully they'll get finished with their updates and just roll this new site out everywhere so things can be more standardized in the code.

Also, they didn't just change the layout of the search results page, they changed the URL variables too. The only one that looks to affect things directly is the book title, which went from 'searchTitle' to just 'title'. After the holidays I may sit down and clean this up a little bit more.
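
To make the rename concrete, here is a minimal Python 3 sketch of building the search URL with the title parameter name passed in rather than hard-coded. It is not the plugin's actual code: `build_search_url` and `SEARCH_BASE` are hypothetical names, and `searchAuthor` is simply the author parameter that shows up in URLs quoted elsewhere in this thread.

```python
from urllib.parse import urlencode

# Assumed base URL for the audible.com search page.
SEARCH_BASE = "https://www.audible.com/search"


def build_search_url(title, author="", title_param="title", base=SEARCH_BASE):
    """Build a search URL, sending the title under the given parameter name."""
    params = [(title_param, title)]
    if author:
        # 'searchAuthor' is taken from URLs in this thread, not from Audible docs.
        params.append(("searchAuthor", author))
    # urlencode() escapes spaces as '+', matching the URLs seen in the logs.
    return base + "?" + urlencode(params)


# Old-style and new-style URLs for the same book.
print(build_search_url("The Charmed Return", title_param="searchTitle"))
print(build_search_url("The Charmed Return", title_param="title"))
```

Keeping the parameter name as an argument means the old and new search pages can share one code path, with only the name differing per site.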

dethrophes commented 6 years ago

I'm still going through this, but one point: because of the German site I was already using two different strings for the release date, one for the search results page and one for the description page. That is why there are both rel_date and rel_date2.

dethrophes commented 6 years ago

From what I can see audible.com still uses searchTitle, not just title. e.g. https://www.audible.com/search?advsearchKeywords=&searchTitle=The+Charmed+Return&searchAuthor=&searchNarrator=&searchProvider=&field_subjectbin=&field_content_type-bin=&field_format-bin=&field_publication_date=&field_runtime=&field_language=9178177011&x=0&y=0

macr0dev commented 6 years ago

The most recent update didn't change anything on the audible.com (US) descriptions page. Not since the changes in late October. So I didn't even look at that area.

The link you posted provided 350,000 results for me. But if I change the 'searchTitle=' to just 'title=' it only returns one.

dethrophes commented 6 years ago

That is odd. The link I provided only gives me one result...

Also the other changes you made don't seem to work as intended, it looks to be giving me the suggestions instead of the search results...

Can you post a URL from the advanced search? I'm wondering if this is somehow geo/agent/etc. related.

jmeosbn commented 6 years ago

When I search the .com site with this change, Plex returns an author-only search. Copying the link from the Plex log into curl returns a page with:

<link rel="canonical" href="https://www.audible.com/search?searchAuthor=Dan+Simmons" />

But, pasting (*) into firefox gives expected title + author search.

(*) edit, this link: "https://www.audible.com/search?title=Ilium&searchAuthor=Dan+Simmons&x=41&ipRedirectOverride=true"

Changing intl_sites -> en -> 'urltitle' back to 'searchTitle=' is required to get the same expected results in Plex.

So.... I think searchTitle is needed when using .com from outside of US due to parsing of url between sites?
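
The curl check above can be scripted. A rough Python 3 sketch, assuming (based only on the canonical link quoted in this comment) that the canonical URL keeps the parameters the server actually honoured; `CanonicalFinder` and `canonical_params` are made-up names for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "canonical" and self.href is None:
            self.href = attr.get("href")


def canonical_params(html):
    """Return the query parameters of the page's canonical URL, or {} if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return {}
    return parse_qs(urlsplit(finder.href).query)


# The canonical link from the curl output quoted above.
page = '<link rel="canonical" href="https://www.audible.com/search?searchAuthor=Dan+Simmons" />'
params = canonical_params(page)
title_kept = "title" in params or "searchTitle" in params
print(sorted(params), title_kept)
```

If neither title parameter appears in the canonical URL, the request was most likely treated as an author-only search, which matches what shows up in Plex here.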

jmeosbn commented 6 years ago

P.S. Using a non-US but en site always uses a .com address due to the fix I submitted in #23

macr0dev commented 6 years ago

https://www.audible.com/search?keywords=&title=Infinite&author_author=Jeremy+Robinson&narrator=&publisher=

This URL was created using the advanced search page on audible.com from within the US.

It gave me one result. It very well could be geo related. Which makes things infinitely more fun. :)

If that turns out to be the case, we may have to introduce an option for "I'm outside the US but want to use Audible.com" to make the search URL different. What I can say for sure is that prior to the changes mine was returning zero results from audible.com from within the US while the other sites were still working.

macr0dev commented 6 years ago

@jmeosbn - if you want to use the Audible UK site but your library language is English, just select the "Manually Select Audible Site (if unchecked, library language is used to determine the best site)" option and choose the site you want. That's one of the reasons that option was added.

jmeosbn commented 6 years ago

Hi, yes, I have tried that before but many books are not published on that site. It'd be nice if it would fall back to US after UK though as there are one or two on the UK site not on the US!

(that's when I noticed the need for fix in #23 - but I have ended up using US site anyway..)

macr0dev commented 6 years ago

@jmeosbn Let's discuss this in another issue thread please. Feel free to start one and we can discuss what you're trying to do.

I'm trying to use this one to work out the changes to the search page and results. Thanks.

macr0dev commented 6 years ago

OK. I just did some tests using a web proxy dumping me out in the Netherlands, and they definitely appear to be geo-locking the new pages. I got the old ones visiting audible.com from the Netherlands. This makes things more interesting....

jmeosbn commented 6 years ago

Hi, yup - that's why I linked to #23 😉

Running the code previous to your last (this) commit on my Raspberry Pi results in proper search results - just like pasting the link from the log into Firefox - but Plex on Mac is a no go as is! So there is some different handling even between the same code running on different platforms (Raspbian vs. Darwin vs. Firefox!!)

macr0dev commented 6 years ago

OK. My last comment establishes that running the current code from outside the US on audible.com results in a non-working solution. Running the previous code on a machine INSIDE the US also results in a non-working situation. This is due to recent changes in the Audible site that I'm still trying to work out.

As for what happens on different platforms? Who knows. I develop everything using whatever current plex beta is available on a Windows 10 box (because it's my main PC and convenient for testing) and then do my final tests on my production ubuntu Plex server before uploading to github. So I'm testing on two different platforms.

If you're seeing different results from the same Internet connection, I have to ask if you're using some sort of VPN or proxy on one of those boxes?

jmeosbn commented 6 years ago

No proxy that I've set up and both connect through same router - I have no idea how caching may be affecting things.

On the Mac I have a script that scrubs the library folder and caches etc., but on the Pi it's part of my main Plex server and is also sloooooow to manually delete a library, even one with only a few books.

Though I had manually deleted the Caches folders on the Pi so...

If you have any tips for speedily testing these changes I'd be happy to try them as I'm sure there's a better way than what I'm doing right now!

macr0dev commented 6 years ago

Just do a manual 'fix match' and watch the logs. You can see the url and the results that it comes up with. It will also indicate whether it's pulling from the cache or not. The 'fix match' doesn't include the author in the search, but it will tell you whether or not the new url is giving you the correct results or not.

Then you don't have to delete the cache or the library. If you want to do a full test, just move a book from your library path, update files, empty trash, then put it back and scan again. Then you don't have to delete and start over. Just removing one book.

jmeosbn commented 6 years ago

Ok, that's pretty much what I've been doing - as well as occasionally starting fresh on Mac with 6 or so differently styled books (Plex doesn't seem as reliable as on the Pi - doesn't always load metadata despite listing it in the log?!).

jmeosbn commented 6 years ago

Just checked .co.uk and searchTitle is required there also.

edit: the date on the UK site is still "Release Date" (capital 'D') also.

Looking at the code, it seems like the localisation of url params should be moved from intl_sites into sites_langs with the former just being used for the initial site auto-lookup based on library lang.

This would add a bit of duplication for UK and AUS but that seems to be required here.
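
Roughly what I mean, as a Python sketch. The layout below is hypothetical and is not the current shape of intl_sites or sites_langs; only the two parameter values come from what has been observed in this thread (title on .com from inside the US, searchTitle on .co.uk).

```python
# Hypothetical layout: one entry per Audible site, each carrying its own
# URL parameter names instead of inheriting them from the library language.
SITE_PARAMS = {
    "www.audible.com": {"title_param": "title"},          # new search page, seen from inside the US
    "www.audible.co.uk": {"title_param": "searchTitle"},  # still the old parameter
}

# Sites without an entry keep the old parameter name.
DEFAULT_PARAMS = {"title_param": "searchTitle"}


def params_for(site):
    """Look up a site's URL parameter names, falling back to the old ones."""
    return SITE_PARAMS.get(site, DEFAULT_PARAMS)


print(params_for("www.audible.co.uk")["title_param"])
```

The language-to-site lookup would then only decide which key to use, and each English-language site can diverge without touching the others.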

jmeosbn commented 6 years ago

Not sure if audible made a change but with my latest commits to #23 .com with title now works. 😆

jmeosbn commented 6 years ago

Spoke too soon - seemed to work with title but was actually just an artist search still (that was well filtered from other changes).

However, I've worked out why I have less luck on my Linux install wrt getting the library to update: whereas on Mac I can move media out of the parent folder to trigger an update (moving the parent folder itself is ignored), on the Pi I have to add (or leave) a file behind, as a completely empty folder does not trigger a scan. That said, I've scripted it via the scanner binary now.

jmeosbn commented 6 years ago

Okay, when using keyword searching for the title (as in #28) this isn't an issue. Otherwise I think some logic needs to select between title/searchTitle based on being located in the US or not. 😲

jmeosbn commented 6 years ago

Audible.com now seems to be accepting title= from here (UK)

edit: during the last hour .com search results have - for me - switched from: [screenshot not preserved] and now: [screenshot not preserved]

This seems to happen every so often, and I assume it's an issue with different servers running different code...
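
Since the behaviour can flip between requests, one option (an alternative to guessing from location, and not something the plugin does) is to probe at search time: try each parameter name and keep the first one whose title survives into the canonical URL. A Python 3 sketch under that assumption; `pick_title_param` and `fetch_canonical` are hypothetical, and the fetcher is injected so the logic can be tried without hitting the site.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def pick_title_param(title, fetch_canonical,
                     candidates=("title", "searchTitle"),
                     base="https://www.audible.com/search"):
    """Return the first parameter name whose title survives into the canonical URL."""
    for name in candidates:
        url = base + "?" + urlencode({name: title})
        # fetch_canonical(url) must return the canonical URL of the response page.
        kept = parse_qs(urlsplit(fetch_canonical(url)).query)
        if any(title in values for values in kept.values()):
            return name
    return None


def fake_fetch(url):
    # Stand-in for a server that only understands the old parameter name.
    if "searchTitle" in parse_qs(urlsplit(url).query):
        return url
    return "https://www.audible.com/search"


print(pick_title_param("Ilium", fake_fetch))
```

This costs an extra request when the first guess is wrong, so the result would probably want caching for the length of a scan.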

jmeosbn commented 6 years ago

Just a note that I'm starting to see the new json based pages on the other sites now..

macr0dev commented 4 years ago

Cleaning up old issues. Closing.