seanap / Audible.com-Search-by-Album

Mp3tag Web Sources Scripts
Scrape audible author ID #27

Open buswedg opened 1 year ago

buswedg commented 1 year ago

Would it be possible/worth capturing the Audible author identifier? I don't see it in the current query scope at the title level.

For context -- I'm working towards a folder structure where the first level folders include the first author (full) name, but want to include an id alongside names to keep folders unique.

I'll admit I'm not overly familiar with Audible's metadata, but it does look like they have a 10-character ID string for every author in their db.

For example, I see Michelle Obama has an id of B07B436TLF -- https://www.audible.com/author/Michelle-Obama/B07B436TLF
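
To make the ID format concrete, here is a minimal Python sketch that pulls the identifier out of an author URL like the one above. The URL pattern (`/author/<slug>/<10-character ID>`) is an assumption based only on this single example, and `author_id_from_url` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: author pages look like /author/<slug>/<10-character ID>,
# inferred from the one example URL in this thread.
AUTHOR_URL_RE = re.compile(r"/author/[^/]+/([A-Z0-9]{10})(?:[/?#]|$)")


def author_id_from_url(url: str) -> Optional[str]:
    """Return the 10-character author ID from an Audible author URL, or None."""
    match = AUTHOR_URL_RE.search(url)
    return match.group(1) if match else None


print(author_id_from_url("https://www.audible.com/author/Michelle-Obama/B07B436TLF"))
# prints B07B436TLF
```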

seanap commented 1 year ago

Interesting. I did some digging in the page source of a few books, and it looks like there is a datalayer section at the very bottom that has the author ID. I only checked two books and it was there in both, but I have no idea how consistent it is.

This shouldn't be hard to scrape. What ID3 tag should we put this ID in? I see a few options: WWWARTIST, MUSICBRAINZ_ALBUMARTISTID, or a custom tag (AudibleAlbumArtistID, or AAAI). https://docs.mp3tag.de/mapping-table/

I hesitate to use the Mbz tag since the ID is from Audible not Mbz. The WWW tag is a good option. The custom tag would have the best description but no other program would know to read it (does that even matter?). Maybe we could consult with the Audiobookshelf team, but I don't know if having this ID tag would even be beneficial for them.

I'm also curious: how do you plan on handling books with multiple authors? Books by authors like J.N. Chaney frequently have co-authors, and sometimes his name is listed first, sometimes second or third. Do we just pull the first author listed?

seanap commented 1 year ago

I made the executive decision to put the ID in a custom tag called "AUDIBLE_ALBUMARTISTID" to keep it consistent with the Mbz tag (still up for debate, let me know if anyone has a better idea). I've only tested this on a few books, but it seems to work pretty well.

Please test this on a variety of books and report back any issues. Once we're happy that it's good enough, I will merge it into the main script.

Download the new .src script here https://github.com/seanap/Audible.com-Search-by-Album/blob/master/Audible.com%23Search%20by%20Album%20-%20BETA.src

seanap commented 1 year ago

Here's a format string that does what you're looking for: Z:\temp\TEST\audiobooks\%albumartist%[ '['%audible_albumartistid%']']\%series%\%year% - %album% [ '['%series% %series-part%']']\%album% (%year%)[ '['%series% %series-part%']']$ifgreater(%_total_files%,1, - pt$num(%track%,2),)

buswedg commented 1 year ago

Amazing -- thanks for the quick response. I'll give this a shot tomorrow.

buswedg commented 1 year ago

> Here's a format string that does what you're looking for: Z:\temp\TEST\audiobooks\%albumartist%[ '['%audible_albumartistid%']']\%series%\%year% - %album% [ '['%series% %series-part%']']\%album% (%year%)[ '['%series% %series-part%']']$ifgreater(%_total_files%,1, - pt$num(%track%,2),)

Couldn't %albumartist% include more than one author, though? I thought I saw some instances of that earlier today when I was playing with this. Does the album artist ID cover only the first author, or is it a comma-delimited list?

seanap commented 1 year ago

This will only grab the ID of whichever author Audible lists first. A folder would look like /Author1, Author2 [IDof1]/...
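
As a rough illustration of the folder naming described here, the following Python sketch mimics the `%albumartist%[ '['%audible_albumartistid%']']` part of the format string: the bracketed ID is appended only when the tag has a value. `author_folder` is a hypothetical helper, not part of Mp3tag.

```python
from typing import Optional


def author_folder(album_artist: str, first_author_id: Optional[str]) -> str:
    """Build the top-level folder name; the ' [ID]' suffix is optional."""
    if first_author_id:
        return f"{album_artist} [{first_author_id}]"
    return album_artist


print(author_folder("Author1, Author2", "IDof1"))  # prints Author1, Author2 [IDof1]
print(author_folder("Author1", None))              # prints Author1
```

Note that the folder name still depends on the full %albumartist% value, so the same first author can land in different folders when the co-author list differs.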

buswedg commented 1 year ago

OK, I just took a quick look at some page sources. It looks like the authors field is just a list of dicts. The first entry should have both the first author's name and their ID, which is where you're pulling the author ID from anyway. I'll make a new custom field, something like AUDIBLE_PRIMARYARTIST, and maybe rename your custom ID field to AUDIBLE_PRIMARYARTISTID.

On a separate note, I started taking a look at your beets.io fork this morning, as I'll need an automated solution here. I have some suggestions on search priority using the ASIN (if already available in the tag or filename), which I think will also improve results. I'll open a separate issue there in time.

buswedg commented 1 year ago

You might want to update to something like the below to pull both the first author's name and id. I tested on a bunch of books this morning, and all looks fine.

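# Jump to the line that holds the product data
findline "product:[{"
# Move past the first author's fullName key (1 = first match, 1 = no error if missing)
findinline "{\"fullName\":\"" 1 1
outputto "AUDIBLE_FIRSTARTIST"
sayuntil "\""
# Continue on the same line to that author's id
findinline "\"id\":\""
outputto "AUDIBLE_FIRSTARTISTID"
sayuntil "\""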
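
For anyone who wants to check the logic outside Mp3tag, the same steps can be sketched in Python. This is only an approximation of the script above: `SAMPLE_PAGE` is an invented fragment, and the real Audible page source may be laid out differently.

```python
from typing import Optional, Tuple

# Hypothetical page fragment, invented for illustration only.
SAMPLE_PAGE = """
<script>
product:[{"authors":[{"fullName":"Author One","id":"B000000001"},{"fullName":"Author Two","id":"B000000002"}]}]
</script>
"""


def say_until(text: str, start: int, stop: str) -> Tuple[str, int]:
    """Return text[start:] up to (not including) stop, plus the stop position."""
    end = text.index(stop, start)
    return text[start:end], end


def first_author(page: str) -> Optional[Tuple[str, str]]:
    """Return (name, id) of the first listed author, or None if not found."""
    name_key = '{"fullName":"'
    id_key = '"id":"'
    for line in page.splitlines():
        if "product:[{" not in line:      # like: findline "product:[{"
            continue
        pos = line.find(name_key)         # like: findinline on the fullName key
        if pos == -1:
            return None
        try:
            name, pos = say_until(line, pos + len(name_key), '"')
            pos = line.find(id_key, pos)  # like: findinline on the id key
            if pos == -1:
                return None
            author_id, _ = say_until(line, pos + len(id_key), '"')
        except ValueError:                # closing quote missing on this line
            return None
        return name, author_id
    return None


print(first_author(SAMPLE_PAGE))  # prints ('Author One', 'B000000001')
```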

That said, I'd say pulling from the api.audnex.us endpoint would be the more sustainable solution, same as the beets Audible plugin does. It will likely remain more stable than the page source of Audible's audiobook summary pages, and I see that the API also includes the first author's ASIN as part of its spec.
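
A minimal Python sketch of what reading the first author from such an API response could look like. The payload shape is an assumption (an `authors` list whose entries carry `name` and an optional `asin`), `sample_book` is invented data, and no request is made here; check the API spec before relying on any of these field names.

```python
from typing import Any, Dict, Optional, Tuple


def first_author_from_book(
    book: Dict[str, Any],
) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (name, asin) of the first listed author, or None if there are no authors."""
    authors = book.get("authors") or []
    if not authors:
        return None
    first = authors[0]
    # Assumed field names; either one may be missing for a given author.
    return first.get("name"), first.get("asin")


# Invented payload for illustration; not real API output.
sample_book = {
    "asin": "B000000000",
    "authors": [
        {"asin": "B000000001", "name": "Author One"},
        {"name": "Author Two"},
    ],
}

print(first_author_from_book(sample_book))  # prints ('Author One', 'B000000001')
```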