iheanyi / bandcamp-dl

Simple python script to download Bandcamp albums
The Unlicense
946 stars, 107 forks

TypeError: 'NoneType' object is not subscriptable #228

Open auntchilada opened 1 month ago

auntchilada commented 1 month ago

Describe the bug

bandcamp-dl v0.0.15: the tool throws an error early on, in its JSON-extraction code (bandcampjson.py):

TypeError: 'NoneType' object is not subscriptable

To Reproduce

Command to reproduce the behavior:

bandcamp-dl --ascii-only --base-dir=/Volumes/banana/mediaLib/muzak/zincoming/bandcamp.dl.d --space-char=_ https://thethe.bandcamp.com/album/see-without-being-seen

URL or list of URLs:

  https://thethe.bandcamp.com/album/see-without-being-seen

Expected behavior

The tool should download the album normally, without errors.

Logs: In most cases you will get some kind of output explaining the issue; post it:


```
bandcamp-dl --ascii-only --base-dir=/Volumes/banana/mediaLib/muzak/zincoming/bandcamp.dl.d --space-char=_ https://thethe.bandcamp.com/album/see-without-being-seen

Traceback (most recent call last):
  File "/usr/local/bin/bandcamp-dl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/__main__.py", line 98, in main
    album_list.append(bandcamp.parse(url, not arguments['--no-art'], arguments['--embed-lyrics'],
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/bandcamp.py", line 42, in parse
    bandcamp_json = BandcampJSON(self.soup, debugging).generate()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/bandcampjson.py", line 16, in generate
    self.get_pagedata()
  File "/usr/local/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/bandcampjson.py", line 22, in get_pagedata
    pagedata = self.body.find('div', {'id': 'pagedata'})['data-blob']
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
```
Output after running the command with the --debug option: no difference.

**Desktop (please complete the following information):**

  ProductName:      macOS
  ProductVersion:       14.5
  BuildVersion:     23F79

  Python 3.12.4 (main, Jun  6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)]

**Additional context**
mrgooge commented 1 month ago

I'm having the same issue; it just started in the past 24 hours. For reference, here's my error:

```
Traceback (most recent call last):
  File "/opt/homebrew/bin/bandcamp-dl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/__main__.py", line 98, in main
    album_list.append(bandcamp.parse(url, not arguments['--no-art'], arguments['--embed-lyrics'],
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/bandcamp.py", line 42, in parse
    bandcamp_json = BandcampJSON(self.soup, debugging).generate()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/bandcampjson.py", line 16, in generate
    self.get_pagedata()
  File "/opt/homebrew/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/bandcampjson.py", line 22, in get_pagedata
    pagedata = self.body.find('div', {'id': 'pagedata'})['data-blob']
TypeError: 'NoneType' object is not subscriptable
```
scim-ry commented 1 month ago

I'm having the same issue as well. I believe the Bandcamp page structure has changed.

```
Traceback (most recent call last):
  File "/usr/local/bin/bandcamp-dl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/__main__.py", line 98, in main
    album_list.append(bandcamp.parse(url, not arguments['--no-art'], arguments['--embed-lyrics'],
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/bandcamp.py", line 42, in parse
    bandcamp_json = BandcampJSON(self.soup, debugging).generate()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/bandcampjson.py", line 16, in generate
    self.get_pagedata()
  File "/usr/local/Cellar/bandcamp-dl/0.0.15_3/libexec/lib/python3.12/site-packages/bandcamp_dl/bandcampjson.py", line 22, in get_pagedata
    pagedata = self.body.find('div', {'id': 'pagedata'})['data-blob']
TypeError: 'NoneType' object is not subscriptable
```
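The last frame explains the unhelpful message: `self.body.find('div', {'id': 'pagedata'})` returns `None` when the fetched page has no `<div id="pagedata">` (presumably the case for the 403 error page reported later in this thread), and indexing `None` with `['data-blob']` raises the `TypeError`. Below is a minimal sketch of a more defensive lookup that reports a missing div instead of crashing. It uses the standard library's `HTMLParser` rather than the BeautifulSoup object bandcamp-dl works with, and the names `PagedataFinder` and `get_pagedata` here are hypothetical, not the project's code:

```python
import json
from html.parser import HTMLParser
from typing import Optional


class PagedataFinder(HTMLParser):
    """Collects the data-blob attribute of the first <div id="pagedata">."""

    def __init__(self) -> None:
        super().__init__()
        self.blob: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names and unescapes attribute
        # values (e.g. &quot; becomes ") before calling this method.
        attributes = dict(attrs)
        if tag == "div" and attributes.get("id") == "pagedata" and self.blob is None:
            self.blob = attributes.get("data-blob")


def get_pagedata(html: str) -> Optional[dict]:
    """Return the parsed data-blob JSON, or None if the div is missing."""
    finder = PagedataFinder()
    finder.feed(html)
    finder.close()
    if finder.blob is None:
        # e.g. an error or blocked page: let the caller report it clearly
        # instead of subscripting None.
        return None
    return json.loads(finder.blob)


print(get_pagedata('<div id="pagedata" data-blob="{&quot;album&quot;: 1}"></div>'))
# prints: {'album': 1}
print(get_pagedata("<html><body>403 Forbidden</body></html>"))
# prints: None
```

A caller can then check for `None` and print something like "could not find album data (blocked or changed page?)" rather than a traceback.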
encratite commented 1 month ago

I've had the same issue for at least 3 days now. I guess there's no fix yet?

scim-ry commented 1 month ago

I might fork and try to fix it on Monday if no one else gets to it.

Evolution0 commented 1 month ago

Getting a 403 just trying to fetch an album/track page is not a good sign: it means the request itself is fine but the server is now refusing it. This happened the first time around, when Bandcamp started requiring proper headers; back then all I had to do was add one, which is just a simple bandcamp-dl/VERSION (GITHUB_URL).

This does not seem to be enough now, even when spoofing the User-Agent.
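For reference, the header approach described above amounts to attaching an identifying header to every page request. A minimal sketch, assuming that header is the User-Agent and using `urllib.request` instead of the requests session bandcamp-dl uses; `VERSION`, `GITHUB_URL` and `build_request` are illustrative placeholders. Per this comment, this alone no longer gets past the 403:

```python
import urllib.request

# Illustrative placeholders: the real tool fills in its own version and repo URL.
VERSION = "0.0.15"
GITHUB_URL = "https://github.com/iheanyi/bandcamp-dl"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies the client via its User-Agent."""
    request = urllib.request.Request(url)
    request.add_header("User-Agent", f"bandcamp-dl/{VERSION} ({GITHUB_URL})")
    return request


req = build_request("https://thethe.bandcamp.com/album/see-without-being-seen")
# urllib stores header names via str.capitalize(), i.e. as "User-agent".
print(req.get_header("User-agent"))
# prints: bandcamp-dl/0.0.15 (https://github.com/iheanyi/bandcamp-dl)
# Sending it would be urllib.request.urlopen(req), which is where the 403 shows up.
```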

If this were just a case of adjusting the parsing for a change in the HTML/JS, it would be simple, but as it stands it's not even possible to retrieve pages. I'm currently looking for changes in header requirements. I don't believe it's something like needing to load JS, because I can block scripts entirely and still load pages fine.

Update 1: Even fully filling out the headers to match what the browser sends, generating a referrer and authority for each request, and adding cookies does not help. There is a small bit of the session cookie that changes, but since I can disable cookies entirely in my browser and still load pages fine, I highly doubt that is it.

christatedavies commented 1 month ago

Could this be some sort of Cloudflare protection? I encountered the same thing once when I was scraping data from a sports site, and I had to rewrite my script to use Puppeteer. I will see if I can get that working here.

christatedavies commented 1 month ago

If anyone can work out where to change the code, I guessed it was around line 164 in bandcampdownloader.py:

r = self.session.get(track['url'], headers=self.headers, stream=True)

I couldn't get it working on my dev box (I'm at work, not near the Linux box I use for downloading).

Anyway, here is a JS file you can use for headless downloading, if someone can work out how to shoehorn it into the code. Pass the URL as the first parameter and the path of the output file as the second; it saves the rendered page HTML to that file:

/usr/local/lib/nodejs/bin/node download_file.js https://someone.bandcamp.com/album /mnt/somedisk/albums/album.html

I'm not allowed to upload that file, so here is the code:

download_file.js

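```javascript
// Needs: npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const fs = require('fs');

puppeteer.use(StealthPlugin());

const url = process.argv[2];
const outputFile = process.argv[3];

if (!url || !outputFile) {
    console.error('Usage: node download_file.js <url> <output-file>');
    process.exit(1);
}

async function downloadPage() {
    const browser = await puppeteer.launch({ headless: true });
    try {
        const page = await browser.newPage();
        await page.goto(url);
        // Give the page's scripts a moment to run. page.waitForTimeout() was
        // deprecated and removed in newer Puppeteer releases, so use a timer.
        await new Promise(resolve => setTimeout(resolve, 2000));

        // Save the rendered HTML (not the audio files) to the output path.
        const content = await page.content();
        fs.writeFileSync(outputFile, content);
    } finally {
        await browser.close(); // Always close the browser, even on errors
    }
}

downloadPage().catch(err => {
    console.error(err);
    process.exit(1); // Exit with an error code if something goes wrong
});
```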

Node Installation

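You need Node.js and npm installed, plus the three packages the script requires:

```shell
sudo apt update
sudo apt install nodejs npm -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
```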

Sorry I couldn't do this myself, and I don't know if it will help. Just a thought.
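One way a Python tool could use a script like this is to run it as a subprocess and read back the saved HTML, which could then be handed to the page-parsing step. A minimal sketch, assuming `node` is on the PATH and the script is saved as download_file.js with its npm packages installed; `build_node_command` and `fetch_page_via_node` are hypothetical helpers, not part of bandcamp-dl:

```python
import subprocess
import tempfile
from pathlib import Path


def build_node_command(script: str, url: str, output_file: str) -> list[str]:
    """Arguments for: node download_file.js <url> <output-file>."""
    return ["node", script, url, output_file]


def fetch_page_via_node(url: str, script: str = "download_file.js") -> str:
    """Render `url` with the Puppeteer script and return the saved HTML.

    Raises subprocess.CalledProcessError if the script exits non-zero.
    """
    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp) / "page.html"
        subprocess.run(build_node_command(script, url, str(output)), check=True)
        # fs.writeFileSync writes strings as UTF-8 by default.
        return output.read_text(encoding="utf-8")
```

This only covers fetching the album page; the track downloads themselves would still go through the regular HTTP session.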

auntchilada commented 1 month ago

FYI:

I was able to succeed using the Rust (cargo) tool bandrip:

https://github.com/stannls/bandrip

So it's not completely fubar?

ontera commented 1 month ago

> fyi-
>
> i was able to succeed using the rust cargo bandrip tool here:
>
> https://github.com/stannls/bandrip
>
> so, it's not completely fubar?

Thank you. I'm a C & C++ guy (firmware developer); I've heard of Rust but haven't used it. So in case anyone else here doesn't know how to build a Rust program (or at least this specific program, bandrip), it's pretty simple (it just took me a couple of minutes of searching). Most C/C++ programs are built either directly from the command line, from a Makefile, or with CMake... Rust is a little different.

I'm on a Mac, so I just installed Rust (which comes with Cargo, Rust's build system and package manager) using Homebrew. If you're on Linux or Windows, "just install Rust", whatever that means on your platform.

Then, after cloning the repo, change into the repo's root directory and type "cargo build"; it figures out all the dependencies and packages automagically. Just let it run (I was pleasantly surprised not to get all sorts of weird version or dependency problems; it just worked out of the box).

Finally, the executable, named "bandrip", is placed in the directory "target/debug" (a release build made with "cargo build --release" goes to "target/release" instead). Once the binary is built, you can move or copy it wherever you wish.

Hope that helps somebody.

dertuxmalwieder commented 1 month ago

> Hope that helps somebody.

Not me ;-) but indeed, the accessibility of Cargo is what made me curious about Rust. C and C++ have Conan, but integrating that with your usual build process can be cumbersome.

When I wrote yaydl, it saved me an incredible amount of time to have Cargo manage everything for me. But the borrow checker takes quite some time to understand.

ontera commented 1 month ago

Thank you -- now I must check out yaydl! (And thanks for creating it). I get into a rut where I keep using the same thing for the same purpose, even when it becomes crufty, instead of seeing if there is maybe something smaller / faster / better for the purpose.

For sure I will look into Rust when I get the chance. A lot of the firmware I work on is "high consequence" and for that, it seems that Rust has its advantages. (But just like above, through a combination of vetted libraries, techniques like RAII & features like constexpr, I keep finding ways to use the caveman stuff to get the job done instead of learning something new and potentially better)

dertuxmalwieder commented 1 month ago

> Thank you -- now I must check out yaydl!

Please do!

It was my first approach at Rust and it was rather good as a start. :-)

christatedavies commented 1 month ago

I have also used this Rust program, but it's not anywhere near as good as this iheanyi/bandcamp-dl package: for one, it doesn't get the artwork, and for two, it isn't quite as easy to configure.

It's possibly worth it if you can't wait for this app to be "fixed", but IMO I'd rather wait.

That said, thanks for the heads-up.

kjake commented 1 week ago

This is a bug related to https://github.com/urllib3/urllib3/issues/3439. If I inject an older release of urllib3 into my venv, all is well again:

```shell
pipx inject bandcamp-downloader urllib3==1.26.19 --force
```
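To see whether an environment is in the state this thread reports as failing (urllib3 2.x installed), you can check the installed version with the standard library. A minimal sketch; it must be run with the same Python interpreter that bandcamp-dl uses (for example, the one inside its pipx venv), and the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def urllib3_major(version_string: str) -> int:
    """Major version number from a string such as '2.2.2' or '1.26.19'."""
    return int(version_string.split(".")[0])


def uses_urllib3_2x() -> bool:
    """True if the installed urllib3 is 2.x or later, False if older or absent."""
    try:
        return urllib3_major(version("urllib3")) >= 2
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("urllib3 2.x installed:", uses_urllib3_2x())
```

After the `pipx inject ... urllib3==1.26.19 --force` workaround above, this should report `False` inside that venv.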

dertuxmalwieder commented 1 week ago

Ah, the joys of having runtime dependencies.

pquentin commented 1 week ago

Hello! urllib3 developer here. Bandcamp.com weirdly requires the TLS cipher suites that are used in urllib3 1.26.x, even if it does not use them. Here's the fix to continue using urllib3 2.x: https://github.com/urllib3/urllib3/issues/3439#issuecomment-2306400349
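The linked comment has the actual fix for urllib3 2.x. As a rough, standard-library-only illustration of the underlying idea, a TLS context can be given an explicit cipher string so the client offers a different set of suites. The list below is modeled on urllib3 1.26.x's default cipher list (an assumption worth checking against urllib3's source), and `LEGACY_CIPHERS` / `legacy_cipher_context` are hypothetical names. bandcamp-dl itself goes through a requests session (see the `self.session.get(...)` line quoted earlier), so a real fix would hook in there rather than through `urllib.request`:

```python
import ssl
import urllib.request

# Assumption: modeled on the DEFAULT_CIPHERS string urllib3 1.26.x used.
LEGACY_CIPHERS = ":".join([
    "ECDHE+AESGCM",
    "ECDHE+CHACHA20",
    "DHE+AESGCM",
    "DHE+CHACHA20",
    "ECDH+AESGCM",
    "DH+AESGCM",
    "ECDH+AES",
    "DH+AES",
    "RSA+AESGCM",
    "RSA+AES",
    "!aNULL",
    "!eNULL",
    "!MD5",
    "!DSS",
])


def legacy_cipher_context() -> ssl.SSLContext:
    """Certificate-verifying default context with the explicit cipher list.

    set_ciphers() controls the TLS 1.2-and-earlier suites; TLS 1.3 suites
    are configured separately by OpenSSL and are left unchanged here.
    """
    context = ssl.create_default_context()
    context.set_ciphers(LEGACY_CIPHERS)
    return context


# An opener whose HTTPS connections use that context (no request is sent here).
opener = urllib.request.build_opener(
    urllib.request.HTTPSHandler(context=legacy_cipher_context())
)
```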

mrgooge commented 1 week ago

So for the laymen... To get bandcamp-dl working on my Mac again, do I try to install an older version of urllib3 with Python? Is that possible? Sorry if this is a dumb question...

kjake commented 1 week ago

@mrgooge in my testing, this is fixed in my PR: https://github.com/iheanyi/bandcamp-dl/pull/234

mrgooge commented 1 week ago

@kjake thanks for the information. How do I get your PR? Never mind, I was able to find and build your fork. All good... Thank you!