masterT / bandcamp-scraper

A scraper for https://bandcamp.com
MIT License

Return license information of albums and tracks #34

Open abetusk opened 5 years ago

abetusk commented 5 years ago

I do not see any license information returned by getAlbumInfo and it would be nice to provide that information.

For example, running:

var bandcamp = require("bandcamp-scraper");
var albumUrl='https://bit-rot.bandcamp.com/album/twisted-pair';
console.log(">>>", albumUrl);
bandcamp.getAlbumInfo(albumUrl, function(error, albumInfo) {
  if (error) { console.log(error); }
  else { console.log(albumInfo); }
});

returns:

>>> https://bit-rot.bandcamp.com/album/twisted-pair
{ artist: 'bit rot',
  title: 'Twisted Pair',
  imageUrl: 'https://f4.bcbits.com/img/a1538849378_2.jpg',
  tracks:
   [ { name: 'Uplink',
       url: 'https://bit-rot.bandcamp.com/track/uplink',
       duration: '06:19' },
     { name: 'Driver',
       url: 'https://bit-rot.bandcamp.com/track/driver',
       duration: '04:34' },
     { name: 'Psychadelic Death Trip',
       url: 'https://bit-rot.bandcamp.com/track/psychadelic-death-trip',
       duration: '06:18' },
     { name: 'POST',
       url: 'https://bit-rot.bandcamp.com/track/post',
       duration: '02:22' } ],
  raw:
   { current:
      { purchase_url: null,
        release_date: '24 Jan 2018 00:00:00 GMT',
        new_desc_format: 1,
        selling_band_id: 1888597831,
        set_price: 7,
        killed: null,
        purchase_title: null,
        minimum_price_nonzero: 7,
        title: 'Twisted Pair',
        new_date: '24 Jan 2018 02:32:19 GMT',
        featured_track_id: 286527331,
        minimum_price: 0,
        is_set_price: null,
        upc: null,
        credits: 'https://celebornidril.bandcamp.com/',
        private: null,
        art_id: 1538849378,
        require_email: null,
        id: 1214178877,
        band_id: 1888597831,
        about: 'Collaborations between bit rot & Celebornidril',
        require_email_0: null,
        download_pref: 2,
        publish_date: '24 Jan 2018 03:19:17 GMT',
        audit: 0,
        type: 'album',
        download_desc_id: null,
        auto_repriced: null,
        artist: null,
        mod_date: '17 Sep 2018 17:50:13 GMT' },
     is_preorder: null,
     album_is_preorder: null,
     album_release_date: '24 Jan 2018 00:00:00 GMT',
     preorder_count: null,
     hasAudio: true,
     art_id: 1538849378,
     trackinfo: [ [Object], [Object], [Object], [Object] ],
     playing_from: 'album page',
     featured_track_id: 286527331,
     initial_track_num: null,
     packages: null,
     url: 'http://bit-rot.bandcamp.com/album/twisted-pair',
     defaultPrice: 7,
     freeDownloadPage:
      'https://bandcamp.com/download?id=1214178877&ts=1550427274.1409455241&tsig=2e8c8dec6b5ffd439741a5698ac690d4&type=album',
     FREE: 1,
     PAID: 2,
     artist: 'bit rot',
     item_type: 'album',
     id: 1214178877,
     last_subscription_item: null,
     has_discounts: null,
     is_bonus: null,
     play_cap_data: null,
     client_id_sig: null,
     is_purchased: null,
     items_purchased: null,
     is_private_stream: null,
     is_band_member: null,
     licensed_version_ids: null,
     package_associated_license_id: null,
     tralbum_collect_info: { show_collect: true, show_wishlist_tooltip: false } },
  url: 'https://bit-rot.bandcamp.com/album/twisted-pair' }

The album is released under a CC-BY-SA license, but I don't see that reflected in the returned data. I do see licensed_version_ids and package_associated_license_id fields, but I'm not sure whether they relate to the album's license, and both are null in this case.

masterT commented 5 years ago

Sorry for the very long delay. I don't know exactly; the raw property is extracted from the TralbumData variable in a <script> tag.

It looks like each track has its own license_type property; in your example, every track has the value 8.
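A minimal sketch of reading those per-track values from the object getAlbumInfo returns. It assumes, based only on the observation above and not on any documented API, that each entry of albumInfo.raw.trackinfo may carry a numeric license_type field. The helper collectLicenseTypes and the sample object are hypothetical.

```javascript
// Hypothetical helper: gather the distinct license_type values found in
// albumInfo.raw.trackinfo. Assumes each track entry may carry a
// license_type field (as observed in TralbumData); entries without one are skipped.
function collectLicenseTypes(albumInfo) {
  const tracks = (albumInfo && albumInfo.raw && albumInfo.raw.trackinfo) || [];
  const types = new Set();
  for (const track of tracks) {
    if (track && track.license_type != null) {
      types.add(track.license_type);
    }
  }
  return Array.from(types);
}

// Hypothetical sample shaped like the output above (most fields trimmed).
const sampleAlbumInfo = {
  title: 'Twisted Pair',
  raw: {
    trackinfo: [
      { title: 'Uplink', license_type: 8 },
      { title: 'Driver', license_type: 8 },
      { title: 'Psychadelic Death Trip', license_type: 8 },
      { title: 'POST', license_type: 8 }
    ]
  }
};

console.log(collectLicenseTypes(sampleAlbumInfo)); // [ 8 ]
```

Returning the distinct values (rather than one per track) makes it easy to spot albums whose tracks are released under mixed licenses.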

I also saw that the license information is on the page itself, so it is possible to extract it and add it to the returned data:

<h3 class="license-label">license</h3>
<div id="license" class="info license">
  <a class="cc-icons" href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">
    <span class="attribution"></span>
    <span class="share-alike"></span>
  </a>
  <a href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">some rights reserved</a>
</div>
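A rough sketch of pulling the license out of that markup with regular expressions. It assumes the page keeps the shape shown above (a div with id="license" containing a creativecommons.org/licenses/... link); the function name parseLicenseFromHtml is hypothetical, and a real scraper would more robustly use an HTML parser than a regex.

```javascript
// Hypothetical helper: extract the Creative Commons license from a Bandcamp
// page's HTML. Assumes the markup shown above: a <div id="license"> containing
// a link to creativecommons.org/licenses/<code>/<version>/.
// Returns null when no such link is found (e.g. an all-rights-reserved release).
function parseLicenseFromHtml(html) {
  // Isolate the contents of the license div (non-greedy, up to its first </div>).
  const block = /<div[^>]*id="license"[^>]*>([\s\S]*?)<\/div>/i.exec(html);
  if (!block) return null;

  // Capture the full URL, the license code (e.g. "by-sa") and the version.
  const link = /href="(https?:\/\/creativecommons\.org\/licenses\/([a-z-]+)\/([\d.]+)\/?)"/i.exec(block[1]);
  if (!link) return null;

  return { url: link[1], code: link[2], version: link[3] };
}

const sampleHtml = `
<h3 class="license-label">license</h3>
<div id="license" class="info license">
  <a class="cc-icons" href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">
    <span class="attribution"></span>
    <span class="share-alike"></span>
  </a>
  <a href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">some rights reserved</a>
</div>`;

console.log(parseLicenseFromHtml(sampleHtml));
```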
abetusk commented 5 years ago

Wow, I completely missed that, thanks.

This most likely "solves" the issue, since that's the core information, so closing it may be appropriate.

It would be nice to have a mapping from license_type to the actual license, but that can be a separate issue. Is this something you'd be willing to add? Do you have any thoughts on how to get a mapping, or whether it's stable?

masterT commented 5 years ago

I searched quickly, but I did not find any mapping. Maybe it would be "easier" to scrape it from the HTML? 🤔 What exactly would you like to have: the name, the version, the URL?

masterT commented 5 years ago

Found this https://get.bandcamp.help/hc/en-us/articles/360013563834-How-do-I-change-the-license-on-my-music-

abetusk commented 5 years ago

Here is what I came up with for the license map:

// Proposed mapping from Bandcamp's license_type values to licenses.
// Creative Commons entries use the form "<code>;<version>".
var license_map = {
  "" : { "license_type": "unknown" },
  "0": { "license_type": "unknown" },
  "1": { "license_type": "copyright" },
  "2": { "license_type": "by-nc-nd;3.0" },
  "3": { "license_type": "by-nc-sa;3.0" },
  "4": { "license_type": "by-nc;3.0" },
  "5": { "license_type": "by-nd;3.0" },
  "6": { "license_type": "by;3.0" },
  "7": { },  // not seen in scraped data; possibly sa;3.0
  "8": { "license_type": "by-sa;3.0" }
};
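One way such a map could be consumed is sketched below: it turns a license_type into a structured object and rebuilds a Creative Commons URL of the form seen in the page markup (creativecommons.org/licenses/<code>/<version>/). The names LICENSE_BY_TYPE and describeLicense are hypothetical, type 7 is deliberately left out because it is unconfirmed, and the map's values are copied from the proposal above rather than from any official Bandcamp source.

```javascript
// Hypothetical lookup copied from the proposed mapping (type 7 is unconfirmed
// and omitted). Creative Commons values use the form "<code>;<version>".
const LICENSE_BY_TYPE = {
  1: 'copyright',
  2: 'by-nc-nd;3.0',
  3: 'by-nc-sa;3.0',
  4: 'by-nc;3.0',
  5: 'by-nd;3.0',
  6: 'by;3.0',
  8: 'by-sa;3.0'
};

// Hypothetical helper: describe a Bandcamp license_type. Anything not in the
// lookup (0, an empty value, 7, ...) is reported as "unknown".
function describeLicense(licenseType) {
  const entry = LICENSE_BY_TYPE[String(licenseType)];
  if (!entry) return { code: 'unknown', version: null, url: null };
  if (entry === 'copyright') return { code: 'copyright', version: null, url: null };

  // Split "by-sa;3.0" into its code and version, then build the CC URL.
  const [code, version] = entry.split(';');
  return {
    code,
    version,
    url: `http://creativecommons.org/licenses/${code}/${version}/`
  };
}

console.log(describeLicense(8).url);  // http://creativecommons.org/licenses/by-sa/3.0/
console.log(describeLicense(7).code); // unknown
```

Comparing describeLicense(type).url with the link scraped from the page's license section would be one way to check each entry of the map against real releases.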

I (ahem) have some scraped data from Bandcamp, and I don't see any reference to license type 7. Maybe Bandcamp reserved this license type for sa;3.0, but since essentially no one uses that license, and/or Bandcamp doesn't offer it as an option, it doesn't show up.

Here are some tracks to test whether the license type map above is correct:

license_type,license,example_track_url
1,copyright,https://00000000000000000000.bandcamp.com/track/a
2,by-nc-nd;3.0,https://000-deer.bandcamp.com/track/23-59-s-2
3,by-nc-sa;3.0,https://0099.bandcamp.com/track/wrap-around-yr-dreams
4,by-nc;3.0,https://00raikage.bandcamp.com/track/souless
5,by-nd;3.0,https://01000001lien.bandcamp.com/track/d-n-b-a-s-b-c
6,by;3.0,https://01101001-01100100-01101100.bandcamp.com/track/qus-paradigm
7,sa;3.0,?
8,by-sa;3.0,https://01110.bandcamp.com/track/ignorance

Note: the list above is in no way an endorsement of the bands or a statement about their quality.

All the links that I could find for the Creative Commons licenses from Bandcamp refer to version 3.0 (for example licenses/by-sa/3.0/).

Any chance of getting this license mapping folded into bandcamp-scraper? I'm happy to open an issue, submit a pull request, etc., if that's something you're open to.

masterT commented 5 years ago

Maybe type 7 is no longer used, or is reserved for future use; I don't know!

> Any chance of getting this license mapping folded into bandcamp-scraper? I'm happy to open an issue, submit a pull request, etc., if that's something you're open to.

Yes, it would be nice to add it to the scraper! 😄 You can open an issue and create a pull request!