scooterpsu / Comixology_Ubooquity_2


Publisher html and css generation script/scraper #21

Closed CuddleBear92 closed 4 years ago

CuddleBear92 commented 4 years ago

Automate Publisher html and css generation with scripts.

Publisher list: https://www.comixology.com/browse-publisher

Publisher example: https://www.comixology.com/DC/comics-publisher/1-0

Generate Publisher HTML

Example output:

<link rel="stylesheet" type="text/css" href="[[FOLDER]]/folder.css">
<script src="../../theme/jquery-3.1.1.min.js"></script>
<script src="../../theme/theme.js"></script>
<div id="userinfo2"></div>
<div align="center"><img src="[[FOLDER]]/header.jpg" width="100%"></div>
<div class="imprintNav">
    <ul>
        <li><a class="active" href="/ubooquity/comics/7/">DC Comics</a></li>
        <li><a href="/ubooquity/comics/49192/">Vertigo</a></li>
        <li><a href="/ubooquity/comics/1333307/">MAD</a></li>
        <li><a href="/ubooquity/comics/7/">Minx</a></li>
        <li><a href="/ubooquity/comics/7/">Paradox Press</a></li>
        <li><a href="/ubooquity/comics/7/">Wildstorm</a></li>
    </ul>
</div>

Only the imprint entries change here. If a publisher has no imprints, the imprint navigation block should not be written at all, to keep the file clean. (Imprint IDs can be ignored, as they are unique to each Ubooquity server.)

Example of Imprint nav bar look: image
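A minimal sketch of how the publisher HTML above could be generated, assuming the imprint names have already been collected. The function name `build_publisher_html` and the `PLACEHOLDER_ID` constant are hypothetical and not part of the theme. Because Ubooquity IDs are server-specific, every link points at a dummy ID for the user to edit later, and the script tags are left out, since the theme maintainer notes later in this thread that they are unnecessary in v2.

```python
from html import escape

# Hypothetical placeholder: Ubooquity folder IDs differ per server, so the
# generated links use a dummy ID that the user is expected to edit by hand.
PLACEHOLDER_ID = 1


def build_publisher_html(publisher, imprints):
    """Return publisher page HTML; omit the nav bar when there are no imprints."""
    lines = [
        '<link rel="stylesheet" type="text/css" href="[[FOLDER]]/folder.css">',
        '<div id="userinfo2"></div>',
        '<div align="center"><img src="[[FOLDER]]/header.jpg" width="100%"></div>',
    ]
    if imprints:
        href = f"/ubooquity/comics/{PLACEHOLDER_ID}/"
        lines.append('<div class="imprintNav">')
        lines.append('    <ul>')
        # The publisher itself is the first, active entry of the bar.
        lines.append(
            f'        <li><a class="active" href="{href}">{escape(publisher)}</a></li>'
        )
        for name in imprints:
            lines.append(f'        <li><a href="{href}">{escape(name)}</a></li>')
        lines.append('    </ul>')
        lines.append('</div>')
    return "\n".join(lines) + "\n"
```

Calling `build_publisher_html("DC Comics", ["Vertigo", "MAD"])` yields a page with the nav bar, while an empty imprint list yields only the stylesheet link, the user info div, and the header image.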

Generate Publisher CSS

  • [x] Pull background color and text/link colors and write to CSS file

Example: https://www.comixology.com/Aspen-Comics/comics-publisher/6-0

#group{
    background-color: #007cd1;
    color: white;
    padding-top: 0;
}

#publinks{
    margin-top: -20px;
    width: 160px;
}

.label{
    color: white;
}

image
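A rough sketch of writing that CSS from two scraped colors. How the colors are extracted from the Comixology page is not shown here, and reusing one text color for both `#group` and `.label` is an assumption based on the Aspen example above; `render_publisher_css` is a hypothetical name.

```python
def render_publisher_css(background, text_color):
    """Fill the folder.css template with a background and a text color."""
    # Plain and f-string literals are kept on separate lines so the literal
    # CSS braces never need escaping.
    return (
        "#group{\n"
        f"    background-color: {background};\n"
        f"    color: {text_color};\n"
        "    padding-top: 0;\n"
        "}\n"
        "\n"
        "#publinks{\n"
        "    margin-top: -20px;\n"
        "    width: 160px;\n"
        "}\n"
        "\n"
        ".label{\n"
        f"    color: {text_color};\n"
        "}\n"
    )
```

For the Aspen example, `render_publisher_css("#007cd1", "white")` reproduces the stylesheet shown above.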

Grab Images

  • [x] Grab Publisher Image (save to publisher folder as folder.jpg AND to theme/publisher/publishername.jpg, to be referenced in series from within the theme.)
  • [x] Grab Publisher Banner (save to publisher folder as header.jpg)

All in all, the most important parts are the images and the folders they are placed in
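Since the folder layout is the important part, here is a small sketch of where each image would land. The parameter names `library_root` and `theme_root` are assumptions about how a script might be configured; the file names follow the checklist above.

```python
from pathlib import Path


def publisher_image_paths(library_root, theme_root, publisher):
    """Map each scraped image to the file paths it should be saved to."""
    publisher_dir = Path(library_root) / publisher
    return {
        # The logo is stored twice: in the publisher folder and in the theme.
        "logo": [
            publisher_dir / "folder.jpg",
            Path(theme_root) / "publisher" / f"{publisher}.jpg",
        ],
        "banner": [publisher_dir / "header.jpg"],
    }


def save_publisher_images(library_root, theme_root, publisher, logo, banner):
    """Write already-downloaded image bytes to every destination path."""
    paths = publisher_image_paths(library_root, theme_root, publisher)
    for kind, data in (("logo", logo), ("banner", banner)):
        for target in paths[kind]:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
    return paths
```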

scooterpsu commented 4 years ago

Not sure what the goal with this issue is. At first I thought it was a guide, but if you're requesting a script to do this... no.

Scraping pages isn't always straightforward, and what you're suggesting has a lot of potential to grow into something needlessly complicated. If you accomplish it, I'll take a look at integrating it, but I don't intend to do anything along these lines myself.

Oh, and your example HTML is notably reusing old code: there's no reason to include the script tags. The Comixology2 theme JS automatically adds itself to the base page.

CuddleBear92 commented 4 years ago

Yeah, I understand that much of what I have is still from the old v1 days. I'm just catching up to the v2 stuff now.

Scraping it shouldn't be hard technically. I might do a Python script to pull any publisher URL in a set list. That could make sense.

Converting the system to use the JSON system, as with the series, does make more sense. But I think the personal flair of the publishers is still wanted: get the header image in, add good thumbnail logo images, and use the background and text colors from the original Comixology site to mirror its feel.

Reusing code is fine, since it does still technically work; it's just that some of the old stuff, like the path breadcrumbs and the publink height, needs to be set to 1px.

From what I gather, half of the old v1 examples can be deleted, but they definitely still have their uses.
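The idea of pulling publisher URLs from a list page could be sketched as below. This is only an illustration: it assumes publisher links can be recognized by the `/comics-publisher/` path segment seen in the example URLs above, and it uses the standard library parser on an HTML string that has already been downloaded, rather than any particular scraping library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.comixology.com"


class PublisherLinkParser(HTMLParser):
    """Collect unique publisher page URLs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: publisher pages contain "/comics-publisher/" in the path.
        if href and "/comics-publisher/" in href:
            url = urljoin(BASE_URL, href)
            if url not in self.links:
                self.links.append(url)


def extract_publisher_links(page_html):
    """Return publisher URLs found in the HTML, in document order."""
    parser = PublisherLinkParser()
    parser.feed(page_html)
    return parser.links
```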

scooterpsu commented 4 years ago

It's not that scraping Comixology is hard; it's that you also need to match the imprints to the arbitrary Ubooquity IDs when making that bar. I personally lean towards this being something you shouldn't need to do as often as things like series pages, so it makes sense to leave it as a manual process.

CuddleBear92 commented 4 years ago

Well, I would leave them empty, or with just a Ubooquity ID of 1, and let the user who is adding the publisher edit them. Editing those as you go is much less work than doing everything by hand: the images, the CSS, and so on.

But it's kind of an issue if you have a large collection with many publishers. You don't want to spend hours or even days on this one small thing when it could be partly automated.

scooterpsu commented 4 years ago

Sure, I get why someone would want that kind of thing automated. It's just far beyond anything I intend for this project.

CuddleBear92 commented 4 years ago

My friend made the script! It scrapes all the publishers and all the series pages in Python, and produces HTML files, CSS styles, JPG images, and the series JSON as well!

We have not fully tested it on everything just yet. My run is still going, and with the request timer a single run will take hours.

REMOVED ZIP UPLOAD CAUSE OF UPDATES

Requires Python3
Requires Requests, beautifulsoup4 and html5lib

Use --scrape-series to scrape series pages and images as well as publishers.
Use -d/--destination to set destination
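For readers unfamiliar with such flags, this is a sketch of how the two options could be declared with `argparse`. It is not taken from the linked script; the default destination of `"."` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Declare the two command-line options described above."""
    parser = argparse.ArgumentParser(
        description="Scrape publisher pages (illustrative sketch only)."
    )
    parser.add_argument(
        "--scrape-series",
        action="store_true",
        help="also scrape series pages and images, not just publishers",
    )
    parser.add_argument(
        "-d",
        "--destination",
        default=".",  # assumed default; the real script may differ
        help="directory the generated files are written to",
    )
    return parser
```

With this parser, `build_parser().parse_args(["--scrape-series", "-d", "out"])` sets `scrape_series` to `True` and `destination` to `"out"`.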

@scooterpsu

@cryzed https://gist.github.com/cryzed/3cd9d0f914a3581ef8262301e204fbd7

CuddleBear92 commented 4 years ago

I'll be doing a full run, zipping it up as a release on my theme repo, and starting to replace the individual files tomorrow. https://github.com/CuddleBear92/Ubooquity-Themes/issues/5

A full publisher run went smoothly. Doing a Publisher+Series run now.

EDIT: Would love for you to clear up how this is loaded. Does it just load the image in the publisher folder? So there's no need for a publisher dump folder in the themes anymore, correct?

image

scooterpsu commented 4 years ago

Yeah, that's the folder image in the publisher folder.

Specifically in that case it's looking to the folder in the parent link.

scooterpsu commented 4 years ago

I highly recommend using your browser's debug tools (Ctrl + Shift + I in Chrome). You can see errors or, in this case, the active source of the page. It's super helpful for questions like these. Right-click > Inspect is also very useful.