mandiberg / printwikipedia


Constrain volumes to 700 pages #14

Closed: mandiberg closed this 7 years ago

mandiberg commented 8 years ago

During the run we produced about 125 volumes that are longer than 700 pages, and 15 of them are over 800 pages. These need to be capped at 700 pages, as Lulu.com does not allow more than 740 pages.

Uploading the volumes of 700-740 pages throws an error, because the dimensions of the spine are incompatible with the width of the book. Above 740 pages it throws an error because the spine is simply too big. These errors don't crash the process, but the volumes are skipped, and they hold it up: the upload gets to stage 7, nothing happens, and it has to fail 4 times before moving on to the next volume.
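One way to avoid the four failed attempts is to check the page count before uploading. The sketch below is hypothetical (the class `LuluPageCheck`, its enum, and its method are not part of printwikipedia); it only encodes the limits described above: at most 700 pages is uploadable, 700-740 fails on spine/width, and over 740 the spine is too big.

```java
// Hypothetical pre-upload guard: classify a volume by page count before
// sending it to Lulu.com, so oversized volumes are skipped immediately
// instead of failing repeatedly at stage 7.
public class LuluPageCheck {

    // Limits taken from the issue text.
    static final int TARGET_MAX_PAGES = 700; // volumes should stay at or under this
    static final int LULU_MAX_PAGES = 740;   // Lulu.com's hard maximum

    enum Status { OK, SPINE_WIDTH_MISMATCH, SPINE_TOO_BIG }

    static Status classify(int pages) {
        if (pages <= TARGET_MAX_PAGES) {
            return Status.OK;
        } else if (pages <= LULU_MAX_PAGES) {
            return Status.SPINE_WIDTH_MISMATCH; // 701-740: spine vs. width error
        } else {
            return Status.SPINE_TOO_BIG;        // over 740: spine too big
        }
    }

    public static void main(String[] args) {
        int[] pageCounts = {512, 720, 880};
        for (int pages : pageCounts) {
            Status s = classify(pages);
            if (s == Status.OK) {
                System.out.println(pages + " pages: upload");
            } else {
                System.out.println(pages + " pages: skip (" + s + ")");
            }
        }
    }
}
```

Running `main` prints one line per volume, e.g. `880 pages: skip (SPINE_TOO_BIG)`. This only skips bad volumes faster; it does not fix them, which still requires splitting long entries.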

The way we wrote the end of volumes did not account for REALLY long entries. I simply stopped starting new entries at page 700 and let the last one complete. This was simple but entirely ineffective, as in some cases the last entry ran over 250 pages!

The code needs to be rewritten to split the last entry at the end of the last allowed page, save the remaining contents AND the article title, and start the next volume with them.
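The splitting described above can be sketched as a queue of articles: when an entry overflows the volume, write what fits, then push the remainder, with its title, back to the front of the queue so it opens the next volume. This is a hypothetical sketch (Java 16+ for records), not printwikipedia's code: `VolumeSplitter`, `Article`, `Volume`, `split`, and `linesPerPage` are illustrative names, and a page is approximated as a fixed number of lines rather than real iText layout with 3 columns and tables. In practice `maxPages` would be 700.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: cap each volume at a fixed size and split the entry
// that overflows, carrying its remaining content AND its title forward.
public class VolumeSplitter {

    // Simplified model: an article is a title plus body lines; "continued"
    // marks a carried-over remainder. Pages are approximated as lines.
    record Article(String title, List<String> lines, boolean continued) {}

    record Volume(int number, List<String> lines) {}

    static List<Volume> split(List<Article> articles, int maxPages, int linesPerPage) {
        int capacity = maxPages * linesPerPage;
        if (capacity < 2) {
            // Need room for at least a heading plus one body line.
            throw new IllegalArgumentException("volume capacity too small");
        }
        List<Volume> volumes = new ArrayList<>();
        Deque<Article> queue = new ArrayDeque<>(articles);
        List<String> current = new ArrayList<>();

        while (!queue.isEmpty()) {
            Article a = queue.pollFirst();
            int room = capacity - current.size() - 1; // 1 line for the heading
            if (room <= 0) {
                // No room even for the heading: close the volume, retry the article.
                volumes.add(new Volume(volumes.size() + 1, current));
                current = new ArrayList<>();
                queue.addFirst(a);
                continue;
            }
            current.add(a.continued() ? a.title() + " (continued)" : a.title());
            if (a.lines().size() <= room) {
                current.addAll(a.lines());
            } else {
                // Write what fits, then requeue the remainder with its title
                // so it starts the next volume.
                current.addAll(a.lines().subList(0, room));
                List<String> rest = new ArrayList<>(a.lines().subList(room, a.lines().size()));
                queue.addFirst(new Article(a.title(), rest, true));
                volumes.add(new Volume(volumes.size() + 1, current));
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            volumes.add(new Volume(volumes.size() + 1, current));
        }
        return volumes;
    }

    public static void main(String[] args) {
        List<String> longBody = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            longBody.add("row " + i);
        }
        List<Article> articles = List.of(
                new Article("Short entry", List.of("a", "b"), false),
                new Article("Huge table entry", longBody, false));
        // Toy limits: 2 pages x 3 lines = 6 lines per volume.
        for (Volume v : split(articles, 2, 3)) {
            System.out.println("Volume " + v.number() + ": " + v.lines());
        }
    }
}
```

With these toy limits, `main` prints:

```
Volume 1: [Short entry, a, b, Huge table entry, row 1, row 2]
Volume 2: [Huge table entry (continued), row 3, row 4, row 5, row 6, row 7]
```

Using a queue means a 250-page entry is simply split as many times as needed, each piece restarting with its title.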

CarstenG commented 8 years ago

Did I understand correctly that you found an article that takes up over 250 pages of a volume? Which article is that big? Could you please post the link to it on Wikipedia?

hachacha commented 8 years ago

@CarstenG, that is correct. These entries are so long because they contain very large tables. When we print any text onto the PDF, it is broken up into 3 columns regardless of the nature of the incoming text. Tables end up squeezed (sometimes unsuccessfully) into the width of a single page column. If a table has many columns, only one character per table column fits on each line, so the table takes up a lot of page space for very little information.

We have not spent much time finding a way to print tables more readably, nor have we discussed what the best layout for tables in printwikipedia would be.

This can be dealt with another time, because it still works well enough. The issue at hand is that the current program keeps writing until it reaches the end of the wiki entry, but something should stop it once it reaches 740 pages.

This ended up as the last entry in a volume that reached 880 pages: Volume 4369: List of Top 10 Singles in 2010 (Ireland) --- List of tornadoes in the April 25–28, 2011 tornado outbreak https://en.wikipedia.org/wiki/List_of_tornadoes_in_the_2011_Super_Outbreak

CarstenG commented 8 years ago

Wow, this really is a huge article :) Maybe it is possible to switch from 3 columns to 1 before the table and back to 3 columns after it?

Is there also a way to tell printwikipedia: "Please print only the article »Foobar«"? This would be useful for testing.

mandiberg commented 8 years ago

These are good suggestions.

I like the suggestion to switch to one column for tables. We are currently writing code to handle tables separately, so in theory we could switch then. iText is quite picky, so we'll see. (Do you work with iText?)

We do have an informal way of getting it to print only the volume we are looking for; we should make sure that is documented. But there isn't actually a way to print just one article. We haven't needed that specifically, as most of the problems we are trying to solve repeat across entries.

M


On Jan 13, 2016, at 3:11 PM, Carsten Gerlach notifications@github.com wrote:

Wow, this is really a huge article :) Maybe it is possible to switch from 3 to 1 column before the table and switch back to 3 columns after the table?

Is there also a possibility to tell printwikipedia: "Please do only print article »Foobar«"? This would be good for testing purposes.


CarstenG commented 8 years ago

Hi Michael,

Nice to hear that you like the idea about the columns.

No, I don't work with iText (yet). I'm just working with printwikipedia itself, trying to get it running on my laptop.

OK, generating a specific volume would also be useful for testing and debugging. How can I do this?