gutenbergtools / autocat3

CherryPy App that serves dynamic content for Project Gutenberg
GNU General Public License v3.0

Deprecate non-generated content when generated content is available #106

Closed gbnewby closed 9 months ago

gbnewby commented 10 months ago

This should happen after ebookconverter issue #38 is completed and in production.

Now that we have confidence in the generated files, I think it's safe to add a little logic to the landing pages:

When there is a generated format, don't list the "as submitted" format.

This mostly applies to plain text and HTML and, eventually, to PDF once it is generated. Occasionally we have RST, RDF and other input formats that result in HTML or PDF; those input formats shouldn't be listed on the download page.

Basically, if there is cache/epub/.xxx then 1/2/3/.../.xxx should not be listed.

After this change, "More files..." will be the only place where the as-submitted files will be.
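A minimal sketch of the rule above, not the actual autocat3 code. The `BookFile` structure, the `generated` flag, and the example paths are all hypothetical; the sketch only covers the "same format exists under cache/epub" case, and input formats such as RST or RDF would need an extra mapping to the output they produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookFile:
    """One downloadable file for an ebook (hypothetical structure)."""
    path: str        # example paths below are made up for illustration
    filetype: str    # e.g. "txt", "html", "pdf"
    generated: bool  # True if the file lives under cache/epub


def landing_page_files(files):
    """Drop as-submitted files whose format also exists as a generated file."""
    generated_types = {f.filetype for f in files if f.generated}
    return [
        f for f in files
        if f.generated or f.filetype not in generated_types
    ]


files = [
    BookFile("cache/epub/1234/pg1234.txt", "txt", True),
    BookFile("1/2/3/1234/1234.txt", "txt", False),   # hidden: generated txt exists
    BookFile("1/2/3/1234/1234.pdf", "pdf", False),   # kept: no generated pdf
]
for f in landing_page_files(files):
    print(f.path)
```

With these inputs the as-submitted text file is dropped, while the as-submitted PDF stays listed because no generated PDF exists.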

gbnewby commented 9 months ago

I'm checking back on this. Is this something that is queued up for implementation?

Completion of this change to landing pages is a prerequisite for changes to pushed files, where we will start pushing without the header + footer and only include the needed sentinel line.

eshellman commented 9 months ago

The ebookconverter changes should be ready next week; changes to autocat3 shouldn't take more than a few hours, so allow a day or two after EBC.

gbnewby commented 9 months ago

That's great.

I have further thoughts on the original proposal and have had a discussion with Roger about it.

I am now thinking we don't need "More files..." at all. Everything we want to encourage people to download will be on the landing page, with links to /cache/epub as well as /1/2/3 as appropriate/needed. The "More files..." is (a) not needed, (b) potentially confusing, and (c) for new titles will be exposing headerless/footerless files in /1/2/3 that we disfavor for general reading - better with the generated files.

eshellman commented 9 months ago

I use "More files..." a lot for debugging, to be honest, and there may be some oddball "books" where relevant files are not surfaced; I'll check. Two other options: (1) add "nofollow" to the link (2) move the link below the fold, after the bibdata, and use css to make it look like unlinked text.
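A rough sketch of the two options combined, assuming the link is rendered from Python; the `more-files` class name, the CSS rule, and the helper name are hypothetical and not taken from the autocat3 templates.

```python
from html import escape

# Hypothetical CSS: makes the link look like ordinary unlinked text.
MORE_FILES_CSS = ".more-files { color: inherit; text-decoration: none; }"


def more_files_link(url, text="More files..."):
    """Render the link with rel="nofollow" so crawlers are asked not to follow it."""
    return (
        f'<a class="more-files" rel="nofollow" '
        f'href="{escape(url, quote=True)}">{escape(text)}</a>'
    )


print(more_files_link("/1/2/3/1234/"))  # example path is illustrative only
```

Note that `nofollow` is only a hint to crawlers; it does not hide the files from anyone who has the URL.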

gbnewby commented 9 months ago

Yes, actually I use the More files too for investigating issues.

Your ideas for making it less obvious sound good to me. The main concern is that we will be pushing headerless/footerless files to 1/2/3 and we don't want those to be confusing for people who stumble across them.

We might consider using robots.txt to keep crawlers out of 1/2/3 entirely, in favor of cache/epub. That would decrease the likelihood of the public landing there.

My suggestion is to get this mocked up in autocat3 and put on dev.gutenberg.org so we can see how it looks, iterate, and perhaps solicit some input from PGers.
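A sketch of what such robots.txt rules could look like, checked with Python's standard `urllib.robotparser`. It assumes the as-submitted trees are rooted at single-digit top-level directories (as in the "1/2/3" notation); the real site layout may differ, and the example paths are made up.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt():
    """Build rules that keep crawlers out of /0/ .. /9/ (assumed layout)."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: /{digit}/" for digit in range(10)]
    return "\n".join(lines) + "\n"


def make_parser():
    """Parse the generated rules so individual paths can be checked."""
    parser = RobotFileParser()
    parser.parse(build_robots_txt().splitlines())
    return parser


parser = make_parser()
# As-submitted tree: blocked for crawlers.
print(parser.can_fetch("*", "/1/2/3/1234/1234.txt"))
# Generated files under cache/epub: still crawlable.
print(parser.can_fetch("*", "/cache/epub/1234/pg1234.txt"))
```

Well-behaved crawlers honor these rules, but robots.txt is advisory, so it reduces rather than eliminates the chance of the public landing on those files.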

eshellman commented 9 months ago

Sounds like a plan.

eshellman commented 9 months ago

Addressed in 087c6601e432eabeb9b27cb78834c873ed6c8fcd and 1329543eb163912ed110d38edefce5fa695f0540. At this time I prefer not to deal with the small number of generated PDF files.