scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org

Somehow make the spec searchable (e.g. by generating PDF version) #10218

Closed scabug closed 3 years ago

scabug commented 7 years ago

I can't imagine this is a new issue but I can't find an old one, so.

Please make the specification searchable (available as a single page as a quick fix?) or, better, figure out how to make a proper index. The language is complex enough that questions about its behavior come up a lot for users, and it's often quite hard to find the relevant section in the spec. For example, I had a question last night about import priority and found it in the introduction to identifiers, names and scopes in the 2.9 spec PDF and it happens to still be there (but not in the section on import statements, which I found via the TOC, which is where I looked first).

scabug commented 7 years ago

Imported From: https://issues.scala-lang.org/browse/SI-10218?orig=1 Reporter: Rob Norris (rnorris)

scabug commented 7 years ago

@SethTisue said (edited on Mar 6, 2017 5:01:40 PM UTC): issue and preceding discussion (on PDF generation specifically, not indexing more broadly) at https://github.com/scala/scala.github.com/issues/516

it would be wonderful if some volunteer tackled this.

bjornregnell commented 7 years ago

In an educational context, I think students learning Scala and also teachers designing Scala courses, in addition to a searchable document, would also greatly benefit from a pdf with the language spec readable off-line and printable from a paginated format.

atiqsayyed commented 7 years ago

Hi, is this still open? If yes, can I pick it up?

lrytz commented 7 years ago

@atiqsayyed yes, you're more than welcome to work on this!

som-snytt commented 7 years ago

I volunteered on gitter today, to make sure there wouldn't be a permanent record.

atiqsayyed commented 7 years ago

@som-snytt sorry to have missed this issue. Can we discuss it to make sure we understand what we have to do here?

som-snytt commented 7 years ago

I took a glance but won't have time until a three-day weekend that is not US Labor Day. Halloween is on a Tuesday this year.

bjornregnell commented 7 years ago

It's very good to be able to search but also nice to be able to print it and view it in a paginated form in a pdf-viewer, so for my Scala teaching efforts here at Lund University, a pdf version would be really valuable. It would be really cool if you both could join forces and achieve some progress on this issue, @som-snytt @atiqsayyed

jvican commented 7 years ago

I'm not sure, but I think there already exists a PDF version. At least I've used PDFs of previous Scala versions in the past.

One solution to this problem would be https://www.algolia.com/. It's free for open-source projects. It would be cool for the rest of the docs too, not only the spec.

But someone would need to step up to make it a reality. It wouldn't be difficult though, just:

  1. Make a request to get the search engine.
  2. Copy-paste some JS in the docs so that the spec and the normal docs each get their own search box.

bjornregnell commented 7 years ago

I think a pdf version only exists for 2.11 which I think was written in latex, but now it's markdown or something. A pdf-generation infrastructure for the language spec of 2.12 (and 2.13 and Dotty etc) would be really nice.

SethTisue commented 7 years ago

was written in latex, but now it's markdown

correct. the change happened several years ago, in 2014, between 2.10 and 2.11

jvican commented 7 years ago

@SethTisue I find myself needing this. How can we make such a thing happen?

jvican commented 7 years ago

Seems something like this https://www.sitepoint.com/creating-pdfs-from-markdown-with-pandoc-and-latex/ could work. Is there someone out there that would like to contribute such a thing?

ritschwumm commented 7 years ago

i'd probably render markdown to html with some JS library and feed the thing to electron-pdf, athenapdf or maybe chrome (headless). the latter works really well in my experience. where can i find the markdown sources? i might give it a try...

SethTisue commented 7 years ago

@ritschwumm in the scala/scala repo under the spec directory

SethTisue commented 7 years ago

@jvican can’t think what to add besides what’s already in the comments here, or in the linked past discussion

som-snytt commented 7 years ago

@SethTisue Consider adding that next time folks update their will, they could include a small endowment or trust to ensure work on a ticket is funded. The resulting metric is the inverse bus factor, how many untimely deaths are required for features to progress.

jvican commented 7 years ago

@ritschwumm https://github.com/scala/scala/tree/2.12.x/spec.

Would be awesome if you give it a try.

ritschwumm commented 7 years ago

spent a few hours on it today - a single page renders quite nicely, but getting everything in a single document turned out to be quite difficult if you want to keep all links working.

SethTisue commented 6 years ago

@ritschwumm if your attempt is abandoned, perhaps you could link to a wip branch that someone else could pick up...?

nafg commented 6 years ago

I think just sticking this on it should work:

<form method="get" action="http://www.google.com/search">
  <input type="search"   name="q"  placeholder="Google site search">
  <input type="hidden" name="sitesearch" value="https://www.scala-lang.org/files/archive/spec/2.11/" />
  <input type="submit" value="Go!" />
</form>

Also, you can use Algolia, like Play's docs.

ritschwumm commented 6 years ago

@SethTisue sorry, i don't have a branch - i refuse to sign a CLA, so that wouldn't make much sense.

here's what i have so far:

#!/bin/bash

rm -f spec/all.md
rm -f build/spec/all.html
rm -f test.pdf

# TODO index needs layout toc
chapters='
01-lexical-syntax
02-identifiers-names-and-scopes
03-types
04-basic-declarations-and-definitions
05-classes-and-objects
06-expressions
07-implicits
08-pattern-matching
09-top-level-definitions
10-xml-expressions-and-patterns
11-annotations
12-the-scala-standard-library
13-syntax-summary
14-references
15-changelog
'

# prefix chapters with a special anchor
(
    #echo "---"
    #echo "title: Scala Language Specification"
    #echo "layout: default"
    #echo "---"
    #echo ""
    for i in $chapters; do
        echo >&2 "### $i"
        echo '<a name="CHAPTER-'"$i"'"></a>'
        cat "spec/$i.md" 
        #| tr '\n' '\0' | perl -pe 's/^---(.*?)---//' | tr '\0' '\n'
    done
) |
# remove target page name from links to anchors
perl -pe "s/\[([^\]]+)\]\(\d\d-[a-z-]+\.html(#[^)]+)\)/[\1](\2)/g"      |
# point links to chapters to the CHAPTER anchor
perl -pe "s/\[([^\]]+)\]\((\d\d-[a-z-]+)\.html\)/[\1](#CHAPTER-\2)/g"    |
cat >spec/all.md

# TODO add a chapter-anchor
#   \[  ([^\]]+)                \]
#   \(  (\d\d-[a-z-]+\.html)    \)

#[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).
#<a name="pookie"></a>

bundle exec jekyll build -d build/spec/ -s spec/ --baseurl="."
docker run --security-opt seccomp:unconfined  --rm -v "$(pwd):/converted/" arachnysdocker/athenapdf athenapdf -D 1000 build/spec/all.html test.pdf
evince test.pdf

adriaanm commented 6 years ago

Since you don't want to sign a CLA, could you clarify under which license you post this code?

ritschwumm commented 6 years ago

haha, good question :) WTFPL, if you can work with that - or do you need something more formal?

adriaanm commented 6 years ago

@ritschwumm, thanks -- public domain (== WTFPL) is fine with me. Just looking to avoid any licensing issues for the project, which is ultimately what the CLA is about.

dsbos commented 6 years ago

Bjorn Regnell wrote:

It's very good to be able to search but also nice to be able to print it and view it in a paginated form in a pdf-viewer ...

I think it would also be useful if the snippets of BNF for the grammar were hyperlinked, e.g., in "x ::= y z", the "y" would be a link to the production defining y. (And maybe the "x" could be a link to a list of the things that refer to x, with each item in the list linked to the production containing the reference.)

For a hacked-together partial simulation of the forward linking, see https://dsbos.github.io/temp-scala-hyperlinked-spec/2016-11-13_2.12_output/09-top-level-definitions.html and sibling files.

Daniel

jvican commented 6 years ago

@ritschwumm I had to make some changes in your script to generate a valid pdf document, but that contribution is great, I wouldn't have been able to figure it out myself. Thank you.

I also managed to create a mobi file out of the all.html via KindleGen (https://www.amazon.com/gp/feature.html?docId=1000765211). Most links work and it's overall readable. The style could be improved, but I'm happy with the result.

mghildiy commented 6 years ago

If I understand correctly, the objective here is to generate a PDF for one of the Scala websites (the one containing the Scala specification).

jvican commented 6 years ago

I think it would be great if, as a first step, we get a whole html file (like the one made by @ritschwumm) that has all the chapters and which is readable. From there, we can easily convert to PDF and to ebook formats through athenapdf (or maybe pandoc too?) and kindlegen.

mghildiy commented 6 years ago

Is it something like this we need: https://github.com/showdownjs/showdown

sake92 commented 6 years ago

I agree with this:

i'd probably render markdown to html with some JS library and feed the thing to electron-pdf, athenapdf or maybe chrome (headless). the latter works really well in my experience. where can i find the markdown sources? i might give it a try...

There's already support for that in my hepek project. It uses headless Chrome via Selenium, waits for JS to load and snapshots its HTML (see example here). 😃
Layout depends just on HTML's print CSS.

I'll try to tackle this on the weekend! Probably the hardest issue will be to map the markdown files to the corresponding hepek abstractions.

ritschwumm commented 6 years ago

how about a slightly different approach: if i remember correctly, the main obstacle was the irregular link structure of the original files. maybe we can just make them more regular somehow?

apart from that i'm not convinced that regex search&replace is the way to go - manipulating meaningful data structures is so much easier... is there a simple way to have those - some parser, maybe?

adriaanm commented 6 years ago

I'm more than happy for someone to rework the markdown sources if that makes generating pdf/html/mobi... easier!

jvican commented 6 years ago

As I see it though, these are the two true challenges:

  1. Make mathjax notation render correctly (especially in PDF and ebook formats)
  2. Merging independent spec sections (multiple markdown files) into one consistent view of the spec (only one markdown file).

There's not a lot of value in changing the content of the markdown sources if these two problems are not tackled (and I would also favor the smallest possible diff that makes this possible 😄). As soon as we have a unified markdown file with all the chapters, we can use pandoc to turn the spec into an ebook or PDF.
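The merge step (challenge 2) could look roughly like the sketch below. It assumes that each chapter file may begin with a front-matter block delimited by `---` lines, which is stripped so that it does not end up in the middle of the merged file. The function name `merge_chapters` and the demo files are hypothetical, and mathjax rendering is not addressed here.

```shell
#!/bin/bash
# Hypothetical helper: concatenate chapter files into one markdown stream.
# Assumption: each "<dir>/<chapter>.md" may start with a front-matter block
# delimited by "---" lines.
merge_chapters() {
    local dir=$1
    shift
    local chapter
    for chapter in "$@"; do
        # anchor that links to whole chapters can target
        printf '<a name="CHAPTER-%s"></a>\n\n' "$chapter"
        # drop the leading front matter, keep everything else
        awk 'NR == 1 && $0 == "---" { in_fm = 1; next }
             in_fm && $0 == "---"   { in_fm = 0; next }
             !in_fm                  { print }' "$dir/$chapter.md"
        printf '\n'
    done
}

# Demo on two tiny made-up chapters in a temporary directory.
demo=$(mktemp -d)
printf -- '---\ntitle: Lexical Syntax\n---\n# Lexical Syntax\n' >"$demo/01-lexical-syntax.md"
printf -- '---\ntitle: Types\n---\n# Types\n' >"$demo/03-types.md"
merge_chapters "$demo" 01-lexical-syntax 03-types
rm -r "$demo"
```

Only a `---` block that starts on the first line of a file is removed, so a horizontal rule later in a chapter is left alone. The output could be redirected to a single file such as `spec/all.md` and handed to whichever converter is chosen.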

ritschwumm commented 6 years ago

@jvican how is mathjax problematic?

jvican commented 6 years ago

Maybe it wasn't mathjax but whatever is being used for the notation of the language. In the PDF I generated a while ago, the notation was poorly displayed and it rendered most of the snippets explaining Scala's grammar unreadable.

sake92 commented 6 years ago

As promised, here is the site and pdf.
Source code is here.

I mostly struggled with maths+code interactions but somehow managed to get it working.. 😄
Of course, there's lots more work to be done.

SethTisue commented 3 years ago

https://github.com/scala/scala/pull/7432 is merged! So we can now generate a PDF locally.

I'm not closing this ticket yet, though, because there is work left to do: I need to actually publish the PDF on our website. Soon!

SethTisue commented 3 years ago

whoa, we're live! https://scala-lang.org/files/archive/spec/2.13/spec.pdf

SethTisue commented 3 years ago

For those who like PDF versions of things, see also the discussion at https://github.com/lampepfl/dotty/pull/10767#issuecomment-744738850 about a PDF version of the Scala 3 Reference