Suggestion: adding additional information to metas hash table

mbutterick / pollen

book-publishing system [mirror of main repo at https://git.matthewbutterick.com/mbutterick/pollen]

https://git.matthewbutterick.com/mbutterick/pollen

MIT License

Suggestion: adding additional information to metas hash table #163

Closed jlorieau closed 6 years ago

jlorieau commented 6 years ago

I've run into the problem of annotating my tags and rendered pages with chapter numbers. I've achieved this using some coding in the template:

<!DOCTYPE html>
<html>
◊(local-require racket/string racket/list)
◊(define local-pagetree (map (λ(x) (string-replace (symbol->string x) ".poly.pm" ".html")) (current-pagetree)))
◊(define chapter-num (index-of local-pagetree (symbol->string here)))
...

It's a bit hacky.

However, it would be useful to have this information in the document's metas hash table so that other procedures can properly render this information in tags. For example, the second figure in chapter 2 would be annotated, "Fig 2.2."

I'm sending this note to ask whether there's an easy way to populate the metas hash of each source file without copying and pasting code at the top of each Pollen markup source file. Ideally, it could be populated in pollen.rkt.

Furthermore, it would be helpful to include more information in the metas hash, like the position of a pagenode within the ptree and the page-node name with the relative path. I looked at the metas hash-table for a basic pollen source file, and it only contained an entry for 'here-path', which is the absolute path of the source file.

Other default entries in the metas hash could be, for example:

'#hasheq((page-num . 3) (here . "hello.poly.pm"))

Thanks!

Justin

mbutterick commented 6 years ago

Why is it hacky? I think that’s the right idea: when the pagetree is the source of truth about the ordering of the document, then one should consult the pagetree. Perhaps you could express what you’re after more concisely with parent and siblings. But that’s cosmetics.

However, it would be useful to have this information in the document's metas hash table so that other procedures can properly render this information in tags.

I see what you mean, though this approach will likely cause headaches, because it creates a second source of truth (in the metas table).

I would propose using parameters:

1) Define one or more parameters in pollen.rkt, called current-chapter (or current-page, etc).

2) In your template, set the parameter imperatively: (current-chapter chapter-num).

3) Then, any subsequent references to (current-chapter) in the page or related tag functions will be correct.
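
The three steps can be sketched in plain Racket, outside Pollen. The names current-chapter and caption are only illustrative. Whether a value set in a template is actually visible to tag functions depends on when Pollen evaluates each piece, which this sketch does not model.

```racket
#lang racket/base

;; Step 1: a parameter with a default value (this would live in pollen.rkt).
(define current-chapter (make-parameter 0))

;; A tag-like function that reads the parameter at the time it is called.
(define (caption fig-num text)
  (format "Fig. ~a.~a: ~a" (current-chapter) fig-num text))

;; Step 2: set the parameter imperatively (what the template would do).
(current-chapter 2)

;; Step 3: calls made after this point see the new value.
(caption 2 "Pulse sequence") ; "Fig. 2.2: Pulse sequence"
```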

PS. I’ve not used pollen-count. But IIRC it offers a similar approach to managing countable tags.

jlorieau commented 6 years ago

Thanks for the helpful comments. I'll try the parameter approach.

jlorieau commented 6 years ago

I've tried your suggestion with parameters. The updated value of the current-chapter parameter is shown in the template, but if I try to access this parameter in the pollen source file or in tags, it only shows the original value---not the updated value. From my understanding of parameters, the template appears to be rendering in its own context, and an update to a parameter within a template is not preserved out of this context when the tags are rendered. Consequently, I'm not sure that this approach would work. It's of course likely that there's an implementation detail I've missed.

I've looked at pollen-count before, but it appears to only register counters within a given document.

Altogether, I still believe that the availability of meta information on the whole project and the relationship of one page to another would be useful. Perhaps a suitable solution would be to include a meta hash for the ptrees.

In any case, I'll leave this ticket closed and let you decide whether you think this would be worth pursuing. I'd work up a solution and submit a PR, but I'm still too much of a novice in Racket.

Thanks and Best Regards,

Justin

mbutterick commented 6 years ago

Apologies — that’s what I get for running the code in my head rather than in DrRacket.

OK, this one I’ve actually tested, and I recall the pattern from a question that once arose on the mailing list.

Rather than propagate values into metas, what you want to do is pass metas as an argument to current-page so that it can use the value of here-path to calculate the position within the pagetree. In other words, the computation that you’re currently doing in template.html will be done in the Pollen source instead.

A working example:

;; index.ptree
#lang pollen
foo.html
bar.html

;; pollen.rkt
#lang racket
(require pollen/pagetree)
(provide (all-defined-out))

(define (current-page-fn metas)
  (define here (path->pagenode (hash-ref metas 'here-path)))
  (number->string (index-of (siblings here "index.ptree") here)))

(define-syntax (current-page-macro stx)
  (syntax-case stx ()
    [_ (with-syntax ([METAS (datum->syntax stx 'metas)])
         #'(current-page-fn METAS))]))

;; foo.html.pm
#lang pollen
value of current-page-fn on foo = ◊(current-page-fn metas)
value of current-page-macro on foo = ◊(current-page-macro)

;; bar.html.pm
#lang pollen
value of current-page-fn on bar = ◊(current-page-fn metas)
value of current-page-macro on bar = ◊(current-page-macro)

When you view foo.html.pm you’ll see:

value of current-page-fn on foo = 0
value of current-page-macro on foo = 0

And when you view bar.html.pm:

value of current-page-fn on bar = 1
value of current-page-macro on bar = 1

The only difference between current-page-fn and current-page-macro is that current-page-macro spares you having to explicitly type metas as an argument. (There’s nothing special about the names; you can rename one of them to current-page.)

current-page-fn basically reimplements your template function, slightly more neatly. path->pagenode does the housekeeping of converting here-path into a more useful pagenode form. Then index-of with siblings does the rest.

current-page-macro just generates the code (current-page-fn metas) that you would’ve otherwise typed. The datum->syntax fandango is needed to make an identifier that refers to the metas at the macro-calling site rather than the macro-definition site (explanation here)
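
The call-site behavior of datum->syntax can be seen in a small sketch that runs in plain Racket, without Pollen. The names page-title-fn, page-title-macro, render-foo, and render-bar are hypothetical stand-ins, and the 'title key is made up for the illustration.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical stand-in for current-page-fn: reads a key from metas.
(define (page-title-fn metas)
  (hash-ref metas 'title))

;; Expands to (page-title-fn metas), where `metas` takes its lexical
;; context from the place the macro is used, not where it is defined.
(define-syntax (page-title-macro stx)
  (syntax-case stx ()
    [_ (with-syntax ([METAS (datum->syntax stx 'metas)])
         #'(page-title-fn METAS))]))

;; Each function has its own local `metas`; the macro picks up whichever
;; one is in scope at the call site.
(define (render-foo)
  (define metas (hash 'title "foo"))
  (page-title-macro))

(define (render-bar)
  (define metas (hash 'title "bar"))
  (page-title-macro))

(render-foo) ; "foo"
(render-bar) ; "bar"
```

If the macro body used a plain metas identifier instead, it would refer to a metas binding at the definition site, which does not exist in this module.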

mbutterick commented 6 years ago

If you’re still thinking it would be nicer to have the metas pre-populated … it’s not that I oppose these ideas. But as a rule, I avoid introducing magic into Pollen to solve problems that Racket has already solved. (Related discussion on mailing list)

mbutterick commented 6 years ago

One more thing: once you set this up, you can also use current-page-fn or current-page-macro from within the template (not just the Pollen source) with the same results. To the example above, add this file:

;; template.html.p
<!DOCTYPE html>
<html>
value of current-page-fn in template: ◊(current-page-fn metas)
<p>
value of current-page-macro in template: ◊(current-page-macro)
<p>
◊(->html doc)

Now when you view bar.html.pm you’ll see:

value of current-page-fn in template: 1

value of current-page-macro in template: 1

value of current-page-fn on bar = 1
value of current-page-macro on bar = 1

jlorieau commented 6 years ago

Thank you for the response and helpful comments. I'll play around with it today, and I'll see if I can get this working.

A few initial comments:

Issue 1: Pagenodes and filename extensions

I use poly.pm files, and it appears that path->pagenode converts the extension from .poly.pm to .html. The above code works when I rename the pagenode symbol to include the .poly.pm extension.

Perhaps the pagenode name for a .poly.* extension file should strip the extension?

Issue 2: path of the default ptree

The above code trips up if the file to render is in a subdirectory (or when the 'index.ptree' is a different filename). Consequently, I switched this to (current-pagetree).

Altogether, my current-page-fn looks like this.

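(define (current-page-fn metas)
  ;; path->pagenode yields an output name such as foo.html, so swap the
  ;; extension back to .poly.pm to match the nodes listed in the pagetree.
  (define here (string->symbol
                (string-replace
                 (symbol->string
                  (path->pagenode (hash-ref metas 'here-path))) ".html" ".poly.pm")))
  ;; Convert the zero-based sibling index into a one-based page number.
  (number->string (add1 (index-of (siblings here (current-pagetree)) here))))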

(If the code before wasn't hacky, surely this snippet would qualify)

Issue 3: access to `metas`

I have a series of tags in files included in my pollen.rkt file. However, these do not have access to the metas hash table. So, in my case, I'm trying to render a caption tag (e.g., Fig. 2.2), but current-page-fn doesn't have access to the metas hash.

I suspect there's an easy work-around, but I'll play with it some more. I'll also give you access to my repo, in case you want to take a look. I'd ideally like to make it publicly available, once it's in decent shape.

Finally, regarding your comment:

If you’re still thinking it would be nicer to have the metas pre-populated … it’s not that I oppose these ideas. But as a rule, I avoid introducing magic into Pollen to solve problems that Racket has already solved.

I see your point, but providing additional information in the metas hash would keep writers and developers from reinventing the wheel for things that come up frequently (like chapter numbers) or from implementing suboptimal solutions. Would it be possible to populate the metas with items based on flags set in the setup? Alternatively, could an 'unstable' module be used to produce an extended metas?

mbutterick commented 6 years ago

I use poly.pm files, and it appears that path->pagenode converts the extension from .poly.pm to .html. … Perhaps the pagenode name for a .poly.* extension file should strip the extension?

Right, because pagenodes are, by convention, output filenames. (Why? Because then you can change a source file from pm to pp to pmd etc. without having to update the pagetree.) In this case, when path->pagenode converts the poly.pm source to an output filename, it relies on default-poly-targets, which defaults to html.
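
As a rough illustration of that convention, here is a sketch of the source-to-output name mapping in plain Racket. The helper source-name->output-name is hypothetical and is not Pollen's implementation; it only mimics the renaming described above, with "html" assumed as the default poly target.

```racket
#lang racket/base

;; Hypothetical helper, NOT Pollen's code: derive an output name from a
;; source name by dropping the final source extension and then replacing
;; a trailing ".poly" with the chosen target extension.
(define (source-name->output-name name [poly-target "html"])
  ;; "book.html.pm" -> "book.html"; "book.poly.pm" -> "book.poly"
  (define stripped (regexp-replace #rx"[.][^.]*$" name ""))
  ;; "book.poly" -> "book.html" (or another target)
  (regexp-replace #rx"[.]poly$" stripped (string-append "." poly-target)))

(source-name->output-name "book.html.pm")        ; "book.html"
(source-name->output-name "book.html.pmd")       ; "book.html"
(source-name->output-name "book.poly.pm")        ; "book.html"
(source-name->output-name "book.poly.pm" "pdf")  ; "book.pdf"
```

The first two calls show why the pagetree can stay the same when a source changes from pm to pmd: both map to the same output name.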

I agree that behavior isn’t a great fit for poly sources. OTOH, stripping the extension entirely seems like inconsistent default behavior. Maybe path->pagenode should have an optional keyword arg that determines the extension applied to pagenodes made from poly sources (which could be no extension).

I have a series of tags in files included in my pollen.rkt file. However, these do not have access to the metas hash table. So, in my case, I'm trying to render a caption tag (e.g., Fig. 2.2)

The functional-programming answer to this would be: whatever a function needs to compute its result should be passed as an argument (not, say, maintained from afar as a global variable). So if your caption-tag function needs to calculate something based on metas, then metas should be an argument to that function. If you don’t like the notation this produces, you can always use a macro to sugar it up.
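
A minimal sketch of that idea in plain Racket, assuming a made-up 'chapter key in metas. In a real project the chapter number might instead be computed from here-path and the pagetree; chapter-number and caption are hypothetical names.

```racket
#lang racket/base

;; Hypothetical helper: reads the chapter number from a made-up 'chapter
;; key. It could equally compute the number from 'here-path.
(define (chapter-number metas)
  (hash-ref metas 'chapter))

;; The tag function receives metas explicitly instead of reading a global.
(define (caption metas fig-num text)
  (format "Fig. ~a.~a: ~a" (chapter-number metas) fig-num text))

(caption (hash 'chapter 2) 2 "INEPT pulse sequence")
; "Fig. 2.2: INEPT pulse sequence"
```

Because everything caption needs arrives as an argument, the same call gives the same result no matter when or where it runs.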

I’m not saying this is always an “easy work-around”, nor that one must adhere to the functional-programming idiom dogmatically. But it’s usually wise to exhaust these possibilities first, because once you start introducing things that work like global variables, new worlds of pain quietly open.

providing additional information in the metas hash would keep writers and developers from reinventing the wheel for things that come up frequently (like chapter numbers)

Again, I’m not opposed to it. OTOH I don’t use this kind of serialized numbering in my own projects. So someone would have to explain “here’s the most enlightened idea for how it ought to work” — based on best practices elsewhere, most likely — and we could go from there. (This is more or less how poly got implemented, another feature I don’t personally use much.) Certainly I don’t want my ignorance to create an upper bound.

This tension between local-ness and global-ness is partly inherent in the Pollen project model, which encourages projects to be made of a set of small source files “flying in formation” as opposed to one giant source file.

I’d also recommend posting to the Pollen mailing list, as others may have come up with better solutions.

jlorieau commented 6 years ago

Thanks for the note and for the detailed comments. Your note on a functional approach makes sense too. I'll make sure to post my pollen questions to the mailing list.

I have just a couple more replies that may count as feature suggestions, so I'll post them here briefly. I'd be happy to share implementations if you think they would be helpful and would fall in line with your design philosophy for Pollen.

Right, because pagenodes are, by convention, output filenames. (Why? Because then you can change a source file from pm to pp to pmd etc. without having to update the pagetree.) In this case, when path->pagenode converts the poly.pm source to an output filename, it relies on default-poly-targets, which defaults to html

I think a simple solution might be to provide a pagenode-member and pagenode-index-of procedure in the pagetree module that can deal with differences in extensions to recognize the right page node. Alternatively, a pagenode-equal? procedure that can more readily match the correct pagenode would be helpful too.

This tension between local-ness and global-ness is partly inherent in the Pollen project model, which encourages projects to be made of a set of small source files “flying in formation” as opposed to one giant source file.

I see your point, and the source file metas is probably the wrong approach. I think a better approach might be to have a project-level metas hash (project_metas) that is optionally used and populated with information from the pagetree and from the relationships between pagenodes. This project metas hash could include information on the order of a pagenode in relation to the pagetree as well as other useful project-wide meta information. Some examples of project-wide meta information that would be useful, particularly for those who are used to writing texts in LaTeX:

1) A (hash) table of labels to allow references between pages. In LaTeX, you might set a reference with \label{ch:My first chapter} and later refer to this chapter from another chapter with \ref{ch:My first chapter}. It would be helpful to have labels pre-populated in a table before they are referenced.

2) A (hash) table of acronyms or keywords. These could be used to link to related sections on a topic, basically like some of the sidenotes in the Racket documentation.

mbutterick commented 6 years ago

pagenode-member, pagenode-index-of, pagenode-equal? — I’m open to those, though can you explain how they would differ from member, index-of, and equal?

project_metas — also open to that, but a new core abstraction is more expensive to implement. So it would need to meet a higher burden: namely, that there is a certain trove of data that does not naturally fit in metas or the pagetree or pollen.rkt (which today, have clear & separate roles)

What would be some examples?

“information on the order of a pagenode in relation to the pagetree” — but how is that different from what you can accomplish with a pagetree query?
a “table of acronyms or keywords” — how is that not possible in pollen.rkt? (FWIW, I have a cross-referenced glossary in Beautiful Racket that needed no special apparatus).
“table of labels” — same question.

jlorieau commented 6 years ago

pagenode-member, pagenode-index-of, pagenode-equal? — I’m open to those, though can you explain how they would differ from member, index-of, and equal?

The following code block shows what I meant. The pagenode-equal? probably wouldn't be a good idea to test against though, since a pagetree might include duplicates.

#lang racket
(require racket/list
         sugar/test)

(provide pagenode-index-of)

(define (pagetree->base-str pagetree)
  (map (λ(x) (cond
               [(symbol? x) (regexp-match #rx"[^.]*" (symbol->string x))]
               [(list? x) (pagetree->base-str x)]
               [else x])) pagetree))

(define (pagenode-index-of pagetree pagenode)
  (cond
    [(index-of pagetree pagenode) (index-of pagetree pagenode)]
    [else
     (let ([pagenode-str (regexp-match #rx"[^.]*" (symbol->string pagenode))]
           [pagetree-str (pagetree->base-str pagetree)])
       (if (equal? (length (filter (λ(x) (equal? x pagenode-str)) pagetree-str)) 1)
           (index-of pagetree-str pagenode-str)
           #f))]))

(module-test-external
 (require racket/list)
 (define pt1 '(pagetree-root
               book.poly.pm
               fundamentals_soln_nmr/inept/inept.poly.pm))
 (check-equal? (pagenode-index-of pt1 'book.poly.pm) 1
               "Test exact match")
 (check-equal? (pagenode-index-of pt1 'book) 1
               "Test match on base filename")
 (check-equal? (pagenode-index-of pt1 'book.html) 1
               "Test match on base filename with different extension")
 (check-equal? (pagenode-index-of pt1 'book2.html) #f
               "Test mismatch on base filename")
 (define pt2 '(pagetree-root
               book.poly.pm
               book.html
               fundamentals_soln_nmr/inept/inept.poly.pm))
 (check-equal? (pagenode-index-of pt2 'book.poly.pm) 1
               "Test exact match with a duplicate in pagetree")
 (check-equal? (pagenode-index-of pt2 'book) #f
               "Test match on base filename, but with duplicate")
 (check-equal? (pagenode-index-of pt2 'book.pdf) #f
               "Test match on base filename with different extension for pagetree with duplicate.")
 )

project_metas — also open to that, but a new core abstraction is more expensive to implement. So it would need to meet a higher burden: namely, that there is a certain trove of data that does not naturally fit in metas or the pagetree or pollen.rkt (which today, have clear & separate roles)

That makes sense. I'll take a look at your cross-referenced glossary and think about it some more.

Thanks!

Justin

mbutterick commented 6 years ago

A pagetree won’t compile with duplicate pagenode names. Your pt2 is a little mischievous because it mixes the input path book.poly.pm with its output path book.html. The pagetree compiler won’t forbid this. But it won’t work correctly in the project server.

(BTW I was just fiddling around with poly files, index.ptree, and the project server — seems like it permits any extension included in poly-targets, or poly itself. So in this case, I would recommend adopting the convention of book.poly, inept.poly, etc)

If you like the way your pagenode-index-of works, I wouldn’t talk you out of it. Pagetrees are not necessarily flat lists, however, so it’s unclear to me how it would behave in the general case. (This is in fact why I include the children and siblings etc. functions — so you can pull out a list of pagenodes and then use the usual list functions.)

jlorieau commented 6 years ago

Thanks for the reply.

A pagetree won’t compile with duplicate pagenode names. Your pt2 is a little mischievous because it mixes the input path book.poly.pm with its output path book.html. The pagetree compiler won’t forbid this. But it won’t work correctly in the project server.

Are there cases when there could be duplicate base filenames (e.g., "book.pm" and "book.pp")? The duplicate checking is for cases when the base filename (and relative path) are the same for two or more entries. If you'd like to include this code, I could clean up the tests in a PR. I'd also rename the function to index-of-pagenode. (It's no problem if you'd rather not include it.)

Pagetrees are not necessarily flat lists, however, so it’s unclear to me how it would behave in the general case.

I would keep the functionality on par with index-of, which also does not search recursively. Implementing a recursive search would be fairly straightforward, in which case it would return a list--but I would call this something else. Such a function would be useful for Racket too, come to think of it.

(index-of '(1 2 (3 4 5) 3) 5) ; #f
(index-where '(1 2 (3 4 5) 3) 5) ; '(2 2) (proposed recursive search)
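
A minimal sketch of such a recursive search in plain Racket. The name nested-index-of is hypothetical, chosen to avoid clashing with the predicate-based index-where in racket/list.

```racket
#lang racket/base

;; Hypothetical nested-index-of: returns the list of indexes leading to
;; the first occurrence of v (searching depth-first), or #f if v is absent.
(define (nested-index-of lst v)
  (let loop ([items lst] [i 0])
    (cond
      [(null? items) #f]
      ;; Found at this level.
      [(equal? (car items) v) (list i)]
      ;; Found inside a sublist: prepend this level's index to its path.
      [(and (list? (car items)) (nested-index-of (car items) v))
       => (λ (sub-path) (cons i sub-path))]
      [else (loop (cdr items) (add1 i))])))

(nested-index-of '(1 2 (3 4 5) 3) 5) ; '(2 2)
(nested-index-of '(1 2 (3 4 5) 3) 3) ; '(2 0), the first match in depth-first order
(nested-index-of '(1 2 (3 4 5) 3) 9) ; #f
```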

mbutterick commented 6 years ago

Are there cases when there could be duplicate base filenames (e.g., "book.pm" and "book.pp")?

Pollen expects a one-to-one relationship between input and output files. If it finds book.pm and book.pp in one directory, it will ignore book.pp, because they both want to produce a file called book.

index-of-pagenode

Ah, but check out the optional third argument to index-of, which controls how items are compared. Your function could be rewritten like so:

#lang racket
(require racket/list
         sugar/test)

(provide pagenode-index-of)

(define (node-prefixes-eq? y z)
  (define (node-prefix x) (string->symbol (car (string-split (symbol->string x) "."))))
  (eq? (node-prefix y) (node-prefix z)))

(define (pagenode-index-of pagetree pagenode)
  (or (index-of pagetree pagenode)
      (match (indexes-of pagetree pagenode node-prefixes-eq?)
        [(list first-and-only-index) first-and-only-index]
        [_ #f])))

(module-test-external
 (require racket/list)
 (define pt1 '(pagetree-root
               book.poly.pm
               fundamentals_soln_nmr/inept/inept.poly.pm))
 (check-equal? (pagenode-index-of pt1 'book.poly.pm) 1
               "Test exact match")
 (check-equal? (pagenode-index-of pt1 'book) 1
               "Test match on base filename")
 (check-equal? (pagenode-index-of pt1 'book.html) 1
               "Test match on base filename with different extension")
 (check-equal? (pagenode-index-of pt1 'book2.html) #f
               "Test mismatch on base filename")
 (define pt2 '(pagetree-root
               book.poly.pm
               book.html
               fundamentals_soln_nmr/inept/inept.poly.pm))
 (check-equal? (pagenode-index-of pt2 'book.poly.pm) 1
               "Test exact match with a duplicate in pagetree")
 (check-equal? (pagenode-index-of pt2 'book) #f
               "Test match on base filename, but with duplicate")
 (check-equal? (pagenode-index-of pt2 'book.pdf) #f
               "Test match on base filename with different extension for pagetree with duplicate.")
 )

(index-where '(1 2 (3 4 5) 3) 5) ; '(2 2)

This notion of location would be useful if you needed to preserve the nested coordinates. But do you? For instance, if you want to know what comes before or after 5, you could just flatten the list and then use normal list operations. This is something of a functional-programming habit: since you always work on a copy of the input data, you can mangle it with impunity.
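
For instance, a short sketch of the flatten-then-query approach in plain Racket, using only racket/list (the variable names are illustrative):

```racket
#lang racket/base
(require racket/list)

(define nested '(1 2 (3 4 5) 3))

;; flatten returns a new flat list; `nested` itself is left untouched.
(define flat (flatten nested))   ; '(1 2 3 4 5 3)

;; Position of 5 in the flat copy.
(define pos (index-of flat 5))   ; 4

;; Neighbors of 5, or #f when 5 is missing or sits at either end.
(define before (and pos (> pos 0) (list-ref flat (sub1 pos))))
(define after  (and pos (< (add1 pos) (length flat)) (list-ref flat (add1 pos))))

before ; 4
after  ; 3
```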

jlorieau commented 6 years ago

Thanks for the reply, and I apologize for the delay. The beginning of the semester can get pretty busy.

I tried your modified code and couldn't get it to work with nested lists of symbols. I think I have to modify it a bit to get it working, and I'll include tests with subtrees.

mbutterick commented 6 years ago

Rather than drilling down from the top using index-of, you could also recursively use the length of previous* (which is equivalent to getting the node’s index at the current level), and parent (to move up to the next level).

mbutterick commented 6 years ago

Actually, never mind that suggestion — it won’t work, because previous* goes all the way to the front.

mbutterick commented 6 years ago

Something like this perhaps:

#lang racket
(require pollen/pagetree rackunit)

(define (index-where node pagetree)
  (parameterize ([current-pagetree pagetree])
    (define (sibling-index node) (index-of (siblings node) node))
    (and (siblings node)
         (let loop ([node node][indexes (if (children node) '(0) null)])
           (if (parent node)
               (loop (parent node) (cons (add1 (sibling-index node)) indexes))
               (cons (sibling-index node) indexes))))))

(define pt '(pagetree-root a b (c d (e f g) h i) j k))
(check-false (index-where 'foo pt))
(check-equal? (index-where 'a pt) '(0))
(check-equal? (index-where 'b pt) '(1))
(check-equal? (index-where 'c pt) '(2 0))
(check-equal? (index-where 'd pt) '(2 1))
(check-equal? (index-where 'f pt) '(2 2 1))
(check-equal? (index-where 'k pt) '(4))

jlorieau commented 6 years ago

Thanks for the note and for the code. I've had some time to think about it (and recover from a flu). I believe the simplest and best approach is to implement a pagenode-equal? procedure, based on your previous suggestion, for the following reasons:

1) It can be used flexibly with existing list procedures as the optional is-equal? argument.

2) It avoids creating more specialized procedures for specific use cases.

3) Following your comments on pagenodes with duplicate base filenames (e.g. book.pp and book.pm), the procedure does not need to detect duplicates in the base filename within a pagetree.

4) It works as expected with flatten for nested pagetrees. You have a nice implementation of index-where, but as you said before, keeping the nested structure of the index value is likely not needed in Racket. Also, it appears that an index-where procedure already exists in racket/list.

In the following implementation, I've created a regular pagenode-equal? procedure as well as a pagenode-nested-equal? procedure that works with subtrees. The latter could be used for both purposes, but I wonder whether the behavior would be unexpected to some users. I could also implement the optional nested search as a keyword argument to pagenode-equal?. The drawback there, I believe, would be that users would have to set the flag by wrapping the procedure in a lambda.

The one drawback to my approach is that it will prematurely match filenames and paths with spurious periods ('.') that are not used to distinguish the extension. I've personally never used these, but other users may---though I'd imagine this practice would break other software tools as well.

These procedures could be useful to other users of pagetrees, and I'd be happy to prepare a PR and documentation, if you like the idea of their inclusion.

#lang racket
(require racket/list
         sugar/test)

(provide pagenode-equal? pagenode-nested-equal?)

(define (pagenode-equal? x y)
  (or (equal? x y)
      (and (symbol? x) (symbol? y)
           (equal? (regexp-match #rx"[^.]*" (symbol->string x))
                   (regexp-match #rx"[^.]*" (symbol->string y))))))

(module-test-external
 (require racket/list)
 (define pt1 '(pagetree-root
               book.poly.pm
               fundamentals_soln_nmr/inept/inept.poly.pm))
 (check-equal? (index-of pt1 'book.poly.pm pagenode-equal?) 1
               "Test an exact match")
 (check-equal? (index-of pt1 'book pagenode-equal?) 1
               "Test a match on the base filename")
 (check-equal? (index-of pt1 'book.html pagenode-equal?) 1
               "Test a match on the base filename with a different extension")
 (check-equal? (index-of pt1 'book2.html pagenode-equal?) #f
               "Test a mismatch on the base filename")

; The following tests are no longer needed as a pagetree should not include
; pagenodes with duplicate base names.
; (define pt2 '(pagetree-root
;               book.poly.pm
;               book.html
;               fundamentals_soln_nmr/inept/inept.poly.pm))
; (check-equal? (index-of pt2 'book.poly.pm pagenode-equal?) 1
;               "Test an exact match with a duplicate in pagetree")
; (check-equal? (index-of pt2 'book pagenode-equal?) #f
;               "Test match on base filename, but with duplicate")
; (check-equal? (index-of pt2 'book.pdf pagenode-equal?) #f
;               "Test match on base filename with different extension for pagetree with duplicate.")
 )

(define (pagenode-nested-equal? x y)
  (or (equal? x y)
      (and (symbol? x) (symbol? y)
           (equal? (regexp-match #rx"[^.]*" (symbol->string x))
                   (regexp-match #rx"[^.]*" (symbol->string y))))
      (and (list? x)
           (ormap (λ(w) (pagenode-nested-equal? w y)) x))
      ))

(module-test-external
 (require racket/list)
 (define pt3 '(pagetree-root
               book.poly.pm
               (pagetree-root
                 chapter1.poly.pm
                 chapter2.poly.pm
                 (pagetree-root
                   chapter3.poly.pm))
               fundamentals_soln_nmr/inept/inept.poly.pm))
 (check-equal? (index-of pt3 'book.html pagenode-nested-equal?) 1
               "Test inexact match with nested pagetree")
 (check-equal? (index-of pt3 'chapter1.html pagenode-nested-equal?) 2
               "Test inexact match in nested pagetree (level 1)")
 (check-equal? (index-of pt3 'chapter3.html pagenode-nested-equal?) 2
               "Test inexact match in nested pagetree (level 2)")
 (check-equal? (index-of pt3 'chapter4.html pagenode-nested-equal?) #f
               "Test non match in nested pagetree")
 )

mbutterick commented 6 years ago

You’ve persuaded me that there is room for improvement around two issues:

1) Querying pagetrees with poly sources. (Related: whether pagetrees should permit different pagenodes to have the same prefix.)

2) Deriving numbering from pagetree positions.

I’m not yet sold on the approach, however. For one thing, pagenode-nested-equal? is an equivalence predicate that’s not symmetric as to x and y, which is surprising. For another, I still think flat-list operations like index-of are an awkward fit with the nested structure of the pagetree.
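
The asymmetry is easy to see in a small sketch that runs in plain Racket. The definition is repeated from the comment above so that the example runs on its own; the subtree and node names are made up.

```racket
#lang racket/base

;; Same definition as posted above, repeated here so this runs standalone.
(define (pagenode-nested-equal? x y)
  (or (equal? x y)
      (and (symbol? x) (symbol? y)
           (equal? (regexp-match #rx"[^.]*" (symbol->string x))
                   (regexp-match #rx"[^.]*" (symbol->string y))))
      (and (list? x)
           (ormap (λ (w) (pagenode-nested-equal? w y)) x))))

(define subtree '(chapter1.poly.pm chapter2.poly.pm))

;; Only the first argument is searched when it is a list.
(pagenode-nested-equal? subtree 'chapter1.html) ; #t
(pagenode-nested-equal? 'chapter1.html subtree) ; #f
```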

But I think you’re on the right track. I also think you’ll end up with a better version of this idea :wink:

jlorieau commented 6 years ago

Thanks, I’ll think about it some more. I have a few initial thoughts, but these may be more complicated than worthwhile.

Querying pagetrees with poly sources. (Related: whether pagetrees should permit different pagenodes to have the same prefix.)

I believe the problem arises because pagetrees and pagenodes may comprise source paths, target paths, or possibly both. From my understanding of Pollen, this is by design, since a user can specify a source file or a target file, and Pollen will take care of the conversion.

From your previous comments, Pollen expects a one-to-many relationship between a source file and one or more target files. So the file book.poly.pm could generate book.html or book.tex, but the file book.html should not be generated from book.poly.pm and book.pmd. This makes sense since a generated target file should only be created in one way—you shouldn’t have to wonder whether book.html was created from book.poly.pm or book.pmd.

If you agree with the above comments, then a pagetree should not permit different pagenodes to have the same prefix, because source and target paths differ only in their suffix. The pagetree also shouldn't contain pagenodes for different target types (e.g. .html and .tex).

AFAICT, there are a few solutions to this problem, and I think the first is the best.

1) Flexible matching functions in the pagetree module. The in-pagetree? and possibly a pagenode-equal? function will match a pagenode with a source or target filename symbol with the expectation that the pagenode prefix is unique. So, if 'book.html is in a pagetree, then the pagenode 'book.pm or 'book.html should return #t for in-pagetree? and pagenode-equal?.

2) Translator functions to generate pagetrees and pagenodes with uniform extensions or no extensions at all (i.e. just prefixes). The existing current-pagetree and get-pagetree could have a kwarg flag to return a pagetree with the extensions of the current-poly-target. The pagetree module could also include simple translator functions to convert pagenode suffixes to a given type. Finally, the pagenode of the target file would be populated in the metas hash. Currently, the here-path only points to the source file, and it includes an absolute path. A pagenode with the relative path (to the project directory) and the target extension would more easily match to the pagetree.

3) Requiring that pagetrees are written as a list of source files or a list of (unique) prefixes. I believe this option would be the most clean, but I also think it would violate some of Pollen’s design philosophy mentioned above, and it would be backwards incompatible since users (and the documentation) don’t use this approach.

Aside: you may consider changing the .poly.pm extension to something like .plm or .pml so that the extension can be stripped from a path or pagenode more easily.

Deriving numbering from pagetree positions.

I was thinking about this some more, and keeping the nested structure of the numbering is probably desired. So a function like the index-where you presented would be useful—but it should probably be renamed.

When I started using Pollen, I thought that the nested pagetree structure was used for storing chapters in subdirectories. This is not the case, as you already know, since you can still use a flat pagetree with files in subdirectories. Instead, I’d imagine nested pagetrees being used to separate the parts, sections and subsections of a book. For these, giving the user the option of a hierarchical numbering system makes sense. So a pagenode-count would return a list of numbers that a user could use for numbering (e.g. section 4.5). It would be useful to have this function include a flag for a running count so that, if pagenodes are split between subtrees, the counter is optionally reset.
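
A rough sketch of what such a pagenode-count might look like, in plain Racket. It assumes a nested pagetree is a list of the form (parent child ...), where each child is either a symbol or another such list. The function names and the example tree are hypothetical, and the running-count flag is left out: this version always restarts the count inside each subtree.

```racket
#lang racket/base
(require racket/string)

;; Name of a node: the parent symbol of a subtree, or the leaf itself.
(define (node-name x) (if (pair? x) (car x) x))

;; Hypothetical: 1-based position of `node` at each nesting level, or #f.
(define (pagenode-count node tree)
  (let loop ([children (cdr tree)] [i 1])
    (cond
      [(null? children) #f]
      [(eq? (node-name (car children)) node) (list i)]
      ;; Descend into a subtree; if the node is found there, prepend i.
      [(and (pair? (car children)) (pagenode-count node (car children)))
       => (λ (tail) (cons i tail))]
      [else (loop (cdr children) (add1 i))])))

;; Render a count such as '(3 2) as "3.2".
(define (count->label count)
  (string-join (map number->string count) "."))

(define tree
  '(pagetree-root index.html
                  (part1.html intro.html setup.html)
                  (part2.html usage.html faq.html)))

(count->label (pagenode-count 'faq.html tree)) ; "3.2"
(pagenode-count 'part1.html tree)              ; '(2)
(pagenode-count 'missing.html tree)            ; #f
```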

I’m not yet sold on the approach, however. For one thing, pagenode-nested-equal? is an equivalence predicate that’s not symmetric as to x and y, which is surprising. For another, I still think flat-list operations like index-of are an awkward fit with the nested structure of the pagetree.

The asymmetry is easy to fix since the two variables can be swapped in the function. If you like this function, I could work it up.

mbutterick commented 6 years ago

I believe the problem arises because pagetrees and pagenodes may comprise source paths, target paths, or possibly both.

I personally don’t think it makes sense for a pagetree to contain source paths. Still, I wouldn’t elevate that view from convention to requirement. I prefer to forbid as little as possible.

Flexible matching functions

Yes, it could make sense to let pagenodes match on something other than eq?. Maybe an extra argument, à la sort or index-of. Your pagenodes-equal? seems like an example of a procedure one could pass as that argument.

mbutterick commented 6 years ago

Seems like the pagenode equality procedure should become a setup value. Suppose it’s called pagenode=? and it defaults to eq?. Then it could be used both when compiling the pagetree — that is, if any nodes are pagenode=? to each other, the pagetree is rejected — and when making queries. This has the side effect of guaranteeing that queries can’t have ambiguous results, which seems like the right policy.
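
One way to picture this, as a sketch only: the equality procedure is held in a Racket parameter and used both to validate a list of pagenodes and to query it. The names current-pagenode=?, prefix=?, validate-pagenodes and pagenode-index are hypothetical, the list is flat, and none of this reflects how Pollen's setup module is actually implemented. It relies on check-duplicates and index-of from racket/list, both of which accept an optional equality procedure.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical setup value: the equality used for pagenodes.
(define current-pagenode=? (make-parameter eq?))

;; Example alternative: compare names with all extensions removed.
(define (prefix=? a b)
  (define (prefix n)
    (regexp-replace #rx"\\.[^/]*$" (symbol->string n) ""))
  (string=? (prefix a) (prefix b)))

;; Reject a flat list of pagenodes if any two are equal under the setup value.
(define (validate-pagenodes nodes)
  (define dup (check-duplicates nodes (current-pagenode=?)))
  (when dup
    (raise-arguments-error 'validate-pagenodes
                           "duplicate pagenode"
                           "node" dup))
  nodes)

;; Queries use the same predicate, so they cannot match more than one node
;; in a list that passed validation.
(define (pagenode-index nodes node)
  (index-of nodes node (current-pagenode=?)))

;; Under the default eq?, these two nodes are distinct and accepted.
(validate-pagenodes '(book.html book.tex))

(parameterize ([current-pagenode=? prefix=?])
  ;; A source name finds its target node: 1
  (pagenode-index '(index.html book.html) 'book.poly.pm))

(parameterize ([current-pagenode=? prefix=?])
  ;; The same list is now rejected; show the message instead of aborting.
  (with-handlers ([exn:fail? exn-message])
    (validate-pagenodes '(book.html book.tex))))
```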

jlorieau commented 6 years ago

I personally don’t think it makes sense for a pagetree to contain source paths. Still, I wouldn’t elevate that view from convention to requirement. I prefer to forbid as little as possible.

That makes sense. However, a pagetree with source file pagenodes (or potentially different target types) can be loaded with get-pagetree. A user of poly sources will likely list the source files in the index.ptree so that the same index can be used for multiple target types.

Maybe an extra argument, à la sort or index-of. Your pagenodes-equal?

I agree that this would be the best approach, and I can submit a PR. Before doing so, I’d like to decide how to deal with nested pagetrees. I like the idea of a pagenode=? whose behavior depends on a flag in setup. I don’t think pagenode=? should search recursively (at least by default), but it could be helpful to have a recursive search version too. (If pagenode=? searched recursively, then a pagenode in a root pagetree and a subtree couldn’t be used twice. For example, a user couldn’t use an 'intro.html pagenode in multiple subtrees if the default pagenode=? were to search recursively.)

Also, the here-path variable in metas needs further processing before it can be used as a pagenode for querying against the pagetree. It would be helpful to include a pagenode entry in the metas hash table that can reliably be used to query against pagetrees generated from current-pagetree or get-pagetree.
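
To make the conversion concrete, here is a hedged sketch in plain Racket of turning an absolute here-path into a target pagenode relative to the project directory. The function here-path->pagenode, its arguments and the example paths are all hypothetical; Pollen's own path utilities are deliberately not used here, and the example assumes Unix-style absolute paths.

```racket
#lang racket/base
(require racket/path racket/string)

;; Hypothetical: make `here-path` relative to `project-dir`, strip every
;; source extension, and attach the target extension.
(define (here-path->pagenode here-path project-dir target-ext)
  (define rel (find-relative-path project-dir here-path))
  ;; Join the path elements with "/" so the pagenode looks the same
  ;; regardless of the platform's separator.
  (define joined
    (string-join (map path->string (explode-path rel)) "/"))
  (string->symbol
   (string-append (regexp-replace #rx"\\.[^/]*$" joined "")
                  "."
                  target-ext)))

(here-path->pagenode "/home/me/book/ch1/intro.poly.pm"
                     "/home/me/book"
                     "html")
;; 'ch1/intro.html
```

A pagenode built this way could be compared directly against an index.ptree that lists target files, without further string manipulation in the template.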

Finally, I also think it would be helpful to include a setup option or an alternative define-tag-function that passes the metas by default. In this case, it's likely possible to even just change define-tag-function to include the extra option without breaking backwards compatibility. If you like the idea of a pagenode in the metas, or this idea, I could start a new issue for each.

mbutterick commented 6 years ago

Though I’m curious about how this issue can be improved, I have reservations about both your framing of the problem and your proposed implementation (as noted above). I’ll review a PR, but I can’t guarantee I’ll use it.

On Jan 18, 2018, at 10:23 AM, Justin Lorieau notifications@github.com wrote:


mbutterick commented 6 years ago

(Also, I’m going to close this issue because it has now spawned sub-issues that are better pursued separately)


Issue 1: Pagenodes and filename extensions

Issue 2: path of the default ptree

Issue 3: access to metas
