Implement Rich Indexing Specification

dbolack-ab commented 7 months ago

Your idea:

The rich indexing specification is designed for building an arbitrary number of Book Index style lists consisting of alphabetized Topics and subtopics.

The system involved is two-part. In the markdown parsing, index tokens are processed and either converted to HREF links or consumed. The snippet parses the brew source, collates duplicated entries, and inserts an alphabetized list for each index.

The markdown takes one of two forms - each is a single line.

#Index2 List|Of|Topics - One or more Topics, separated by a | character that will be included in the user-specified index Index2
#Index2 List|Of|Topics // Subtopic - One or more Topics, separated by a | character with a subtopic that will be included in the user-specified index Index2

While there may be multiple topics, there may only be one subtopic in a link.

Parsing Details: In case 1, an HREF anchor link in the form <a href="#p{pageNumber}_{sluggified_subtopic}" data-topic="{topic}" data-index="{index}"></a> is emitted and the token is consumed from the markup. In case 2, an HREF anchor link in the form <a href="#p{pageNumber}_{sluggified_subtopic}" data-topic="{topic}" data-subtopic="{subtopic}" data-index="{index}"></a> is emitted and the token is consumed from the markup.
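As a rough illustration of the two cases above, here is a minimal Python sketch of the token-to-anchor step. It is not the code from the PR: the `slugify` rule, the function names, and the choice to fall back to the topic when there is no subtopic are all assumptions made for the example.

```python
import html
import re


def slugify(text: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def index_anchors(line: str, page_number: int) -> list[str]:
    """Turn '#Index Topic|Topic // Subtopic' into one anchor string per topic."""
    # A space (or another '#') right after the first '#' means a normal header.
    match = re.fullmatch(r"#([^\s#]\S*)\s+(.+)", line.strip())
    if match is None:
        return []
    index_name, body = match.groups()
    topics_part, _, subtopic = body.partition("//")
    subtopic = subtopic.strip()
    anchors = []
    for topic in (t.strip() for t in topics_part.split("|")):
        if not topic:
            continue
        # Assumption: with no subtopic, the slug is derived from the topic.
        attrs = [
            f'href="#p{page_number}_{slugify(subtopic or topic)}"',
            f'data-topic="{html.escape(topic)}"',
        ]
        if subtopic:
            attrs.append(f'data-subtopic="{html.escape(subtopic)}"')
        attrs.append(f'data-index="{html.escape(index_name)}"')
        anchors.append(f"<a {' '.join(attrs)}></a>")
    return anchors
```

With this sketch, `index_anchors("#Index2 Dual Wielding", 7)` yields a single anchor with `href="#p7_dual-wielding"`, while `index_anchors("# Heading", 1)` returns an empty list so the line is left for the normal header rule.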

Snippet details: This assumes that the marked run created proper anchors. The snippet runs its own parsing against the brew text. This parsing will collect all the indexing markdown tags and the associated page number. Duplicate Topic and Subtopic entries will be collated so that references to the same index text on multiple pages will yield an ordered list of pages in relation to the indexed text. Topic entries will be alphabetized. Subtopic entries will be alphabetized. Each list will be rendered with the list's name as an H2 header followed by the ordered contents of the index followed by a \page.
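The collation described above could look roughly like the following Python sketch. The tuple layout `(index_name, topic, subtopic, page)`, the function name, and the exact text layout of each entry are assumptions for illustration; the real snippet would emit Homebrewery markup and styling.

```python
from collections import defaultdict


def render_indexes(entries) -> str:
    """entries: iterable of (index_name, topic, subtopic_or_None, page)."""
    # index name -> topic -> subtopic (None = the topic itself) -> set of pages
    collated = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for index_name, topic, subtopic, page in entries:
        collated[index_name][topic][subtopic].add(page)

    lines = []
    for index_name in sorted(collated, key=str.casefold):
        lines.append(f"## {index_name}")  # the list's name as an H2 header
        topics = collated[index_name]
        for topic in sorted(topics, key=str.casefold):
            subtopics = topics[topic]
            own_pages = sorted(subtopics.get(None, ()))
            lines.append(", ".join([topic, *map(str, own_pages)]))
            named = sorted((s for s in subtopics if s is not None), key=str.casefold)
            for subtopic in named:
                page_list = ", ".join(map(str, sorted(subtopics[subtopic])))
                lines.append(f"    {subtopic}, {page_list}")
        lines.append("\\page")  # each index ends with a page break
    return "\n".join(lines)
```

Duplicate markers for the same topic on different pages collapse into one line with an ordered page list, for example `Weapons, 1, 4`.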

An active PR for an earlier version of this exists as https://github.com/naturalcrit/homebrewery/pull/3113 and will be updated to the new markdown shortly. Styling will be derived from the existing Indexing snippet styles.

calculuschild commented 7 months ago

Your #1 and #3 are normal headers, due to the space after the #.

Is there a reason that tags with subtopics leave an anchor behind but without a subtopic don't? Shouldn't they all leave anchors behind?

dbolack-ab commented 7 months ago

Your #1 and #3 are normal headers, due to the space after the #.

Thank you, Corrected.

Is there a reason that tags with subtopics leave an anchor behind but without a subtopic don't? Shouldn't they all leave anchors behind?

It was a gut decision, I see no problem with having both leave anchors.

ericscheid commented 7 months ago

What happens if there are two instances of adding an index marker of the same topic/subtopic on the same page?

Currently, both would receive an anchor with the same id. Any link to that id value would still work, but it would target only the first instance. Although it is invalid HTML to have multiple instances of the same id in the document, it doesn't really cause any problems.

(BTW, I don't anticipate any duplicate collisions with other minted ids from e.g. headings, as these indexing ids incorporate an underscore _ and headings have their auto-minted ids normalised to plain text lowercase with dashes for punctuation.)

ericscheid commented 7 months ago

In case 3 and 4, an HREF Anchor link in the form <a href="#p{pageNumber}_{sluggified_subtopic}" subjectheading="{topic}" entry="{subtopic}"></a> and consumed from the markup.

The attribute names of subjectheading and entry are not valid html. These should be data-topic and data-subtopic instead.

dbolack-ab commented 7 months ago

In case 3 and 4, an HREF Anchor link in the form <a href="#p{pageNumber}_{sluggified_subtopic}" subjectheading="{topic}" entry="{subtopic}"></a> and consumed from the markup.

The attribute names of subjectheading and entry are not valid html. These should be data-topic and data-subtopic instead.

Invalid? No. Non-standard? Sure. The standard sucks.

I don't think that existed as a standard the last time I did attribute injection. Time for new tennis balls on the walker...

Updating the original accordingly

calculuschild commented 7 months ago

What happens if there are two instances of adding an index marker of the same topic/subtopic on the same page?

We could probably find a way to only emit the anchor for the first one and just ignore the duplicate.
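A minimal sketch of that "first one wins" idea, in Python for illustration only. The `(page, slug)` input shape and the function name are assumptions; the point is just that a set of already-emitted ids is enough to skip duplicates.

```python
def emit_anchor_ids(markers):
    """markers: iterable of (page, slug). Returns one id, or None, per marker."""
    seen = set()
    result = []
    for page, slug in markers:
        anchor_id = f"p{page}_{slug}"
        if anchor_id in seen:
            result.append(None)  # duplicate on the same page: emit no anchor
        else:
            seen.add(anchor_id)
            result.append(anchor_id)
    return result
```

For `[(1, "swords"), (1, "swords"), (2, "swords")]` this returns `["p1_swords", None, "p2_swords"]`, so the second marker on page 1 leaves no anchor behind.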

dbolack-ab commented 7 months ago

What happens if there are two instances of adding an index marker of the same topic/subtopic on the same page?

We could probably find a way to only emit the anchor for the first one and just ignore the duplicate.

Alternatively, we can track and mark each instance, or we can consider it user error. Dunno which is best.

ericscheid commented 7 months ago

The attribute names of subjectheading and entry are not valid html. These should be data-topic and data-subtopic instead.

Invalid? no. Non-standard? Sure. The standard sucks.

It doesn't pass W3C's validation testing.

The data-* attribute thing has been around a long time, specifically designed for adding arbitrary attributes to elements.

calculuschild commented 7 months ago

Some more comments:

1) Do we actually need the HTML attributes? If the link URL is unique, is that sufficient for this without cluttering the markup?

2) Is there a strong reason we don't want multiple levels of subtopics? The D&D PHB only goes one level deep but it seems like this syntax could support multiple topics and multiple subtopics:

Topic/Subtopic|Topic/Subtopic/SubSubtopic|Topic

3) I'm not sure this was fully addressed: What do we want to do as far as a "default" label? On the one hand, just putting index into the snippet might work, and requiring it to be exactly 1 word removes ambiguity. On the other hand not needing any label at all looks nicer.

4) This was suggested on Gitter:

...when the snippet generates the index it extracts the label from the line and outputs as a header above the index. This assumes the label is one word only...

I like the idea of generating a header based on the label. Should we support multi-word labels?

5) Addressing 2 and 3, I might suggest placing the label after the topics, delimited by another #:

#Topic|Topic|Topic             // Default
#Topic|Topic|Topic #Appendix A // Supports multiple words if needed
#Topic/Subtopic    #Appendix A

This would allow for no label as well as multi-word labels without ambiguity.
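To make the post-label idea concrete, here is a small Python sketch of how such a line might be split. The function name, the returned shape, and the use of `None` for the unlabeled default index are assumptions, not settled behavior.

```python
def parse_post_label(line: str):
    """Split '#Topic|Topic/Subtopic #Label' into (label_or_None, topic paths)."""
    body = line.strip()
    # '# text' and '## text' stay ordinary headers.
    if len(body) < 2 or body[0] != "#" or body[1].isspace() or body[1] == "#":
        return None
    entries_part, _, label = body[1:].partition("#")
    entries = []
    for entry in entries_part.split("|"):
        path = [part.strip() for part in entry.split("/") if part.strip()]
        if path:
            entries.append(path)
    # An empty label means the default index (how to name it is still open).
    return (label.strip() or None, entries)
```

Here `parse_post_label("#Topic/Subtopic    #Appendix A")` gives `("Appendix A", [["Topic", "Subtopic"]])`, and a line with no trailing label gives `None` as the label. Because each entry is split on `/`, deeper paths such as Topic/Subtopic/SubSubtopic come out as longer lists.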

6) Do we need any kind of "see X" notation?

Warhorse links back to creature statistics, but creature statistics doesn't link back to Warhorse.

Or is this something we can already infer logically using our current syntax somehow?

dbolack-ab commented 7 months ago

Do we actually need the HTML attributes? If the link URL is unique, is that sufficient for this without cluttering the markup?

I wasn't sure at the time but thought it would be better to have them present and later decide they weren't needed than the other way around. If we are forced for (currently unguessable) reasons to parse the <pages> to build the Index, the info is there.

Is there a strong reason we don't want multiple levels of subtopics? The D&D PHB only goes one level deep but it seems like this syntax could support multiple topics and multiple subtopics:

I mostly constructed it that way because I could not recall ever seeing more than two depths in an RPG index. If we did that, I think we'd want to remove the "This subtopic can belong to multiple topics" code and force 1:1:1... for Topics and subtopic children.

I'm not sure this was fully addressed: What do we want to do as far as a "default" label? On the one hand, just putting index into the snippet might work, and requiring it to be exactly 1 word removes ambiguity. On the other hand not needing any label at all looks nicer.

This was suggested on Gitter: I like the idea of generating a header based on the label. Should we support multi-word labels?

I don't see a problem with it.

Addressing 2 and 3, I might suggest placing the label after the topics, delimited by another #:

#Topic|Topic|Topic // Default

#Topic|Topic|Topic #Appendix A // Supports multiple words if needed

#Topic/Subtopic #Appendix A

OOh. I like this.

Do we need any kind of "see X" notation? Or is this something we can already infer logically using our current syntax somehow?

I don't think we can infer it reasonably, but it is something we could see about adding as another delimited column.

#Topic/Subtopic@See Also 1@SeeAlso2#Index

Maybe?

ericscheid commented 7 months ago

index names

Alternative syntax might be #index name: topic/subtopic.

When you have lots of indexing entries it's nicer to have the consistent string be at the front for readability:

#Index name: Topic/subtopic
#Index name: Another Topic
#Index name: Another Topic/and a subtopic
#Index name: Another Topic/and second subtopic
#Index name: Another Topic/and third subtopic

cross references

With the "see also", do note that there are usually two kinds: "see also topic", and "see topic". This is indexing convention. (I know people that index books as their profession).

dbolack-ab commented 7 months ago

A couple of random thoughts.

I rather prefer // to / for the separator - it mirrors the :: for glossary and allows the use of / without escaping in a topic or subtopic
What about using a variable for a multiple-word index name? We could do something like
```
$[index_default](Appendix A: Indexes)
#_default Topic//Subtopic
```
If we have a multiple-word Index label, I think we should allow the use of the colon in that label so perhaps separate with #? #Appendix A: Index#Topic//Subtopic
We might need to stew a little more on the See/See Also scenario. Should this decision be a block or can we push it to a different PR?

ericscheid commented 7 months ago

The See/See Also implementation can wait for a future PR .. it would be prudent to reserve a syntax though,

dbolack-ab commented 7 months ago

For the reservation, I suggest appending |See|See Also, where either is optional.

Examples:

See reference only: #Index#Topic//Subtopic|See

See Also only: #Index#Topic//Subtopic||See Also

Both: #Index#Topic//Subtopic|See|See Also

ericscheid commented 7 months ago

An index entry wouldn't have both See and See also.

If an index entry does have a preferred term (i.e. See ...), then it is that referred index entry that would have the related concepts See also cross references.

dbolack-ab commented 7 months ago

The documentation I saw while doing a quick "what are these concepts" search to refresh my memory of the specific meanings did both, though I think that may have been a case of poor examples.

This is clearer, from https://ugapress.org/resources/for-authors/indexing-guidelines/:

CROSS REFERENCES
The cross reference is a space saver and serves to prevent duplication. However, it is not worthwhile to use a cross reference if the length of the cross reference takes more space than listing (repeating) the page numbers. Here duplication is permissible.

In making a cross reference, be sure the exact words of the referenced heading are used. Also make sure there is such an entry. Follow the capitalization style you have used for index entries (“See also education” if common‐noun entries are lowercase, “See also Education” if all entries are capitalized).
See follows an entry with no locators—it simply refers the reader to another part of the index. See also follows an entry with locators; it refers the reader to additional information in another entry. See also under refers to the reader to a subentry under certain circumstances. See The Chicago Manual of Style for more information.
See, See also, and See also under should be underlined unless preceding an underlined (italic) cross reference, in which case use roman (“See also education” but “See also Souls of Black Folk, The”).
Separate cross references with semicolons.

With that in mind, I suggest the following

See: postfix |cross reference entry
See Also: postfix ||cross reference entry
See Also Under: postfix |||cross reference entry

This limits the prohibited characters from the Subtopic and postfix to a single character which I feel is better than three separators.

Examples.

See: #Index#Topic//Subtopic|See

See Also: #Index#Topic//Subtopic||See Also

See Also Under: #Index#Topic//Subtopic|||See Also Under
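For what it's worth, the pipe-count variant is easy to parse. The Python sketch below is only an illustration of this proposal (the names and the returned dictionary shape are made up for the example); it maps one, two, or three pipes to the three cross-reference kinds.

```python
import re

CROSS_REF_KINDS = {1: "See", 2: "See also", 3: "See also under"}


def parse_cross_reference(marker: str):
    """Parse '#Index#Topic//Subtopic||Target' using the pipe-count convention."""
    match = re.fullmatch(r"#([^#]+)#([^|]+?)(?:(\|{1,3})([^|]+))?", marker.strip())
    if match is None:
        return None  # malformed, e.g. four pipes or a missing index name
    index_name, path, pipes, target = match.groups()
    topic, _, subtopic = path.partition("//")
    return {
        "index": index_name.strip(),
        "topic": topic.strip(),
        "subtopic": subtopic.strip() or None,
        "kind": CROSS_REF_KINDS[len(pipes)] if pipes else None,
        "target": target.strip() if target else None,
    }
```

Only the `|` character is reserved inside the topic, subtopic, and postfix, which is the single-separator property argued for above.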

calculuschild commented 7 months ago

Edit: Cleaned up for clarity (hopefully?) I like the idea of using | to separate the main topic and any cross-reference ("see") topics. Based on the indexing guidelines shared by @dbolack-ab, I actually think we can logically extract which see/see also/see under/see also under variation to use based on the complete accumulated index contents (PHB uses a 4th variation not mentioned above, "See under") . The syntax could be as follows:

#Topic | Cross-reference, where an anchor for "Topic" will be created, and "Cross-reference" will point back to "Topic" in the index.

Index entries are as atomic as possible, which means each index anchor will need a distinct Markdown element. No more creating anchors for multiple topics in the same line.

Then, determining see/see also/see under/see also under no longer has any ambiguity or guessing involved, because any cross-reference will always be paired with a real, existing index entry. It is as simple as following these examples:

#Index# Other Topic C
...
#Index# Main Topic | Other Topic A
#Index# Main Topic/Subtopic1 | Other Topic B
#Index# Main Topic/Subtopic2
...
#Index# Main Topic | Other Topic C
#Index# Subtopic2

Page anchors are created for Other Topic C, Main Topic (on two different pages), Main Topic/Subtopic1, Main Topic/Subtopic2 and Subtopic2, but not Other Topic A or Other Topic B, as they are only ever listed as cross-references.

See

Because Other Topic A has no anchor dropped anywhere in the document and is listed as a reference to a top-level topic, the index entry for Other Topic A will use "see".

Similarly, because Other Topic B has no anchor dropped anywhere in the document and is listed as a reference to a subtopic, the index entry for Other Topic B will use "see" with a Topic : Subtopic notation.

Main Topic, 12, 15
    Subtopic1, 12
    Subtopic2, 12
...
Other Topic A, See Main Topic
Other Topic B, See Main Topic : Subtopic1

See under

Main Topic/Subtopic1 has an anchor dropped in the document, and will use "see under" to signify it is a subtopic of the main topic.

Main Topic, 12, 15
    Subtopic1, 12
    Subtopic2, 12
...
Subtopic1, see under Main Topic

See also under

Subtopic2 has an anchor dropped in the document, but is also a subtopic of Main Topic. It will use "see also under" to signify it is its own entry in addition to being a subtopic of the main topic.

Main Topic, 12, 15
    Subtopic1, 12
    Subtopic2, 12
...
Subtopic2, 15, see also under Main Topic

Then users don't need to know the ins and outs of index formatting to select the correct option of | or |||, as the correct wording will be generated automatically based on what other index anchors exist.
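The selection rule I'm describing boils down to one question per term: does it have page locators of its own? A tiny Python sketch, with hypothetical names, assuming we already know the set of topics that have their own anchors:

```python
def cross_reference_wording(term, anchored_topics, is_subtopic_of_target=False):
    """Pick the wording for a term that points at another index entry.

    anchored_topics: names that have their own top-level index marker.
    is_subtopic_of_target: True when the term is listed because it is a
    subtopic of the entry it points to, rather than a cross-reference.
    """
    has_locators = term in anchored_topics
    if is_subtopic_of_target:
        return "see also under" if has_locators else "see under"
    return "see also" if has_locators else "see"
```

Using the example above, with anchored topics Other Topic C, Main Topic, and Subtopic2: Other Topic A gets "see", Other Topic C gets "see also", Subtopic1 gets "see under", and Subtopic2 gets "see also under".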

calculuschild commented 7 months ago

I rather prefer // to / for the separator - it mirrors the :: for glossary and allows the use of / without escaping in a topic or subtopic

I still prefer a single / because that looks cleaner to me as the well-established "subfolder"-type symbology that many are used to (and is what Obsidian uses for tags. You have probably noticed I am looking to them a lot as the driving force behind popularizing a lot of the "non-standard" Markdown.) I might be convinced otherwise, but I would rather have a cleaner syntax in general and occasionally have to escape a slash:

#Index # Weapons\/Tools/Swords

In any case, I don't think mirroring the :: syntax should be a driving factor on this decision. If anything I think we should mirror the single | divider (after all, what if someone wants a | in their topic name?).
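Escaping is cheap to support on the parsing side. As a sketch (Python here just for illustration, with a hypothetical function name), splitting on unescaped slashes is a single regex with a negative lookbehind:

```python
import re


def split_topic_path(path: str) -> list[str]:
    """Split 'Topic/Subtopic' on '/' unless the slash is escaped as '\\/'."""
    parts = re.split(r"(?<!\\)/", path)
    # Unescape the slashes that were protected, then trim whitespace.
    return [part.replace("\\/", "/").strip() for part in parts]
```

So `Weapons\/Tools/Swords` becomes the topic "Weapons/Tools" with the subtopic "Swords". This simple version does not handle an escaped backslash directly before a slash; that would need a small tokenizer instead.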

What about using a variable for a multiple-word index name? We could do something like
$[index_default](Appendix A: Indexes)
#_default Topic//Subtopic

I guess that should technically work but I would leave that up to the user if they want to add that step. I wouldn't make it the default.

If we have a multiple-word Index label, I think we should allow the use of the colon in that label so perhaps separate with #? #Appendix A: Index#Topic//Subtopic

I agree. I think # is a better divider here. Limit the number of symbols a user has to memorize or workaround.

We might need to stew a little more on the See/See Also scenario. Should this decision be a block or can we push it to a different PR?

With the guidelines you posted, I think we can logic it out automatically.

calculuschild commented 7 months ago

When you have lots of indexing entries it's nicer to have the consistent string be at the front for readability

I agree having things line up does help legibility. Though I will point out we can use spacing to line things up too kind of like a table, in which case a post-label I think ends up looking more legible overall, since it allows a default "blank" label to align with all others:

Post-label

#Topic/subtopic                                  // <- Subtopic included in the default index, due to no label
#Topic/subtopic                    # Appendix A  // <- Same subtopic will also be included in Appendix A
#Another Topic                     # Appendix A 
#Another Topic/and a subtopic      # Appendix A
#Another Topic/and second subtopic # Appendix A
#Another Topic/and third subtopic  # Appendix A
#Another Topic                     # Appendix BCD // <- This topic will be included in both Appendix A and Appendix BCD

vs Pre-label

#Topic/subtopic                                // <- Subtopic included in the default index, due to no label
#Appendix A # Topic/subtopic                   // <- Same subtopic will also be included in Appendix A
#Appendix A # Another Topic
#Appendix A # Another Topic/and a subtopic
#Appendix A # Another Topic/and second subtopic
#Appendix A # Another Topic/and third subtopic
#Appendix BCD # Another Topic                  // <- This topic will be included in both Appendix A and Appendix BCD

vs Pre-label aligned

#Topic/subtopic                             // <- Can't align, since adding spaces would make this a Header.
#Appendix A      # Topic/subtopic           // <- Same subtopic will also be included in Appendix A
#Appendix A      # Another Topic
#Appendix A      # Another Topic/and a subtopic
#Appendix A      # Another Topic/and second subtopic
#Appendix A      # Another Topic/and third subtopic
#Appendix BCD    # Another Topic            // <- This topic will be included in both Appendix A and Appendix BCD

In these examples, the post-label looks more legible to me if you are working with multiple labels nearby (even trying to line up the last one as neatly as possible). But this may just be a personal style preference. Maybe we can do a poll on the gitter chat.

ericscheid commented 7 months ago

Hmm .. whether a cross reference shows as a See or a See also shouldn't be determined by whether the referenced topic has subtopics. They should be determined by the author, as suits their intended purposes. One is a cross reference that says "this concept is more commonly known as X, go look there", and the other says "this concept is interesting in its own right, but also go look at this other related concept for more context".

Is it possible that an author might use a See also when they mean See (and vice versa)? Yes. That's on them.

calculuschild commented 7 months ago

Hmm .. whether a cross reference shows as a See or a See also shouldn't be determined by whether the referenced topic has subtopics.

This is correct, and indeed, consistent with the rules I listed above. Subtopics influence only whether "see under" or "see also under" applies.

They should be determined by the author, as suits their intended purposes.

This is not correct. Each has a specific defined use case according to established rules.

One is a cross reference that says "this concept is more commonly known as X, go look there", and the other says "this concept is interesting in its own right, but also go look at this other related concept".

This is correct conceptually, but there are actual rules that define these more precisely:

"See: Follows an entry with no locators of its own—it simply refers the reader to another part of the index."

"See also: Follows an entry with locators; it refers the reader to additional information in another entry."

Is it possible that an author might use a See also when they mean See (and vice versa)? Yes. That's on them.

But it shouldn't need to be on them if we just generate the correct version for them, which we can do.

We have a calculator and a clear set of rules. If the user gives us 1 + 2 we should give them 3. If they really want to do their own thing and change it to 5 later, they can, but most people would probably want our calculator to give them the right answer to start with.

ericscheid commented 7 months ago

They should be determined by the author, as suits their intended purposes.

This is not correct. Each has a specific defined use case according to established rules.

Yes. I'm referring to the conventional use of the types of references as being the "intended purpose", to which the author should conform. I'm not suggesting the author determines the purpose.

This is correct conceptually, but there are actual rules that define these more precisely:

See: The current reference does not have a page number of its own, so look at this other reference.

See also: The current reference does have a page number of its own, but look at this related reference also.

No. These rules define (precisely) how the two uses are effected. The causality is purpose-of-reference → index entries.

A See entry that references an index entry that does not have a page number is an error. (The princess is in another castle. Which castle? Oh, it doesn't exist.) One we could likely fix by minting an index entry for the referred item, using the page the syntax was found on. (Technically fraught, as it would also be possible to write up a whole bunch of See references as a list of synonyms independent of the actual content. Indexes were traditionally constructed separate from the actual content.)

And then there's also the edge case of

Recursion,
- See Recursion

(no comment)
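Both failure modes are simple to detect once the entries are collected. A minimal Python sketch, with made-up names and input shapes, that only reports problems and does not try to repair them:

```python
def check_cross_references(anchored, cross_refs):
    """Return warnings for self-references and dangling targets.

    anchored: set of entry names that have page locators.
    cross_refs: iterable of (term, target) pairs.
    """
    warnings = []
    for term, target in cross_refs:
        if term.casefold() == target.casefold():
            warnings.append(f"'{term}' refers to itself")
        elif target not in anchored:
            warnings.append(
                f"'{term}' points to '{target}', which has no page locators"
            )
    return warnings
```

Given anchored entries of just Encumbrance, the pair (Recursion, Recursion) is flagged as a self-reference and a reference to an unindexed target is flagged as dangling, while (Carry capacity, Encumbrance) passes silently.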

calculuschild commented 7 months ago

A See entry that references an index entry that does not have a page number is an error.

Right. And with our syntax as I defined above, this should not be possible. There is no way to create a "see" entry that points to something that doesn't exist. The correct conventions are baked in to the syntax.

purpose-of-reference → index entries.

Thus, I would amend this to:

Syntax → purpose-of-reference → index entries.

The whole point being that if we follow the rules I outlined above, our syntax will naturally enforce correct purpose of reference, and hence correct indices. We can take advantage of that to save users the work of figuring that out themselves.

dbolack-ab commented 7 months ago

In these examples, the post-label looks more legible to me if you are working with multiple labels nearby (even trying to line up the last one as neatly as possible). But this may just be a personal style preference. Maybe we can do a poll on the gitter chat.

This is a no-value added requirement.

dbolack-ab commented 7 months ago

This will drop an anchor for Main Topic and another anchor for Main Topic/Subtopic at that location (but not Other Topic A or Other Topic B).

correct so far.

If Other Topic A has no anchor dropped anywhere in the document and it is listed as a reference to a top-level topic, the index entry for Other Topic A will use "see":

This is absolutely not how the code works at present and I see no value in this addition. Code should, at most, warn the user about unfound/incorrect/incomplete references and make zero guesses about layout intent.

This will drop an anchor for Other Topic A and another anchor for Other Topic B, then later also drops anchors for Main Topic and Main Topic/Subtopic.

Gods no. Create multiple entries if that is needed.

Then users don't need to know the ins and outs of index formatting to select the correct option of | or |||, as the correct wording will be generated automatically based on what other index markers exist.

No. This is not a wizard.

calculuschild commented 7 months ago

This is a no-value added requirement.

It is not a requirement at all. I'm not sure where you got that impression. Sometimes adding spaces to make things line up just looks nicer.

calculuschild commented 7 months ago

This is absolutely not how the code works at present

Hence my proposal above to add this new functionality. I am suggesting a change to how the code works.

Code should, at most, warn the user about unfound/incorrect/incomplete references and make zero guesses about layout intent.

Fortunately, our syntax doesn't suffer from any of these issues, so there is no need to warn/guess anything. Using #Topic|Cross Reference, a cross-reference will always be paired with a real, existing index entry. It is impossible to do otherwise. Thus, we can determine with certainty what the layout intent is.

Reusing my analogy from before: We have a clear set of rules and provide our users with a calculator. If a user gives us 1 + 2, should we not respond with 3? Seems a lot more valuable than responding "I don't know what answer you wanted."

This will drop an anchor for Other Topic A and another anchor for Other Topic B, then later also drops anchors for Main Topic and Main Topic/Subtopic.

Gods no. Create multiple entries if that is needed.

This is exactly what it is doing. It is using multiple separate entries to do this. I suspect you may have misread something here.

No. This is not a wizard.

Are we not creating an auto-generator for indices? We have all the tools at our disposal to make this work automatically. Why not?

ericscheid commented 7 months ago

Fortunately, our syntax doesn't suffer from any of these issues. A cross-reference will always be paired with a real, existing index entry. It is impossible to do otherwise. Thus, we can determine with certainty what the layout intent is.

If I understand the preceding proposal correctly, this is because the instance of the indexing syntax that calls for a See reference will also insert an index entry for the page the See indexing syntax occurs on if there isn't a paged index entry yet.

That's an unsafe assumption.

I could, for example, get to the end of writing my text, navigate to the page the index will be generated onto, and add a bunch of "X, see Y" syntax instances as an attempt to ensure I've covered off the likely synonyms. If the text regarding topic "Y" (which had its own direct indexing syntax) was later deleted, then I certainly don't want to have "X, see Y" point to an index entry for "X" with a page number of the index page itself.

The alternative workflow is that I have to go hunt down an actual page containing X to add the "see X" indexing syntax. Given that there might be many pages that are indexed for "X" then the "see X" might end up on any of them. I certainly don't want to be putting them on each of them just to be sure (and to guard against later text deletion).

dbolack-ab commented 7 months ago

Fortunately, our syntax doesn't suffer from any of these issues, so there is no need to warn/guess anything. Using #Topic|Cross Reference, a cross-reference will always be paired with a real, existing index entry. It is impossible to do otherwise. Thus, we can determine with certainty what the layout intent is.

Okay. I read what you are suggesting backwards.

I don't care for this idea, definitions should go in one direction.

This is exactly what it is doing. It is using multiple separate entries to do this. I suspect you may have misread something here.

Yes, I did.

Are we not creating an auto-generator for indices? We have all the tools at our disposal to make this work automatically. Why not?

Index Markdown entries should be as atomic as possible, IMO, and that means you'll have a markdown for the distinct entry crossreffed and the entry that crossrefs.

calculuschild commented 7 months ago

If I understand the preceding proposal correctly, this is because the instance of the indexing syntax that calls for a See reference will also insert an index entry for the page the See indexing syntax occurs on if there isn't a paged index entry yet.

OK, this might be where the disconnect is coming from. Let me try to clarify how I'm imagining this syntax working a little more.

When you write an index marker #aaa/bbb|ccc, you supply first: the topic you are anchoring (subtopic if necessary), and second: any synonyms or references to that topic (which will not be anchored at this point). Every marker you create will drop an anchor for the topic (or subtopic) at that position on the page, thus allowing a given index entry to point to multiple pages by placing more markers as needed. Any references are not given a page anchor at that point; they would need their own index marker if you want to give them a page anchor.
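In code terms, a marker splits into one anchored topic path plus a list of unanchored references. A short Python sketch of that split (the function name and dictionary keys are hypothetical, and it handles a single subtopic level only):

```python
def parse_marker(line: str):
    """Split '#topic/subtopic|ref|ref' into the anchored entry and its references."""
    body = line.strip()
    # '# text' and '## text' remain ordinary headers.
    if len(body) < 2 or body[0] != "#" or body[1].isspace() or body[1] == "#":
        return None
    anchored, *references = (part.strip() for part in body[1:].split("|"))
    topic, _, subtopic = anchored.partition("/")
    return {
        "topic": topic.strip(),               # gets a page anchor here
        "subtopic": subtopic.strip() or None,  # anchored under the topic, if present
        "references": [ref for ref in references if ref],  # no anchor dropped
    }
```

For `#Weapons/daggers|Dual-wielding` this gives topic "Weapons", subtopic "daggers", and one reference, "Dual-wielding", which only gets a page number if it has its own marker somewhere.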

Example:

// ---------------------------- Page 1

#Weapons|Equipment
This is a paragraph introducing weapons.

#Weapons/swords
This is a paragraph about swords.

// -----------------------------Page 2

#Weapons/daggers|Dual-wielding
This is a paragraph about daggers. Daggers are the only weapon you can dual-wield.

#Weapons/spears
This is a paragraph about spears

#Armor|Equipment|Defense
This is a paragraph about armor.

// ------------------------------Page 3

#Armor/shields
This is a paragraph about shields

// ------------------------------Page 4

#Equipment
This is a big list of all equipment you can find:
- Boxes
- Bags
- Ribbons
- Tags
- Weapons
- Armor

#Encumbrance | Carry capacity | Weapons | Armor
This is a paragraph about how to carry your equipment

#Repairs and Upgrades|Weapons
This is a paragraph about how to repair your weapons

// ------------------------------Page 5

#Horses
#Encumbrance | Saddle Bags
This is a paragraph about how horses can carry things

This would generate:

Armor, 2, see also Encumbrance
    shields, 3
Carry Capacity, see Encumbrance
Daggers, see under Weapons
Defense, see Armor
Dual-wielding, see Weapons : Daggers
Encumbrance, 4, 5
Equipment, 4
    see also Armor; Weapons
Horses, 5
Repairs and Upgrades, 4
Saddle bags, see Encumbrance
Shields, see under Armor
Spears, see under Weapons
Swords, see under Weapons
Weapons, 1
    see also Encumbrance; Repairs and Upgrades
    daggers, 2
    spears, 2
    swords, 1

I could, for example, get to the end of writing my text, navigate to the page the index will be generated onto, and add a bunch of "X, see Y" syntax instances as an attempt to ensure I've covered off the likely synonyms.

Yeah, you wouldn't do that with this approach, since that would place a bunch of index entries pointing to the index page itself. You would instead add synonyms as relevant to a given page. For example, on page 4 above, I decided "Carry Capacity" would be a term someone might search for to find this paragraph about encumbrance. On page 5, I decided that "Saddle Bags" would be a likely term someone would search to reach that paragraph. So both "Carry Capacity" and "Saddle Bags" will now point to "see Encumbrance".

The alternative workflow is that I have to go hunt down an actual page containing X to add the "see X" indexing syntax. Given that there might be many pages that are indexed for "X" then the "see X" might end up on any of them. I certainly don't want to be putting them on each of them just to be sure (and to guard against later text deletion).

Luckily you wouldn't need to put the see X syntax on each index marker. Placing it on any one is sufficient to have it added as a synonym to the chosen topic. By providing synonyms near the paragraph they are likely searching for, the mental burden of "did I add this synonym" is handled right at the source: just visit the relevant paragraph where that synonym should apply and add it.

I don't expect this to convince you, but at a minimum I would hope this puts us on the same page about what I am proposing, so we can continue the discussion from the same understanding. "I think we should buy a house." "I don't think we should buy a horse for these reasons." "No, a house."

I also totally re-wrote my earlier proposal to try to clear this up. https://github.com/naturalcrit/homebrewery/issues/3369#issuecomment-2023128570

calculuschild commented 7 months ago

I don't care for this idea, definitions should go in one direction.

My proposed syntax is going only in one direction. Topics on left, synonyms (or references) on the right. The reference only points to the Topic and not vice-versa. What am I missing?

Index Markdown entries should be as atomic as possible, IMO, and that means you'll have one markdown entry for the distinct entry being cross-referenced and another for the entry that cross-references it.

This is already part of my proposal, unless I am again totally missing something.

#Topic | cross-reference

A distinct page anchor for only one Topic, plus an optional cross-reference that should point back to it. If you also want a page anchor for cross-reference, it needs its own distinct markdown entry:

#cross-reference
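
As a rough illustration of the behaviour described above, here is a minimal sketch in Python (illustrative only, not Homebrewery's JavaScript). It assumes each marker sits alone on a line, that `pages` is a list of page texts, and that a cross-reference which also has its own page anchor keeps its page entry; `MARKER` and `build_index` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed marker shape: "#Topic" or "#Topic | cross-reference", alone on a line.
# "# Heading" and "##Heading" are deliberately not matched.
MARKER = re.compile(r"^#(?![ #])([^|\n]+?)\s*(?:\|\s*(.+?))?\s*$")

def build_index(pages):
    """Collate markers from a list of page texts (page 1 first)."""
    topic_pages = defaultdict(set)   # topic -> pages that carry its anchor
    see = {}                         # cross-reference -> topic it points to
    for number, text in enumerate(pages, start=1):
        for line in text.splitlines():
            match = MARKER.match(line)
            if match is None:
                continue
            topic, reference = match.group(1), match.group(2)
            topic_pages[topic].add(number)
            if reference:
                # One marker anywhere is enough to register the synonym.
                see.setdefault(reference, topic)
    entries = {}
    for topic, numbers in topic_pages.items():
        entries[topic] = f"{topic}, {', '.join(str(n) for n in sorted(numbers))}"
    for reference, topic in see.items():
        # A reference with its own anchor keeps its page entry (an assumption).
        entries.setdefault(reference, f"{reference}, see {topic}")
    return [entries[key] for key in sorted(entries, key=str.lower)]
```

Note that the cross-reference only ever points at the Topic on its left; nothing is generated in the other direction.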

I also totally re-wrote my earlier proposal to try to clear some of this up. https://github.com/naturalcrit/homebrewery/issues/3369#issuecomment-2023128570

ericscheid commented 7 months ago

I could, for example, get to the end of writing my text, navigate to the page the index will be generated onto, and add a bunch of "X, see Y" syntax instances as an attempt to ensure I've covered off the likely synonyms.

Yeah, you wouldn't do that with this approach, since that would place a bunch of index entries pointing to the index page itself.

That is, however, a very likely workflow, and one which this tool actively interferes with. When I'm looking at the index itself, either to review or even to actually use it ... that is when I would realise that the term I have for the concept I'm looking for does not appear in the index, and that a See cross-reference is called for. And I'd rather just add it in as a synonym cross-reference right there, vs navigating to one of many possible pages the preferred term is indexed on and inserting it there. Especially if there's a risk that that section of text might get revised/deleted (removing the See cross-reference, despite the concept still being indexed on multiple other pages).

Luckily you wouldn't need to put the see X syntax on each index marker. Placing it on any one is sufficient to have it added as a synonym to the chosen topic.

Only sufficient if there is no risk of that particular section of the text (with attached cross-references) getting removed.

By providing synonyms at the site where they are likely to be searched for, the mental burden of "did I add this synonym" is handled right at the source: just visit the relevant paragraph where that synonym should apply and add it.

Again, there might be multiple instances of the preferred term indexed across multiple pages .. and we only need one instance of the see cross-reference .. so which page has the cross-reference index marker? And when I'm editing text that does have that cross-reference index marker, and I know that the preferred term is indexed on multiple other pages .. is this instance a duplicate I can safely delete (e.g. if the text is revised and is no longer referencing the preferred term concept)?

The site where I'm likely to search for particular terms is the index. That is what an index is for. Concept → term → page number | redirect to preferred term. Inserting the synonym redirections onto individual pages is fraught with either fragility (one synonym instance might get deleted despite the referent concept also appearing on other pages) or redundant duplication.

The tools should support the workflow. The workflow shouldn't have to bend to suit the tool.

calculuschild commented 7 months ago

I can see I am still not being understood on several points, so I apologize that I am not able to communicate my idea in an effective way. Rather than continue trying to clarify things, I will step back from this one for now and focus on some of the other PRs for a while until some other proposal comes through.

dbolack-ab commented 7 months ago

This is what I'm working with.

#[IndexMarker][LocationMarker]

Where:

IndexMarker is

[IndexNameMarker][Topic]\/[Subtopic]

IndexNameMarker is

A string ending with a :
A variable name prefixed with a _ followed by a : - e.g. _index:
Empty, indicating the default index.

Topic is a string. Subtopic is a string.

LocationMarker is

empty, indicating the index entry should link to this location and use the current page number in the index.
A SeeCrossReference, indicating a "see X" index crossreference
A SeeAlsoCrossReference, indicating a "see also X" index crossreference
A SeeUnderCrossReference, indicating a "see under X" index crossreference
A SeeAlsoUnderCrossReference, indicating a "see also under X" index crossreference

SeeCrossReference is

|[IndexMarker] where IndexMarker is identical to the IndexMarker being referenced.

SeeAlsoCrossReference is

||[IndexMarker] where IndexMarker is identical to the IndexMarker being referenced.

SeeUnderCrossReference is

|+[IndexMarker] where IndexMarker is identical to the IndexMarker being referenced.

SeeAlsoUnderCrossReference is

||+[IndexMarker] where IndexMarker is identical to the IndexMarker being referenced.

If a Cross Reference's IndexAddress references the same index, the index will be omitted from the formatted address. If the IndexAddress of a SeeUnderCrossReference or SeeAlsoUnderCrossReference points to a subtopic that does not exist on the topic, or the topic has fewer than 2 subtopics, it will be redirected to the topic and reported to the user as an inline comment.

Styling/ordering of the Crossreferences is a next stage topic and if it can't be handled purely by CSS then we want to consider having more than one Index generator function that can generate more than one style. IMO, use the Chicago manual first.

Reasoning:

An index entry should be as minimal as possible for default use cases. #Topic should be the simplest entry, and a typical subtopic would be #Topic/Subtopic.
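
To make the grammar above concrete, here is a minimal parsing sketch in Python (illustrative only, not the PR's actual implementation). It assumes one marker per line, ignores backslash escapes, and treats the `_index:` variable form as a plain index name; `IndexMarker`, `Entry`, `parse_marker` and `parse_entry` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

class IndexMarker(NamedTuple):
    index: Optional[str]      # None means the default index
    topic: str
    subtopic: Optional[str]

class Entry(NamedTuple):
    marker: IndexMarker
    kind: str                 # "page", "see", "see also", "see under", "see also under"
    target: Optional[IndexMarker]

_KINDS = {"||+": "see also under", "||": "see also", "|+": "see under", "|": "see"}
# Longest operators are listed first so "||+" is not read as "|" plus "|+".
_OPERATOR = re.compile(r"\s*(\|\|\+|\|\||\|\+|\|)\s*")

def parse_marker(text):
    """Split '[Index:]Topic[/Subtopic]' into its parts (escapes not handled)."""
    index, separator, rest = text.partition(":")
    if not separator:
        index, rest = None, text
    topic, _, subtopic = rest.partition("/")
    return IndexMarker(index.strip() if index else None,
                       topic.strip(), subtopic.strip() or None)

def parse_entry(line):
    """Return an Entry for an index marker line, or None for anything else."""
    # No whitespace (or second '#') may follow the initial '#'.
    if not line.startswith("#") or line[1:2] in ("", " ", "#"):
        return None
    body = line[1:]
    match = _OPERATOR.search(body)
    if match is None:
        return Entry(parse_marker(body), "page", None)
    return Entry(parse_marker(body[:match.start()]),
                 _KINDS[match.group(1)],
                 parse_marker(body[match.end():]))
```

For example, `parse_entry("#Topic/Subtopic")` yields a "page" entry, while `parse_entry("#Carry Capacity | Encumbrance")` yields a "see" entry whose target is the Encumbrance marker.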

ericscheid commented 7 months ago

Looks good.

I'm assuming white space (tabs and spaces) is permitted either side of the separator symbols (but not within, e.g. |+). No white space following the initial # marker, of course.

Also, the IndexNameMarker needs to note that if present then it needs to start with a non-space character, and that any literal : (colon) character in the IndexNameMarker needs to be escaped (as \:). And be at least one character (i.e. #: Topic is not valid).

Similarly, the [Topic] and [Subtopic] strings need to have any / (solidus) or | (pipe) characters escaped (as \/ and \|, respectively.)

Lastly, the wording for SeeUnder and SeeAlsoUnder references should be clear that they both can only point to topics, and not subtopics. And then that if a) the topic does not exist, or b) the topic has fewer than 2 subtopics, then the error will be reported (via a ).
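
The escaping rules above could be handled along these lines. This is a hedged Python sketch, not part of any existing implementation: `split_unescaped`, `unescape` and `is_valid_index_name` are hypothetical helpers, and the simple lookbehind does not cope with an escaped backslash directly before a separator.

```python
import re

def split_unescaped(text, separator):
    """Split on `separator` unless it is preceded by a backslash."""
    parts = re.split(r"(?<!\\)" + re.escape(separator), text)
    # White space either side of a separator is not significant.
    return [part.strip() for part in parts]

def unescape(text):
    r"""Turn \:, \/ and \| back into the literal characters."""
    return re.sub(r"\\([:/|])", r"\1", text)

def is_valid_index_name(name):
    """At least one character, and the first one is not white space."""
    return len(name) >= 1 and not name[0].isspace()
```

Splitting first and unescaping afterwards keeps a literal `\/` inside a Topic from being mistaken for the Topic/Subtopic separator.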

ericscheid commented 7 months ago

I still think doubling of the pipe character || is a mistake. I get you want to reduce the number of symbols the author needs to remember and understand, but here you are asking them to distinguish meaning between | and ||, vs distinguishing between | and @. The same burden, but with even less signal.

calculuschild commented 7 months ago

I still think doubling of the pipe character || is a mistake. I get you want to reduce the number of symbols the author needs to remember and understand, but here you are asking them to distinguish meaning between | and ||, vs distinguishing between | and @. The same burden, but with even less signal.

We sure we don't want to just deduce which one to use (see or see also) by looking if the cross reference has a page anchor of its own? Then we only need one | for both cases.

ericscheid commented 7 months ago

Whether a "see X" vs a "see also X" cross-reference is called for is entirely an editorial decision. It can not be deduced.

Elixirs 
  ... see Potions
Oils, magical ... page 18
  ... see also Potions
Potions ... page 17
  ... see also Oils, magical

At best you could try to short-circuit forwarding chains (e.g. Philtres → Elixirs → Potions).
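
Short-circuiting a forwarding chain might look like the following Python sketch. It assumes the "see" referrals have already been collected into a dict mapping each referring term to its target; `resolve_see` is a hypothetical name.

```python
def resolve_see(term, see):
    """Follow "see" referrals from `term` to the final preferred term."""
    visited = {term}
    while term in see:
        term = see[term]
        if term in visited:
            # A loop such as A -> B -> A can never reach a page number.
            raise ValueError(f"circular cross-reference at {term!r}")
        visited.add(term)
    return term
```

With `see = {"Philtres": "Elixirs", "Elixirs": "Potions"}`, both Philtres and Elixirs resolve to Potions.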

calculuschild commented 7 months ago

Whether a "see X" vs a "see also X" cross-reference is called for is entirely an editorial decision. It can not be deduced.
Elixirs 
  ... see Potions
Oils, magical ... page 18
  ... see also Potions
Potions ... page 17
  ... see also Oils, magical
At best you could try to short-circuit forwarding chains (e.g. Philtres → Elixirs → Potions).

These examples can all be deduced accurately. Elixirs has no page number of its own, so it uses "see". Oils and Potions do have page numbers where further information can be found, so they use "see also". Not an editorial decision (never was, still not sure where this idea comes from). It follows a defined rule that every style guide I can find is pretty explicit on. I don't understand the eagerness to ignore the rules when we can use them to avoid errors so easily.

Using "see also" with Elixirs would be an error. Using "see" with potions or oils would be an error as well.

We have the means to avoid these errors, with a bonus of simplifying the syntax. Is that such a bad thing? I see only upsides.
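
The deduction rule argued for here fits in a few lines. The Python sketch below is illustrative only: `format_entry` is a hypothetical helper, and the output layout (indented cross-reference line, targets joined with "; ") is an assumption modelled on the sample index earlier in the thread.

```python
def format_entry(term, pages, targets):
    """Render one index entry, deducing the label from the page list."""
    # Rule as stated: no pages of its own -> "see"; otherwise "see also".
    label = "see also" if pages else "see"
    if pages:
        head = f"{term}, {', '.join(str(p) for p in sorted(pages))}"
    else:
        head = term
    lines = [head]
    if targets:
        lines.append(f"    {label} {'; '.join(targets)}")
    return lines
```

Under this rule the author only supplies the target; the label is never typed, so it cannot disagree with the page list.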

ericscheid commented 7 months ago

The base problem is that form should follow function, not function following form.

Remember too though that the mapping of X to Y in a "X .. see Y" referral is not always a perfect mapping of concepts — sometimes Y is a broader term, sometimes Y is a narrower term, and sometimes there are multiple meanings of the term text (e.g. "pole").

In addition to the mapping of these synonyms and homonyms, an author might wish to relate a given term to related content. That related content might not exist or be indexed at the same concept level, and so the relationship to the broader/narrower concept might not fit within the homonym referrals .. and so one might end up with this:

referring-term ..
  .. see homonym-term1
  .. see homonym-term2
  .. see also broader-category
  .. see also narrower-category

Another example:

Let's say you have a term "Coding Languages" in your index. Now, "Coding Languages" could be a broad term that encompasses various specific languages. In this case, you might have:

"see Programming Languages" (indicating that for more detailed information on programming languages, one should refer to the entry for "Programming Languages").
"also see Technology Ethics" (indicating that readers interested in Coding Languages could also be interested in the broader field of Technology Ethics and perhaps look there for additional related information).

calculuschild commented 7 months ago

The homonym example is interesting; technically two (or more) different entries that happen to appear in the same place in the index, hence the mixing of see and see also which is usually an error. From what I can find, this is generally resolved by differentiating the name with a specifier in parentheses, rather than combining into the same entry:

Pole (nationality)
  see Polish
Pole (equipment) 12, 13
  see also Sticks
  see also under Reach weapons

not

Pole 12, 13
  see Polish
  see also Sticks
  see also under Reach weapons

I.e., each entry should use only one type of see, see under, or some combo of multiple see also/see also under. Mixing "see" and "see also" is a sign of an error.

Let's say you have a term "Coding Languages" in your index. Now, "Coding Languages" could be a broad term that encompasses various specific languages.

Even in this case, mixing both see and see also would be an error. Either "Coding Languages" has its own pages or not. Using "see" requires that the current entry has no page number of its own. "See also" requires that the current entry has page numbers of its own.

"also see Technology Ethics" (indicating that readers interested in Coding Languages could also be interested in the broader field of Technology Ethics and perhaps look there for additional related information).

We must be clear that, again, "see" and "see also" are not interchangeable labels that can be optionally chosen at the editor's choice. "See" and "see also" have quite literally the same purpose (pointing to another entry), with the only distinction being whether the current index topic has its own pages. Full stop. There is no other hidden meaning separating the two. As much as an editor might want to ascribe separate personal meanings to these two labels and so use them in some other way, doing so would be an error.
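
If a generator wanted to report these cases rather than prevent them, a check could be sketched as below. This is a hedged Python illustration of the rule as stated in this comment, not an agreed design: `lint_entry` is a hypothetical name, and treating "see under" like "see" with respect to page numbers is an assumption.

```python
def lint_entry(term, pages, kinds):
    """Return a list of problems for one entry's cross-reference kinds."""
    used = set(kinds)
    plain = used & {"see", "see under"}
    also = used & {"see also", "see also under"}
    problems = []
    if plain and also:
        problems.append(f"{term}: mixes {sorted(plain)} with {sorted(also)}")
    if len(plain) > 1:
        problems.append(f"{term}: uses both 'see' and 'see under'")
    if plain and pages:
        problems.append(f"{term}: has page numbers, so 'see also' is expected")
    if also and not pages:
        problems.append(f"{term}: has no page numbers, so 'see' is expected")
    return problems
```

The combined "Pole" entry above would be flagged twice (mixed kinds, and "see" on an entry with pages), while the two disambiguated "Pole" entries pass cleanly.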

calculuschild commented 7 months ago

If we want to go out of our way to provide a more complex syntax for the sole purpose of allowing users to create erroneous indices (this logic is baffling to me), I might suggest the following:

#Topic | See
#Topic |+ See also
#Topic | See under/   <- `/` to indicate that we are pointing "under" the whole topic, but not any specific subtopic
#Topic |+ See also under/

Though this makes me consider proposing a second snippet for the index generator, so users have the option:

Snippet 1 - Manual Index: Uses the above syntax to allow for any personal deviations from style guide standards
Snippet 2 - Auto Index: Uses the simpler syntax to guarantee an error-free index

dbolack-ab commented 6 months ago

Snippet 1 - Manual Index: Uses the above syntax to allow for any personal deviations from style guide standards

I think there's some unintentionally loaded language here.

What I am trying to provide in my PR is a system that builds indexes based on the user syntax and subscribes to no particular style guide's rules. I think this makes for a better tool long term and we can solve for some things down the road.

As a first-pass compromise, I've put in dropping references to non-existent targets and collating subtopic targets to the parent topic when the parent has two or fewer subtopics. What I'd like to do in later iterations is have a couple or three "preconfigs" and then some prompts for collations and error handling (Drop errors? Report errors? etc.), coupled with an output log that informs on broken references and rules-based cross-reference redirects (and whatever else we want along the way).
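
The first-pass behaviour described here might be sketched as follows. This Python is illustrative only and not taken from the PR: `resolve_target` is a hypothetical name, `index` is assumed to map each topic to its set of subtopics, and the threshold is a parameter because the thread gives both "fewer than 2" and "two or fewer"; the default follows the latter.

```python
def resolve_target(index, topic, subtopic=None, min_subtopics=3):
    """Return (target, log_message); target is None when the reference is dropped."""
    if topic not in index:
        return None, f"dropped reference to missing topic {topic!r}"
    if subtopic is None:
        return (topic, None), None
    subtopics = index[topic]
    if subtopic not in subtopics:
        return (topic, None), f"redirected missing subtopic {subtopic!r} to {topic!r}"
    if len(subtopics) < min_subtopics:
        # Too few subtopics to be worth pointing "under" the topic.
        return (topic, None), f"collated {topic!r}/{subtopic!r} into {topic!r}"
    return (topic, subtopic), None
```

Returning the message alongside the result leaves room for the output log mentioned above: the caller can drop it, print it, or emit it as an inline comment.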

Snippet 2 - Auto Index: Uses the simpler syntax to guarantee an error-free index

What precisely do you envision here?
