All hyphens in anchor ID make debugging and automated operations more difficult

tradej commented 5 years ago

I know #66 was just merged, but I think I've heard enough negative feedback (and I'm not a fan of the change myself) that I feel compelled to open this issue.

Separating the anchor ID from the context with a single hyphen makes debugging, and any automated changes in the future, more difficult because it loses the information about where the anchor ID ends and the context begins. This was brought up in #66 several times, but was ignored and the PR was merged nonetheless.

Consider:

[id="booting-rhel-installation-dvd"]

versus

[id="booting-rhel_installation-dvd"]

In the second example, I can easily see that booting-rhel is the name of the module and installation-dvd is the context, whereas in the first example, the module name could just as well be booting-rhel-installation. This is an inconvenience when debugging build errors, but it's downright catastrophic when you need to do any automated transformations using a script.
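To illustrate the scripting point, here is a minimal Python sketch (the function names are hypothetical, not part of any existing tool). With a dedicated separator such as the underscore, a script can split a rendered ID into anchor ID and context mechanically; with all hyphens, every hyphen is a possible boundary, so the best a script can do is list the candidates. It assumes the separator appears exactly once and never inside a context value.

```python
from __future__ import annotations


def split_anchor(full_id: str, sep: str = "_") -> tuple[str, str]:
    """Split an ID into (anchor ID, context) on a dedicated separator.

    Assumes the separator occurs exactly once; anything else is rejected
    instead of guessed at.
    """
    if full_id.count(sep) != 1:
        raise ValueError(f"expected exactly one {sep!r} in {full_id!r}")
    anchor, _, context = full_id.partition(sep)
    return anchor, context


def candidate_splits(full_id: str, sep: str = "-") -> list[tuple[str, str]]:
    """List every (anchor ID, context) pair an all-hyphen ID could stand for."""
    parts = full_id.split(sep)
    return [(sep.join(parts[:i]), sep.join(parts[i:])) for i in range(1, len(parts))]


print(split_anchor("booting-rhel_installation-dvd"))
# ('booting-rhel', 'installation-dvd')
for anchor, context in candidate_splits("booting-rhel-installation-dvd"):
    print(f"{anchor} | {context}")
# booting | rhel-installation-dvd
# booting-rhel | installation-dvd
# booting-rhel-installation | dvd
```

The underscore version gives one answer; the all-hyphen version gives three, and nothing in the ID itself says which one is right.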

The original proposal to use a double hyphen was rejected because it kept being replaced with an em-dash. That was solved by mandating double quotes instead of single quotes in anchor IDs (see #91), so it should not be a problem any more. I therefore propose to explicitly allow anchor-id_{context}, but recommend anchor-id--{context}, and, for the sake of future changes, recommend against anchor-id-{context}.

WDYT?

tradej commented 5 years ago

cc @VladimirSlavik @jherrman

jherrman commented 5 years ago

I am fully in favour of keeping the anchor_context structure instead of anchor-context.

Not having to hit the Shift key every once in a while is admittedly a (somewhat) positive change, but the drawbacks are simply not worth it:

the modular IDs would be more difficult to read and troubleshoot
the overhead of changing the significant number of IDs we already have in place in the docs would be potentially enormous

sterobin commented 5 years ago

As the one who suggested anchor-id_{context} as the recommended format when this was first adopted and documented, I agree with @tradej, and I brought up the same point in previous issue threads around this. The reason I proposed the underscore to begin with was to distinguish the anchor ID from the context. The same issue applies to cross-references: one cannot tell which file is being referenced when the ID and context run together with all hyphens. The BA team has intentionally not adopted the all-hyphen prescription from the previously closed issue because, in our opinion, the losses outweigh the gains. If hyphens are a must for tooling reasons, then the double hyphen Sheldon recommends is a decent compromise.

Meanwhile, we really should think of adopting an auto-generation method for new module files that applies the anchor ID format automatically, to remove the room for error and inconsistency. I believe Marek Cermak or Suchanek came up with something once (can't find their tags).

tradej commented 5 years ago

@sterobin it was Marek Suchanek.

bhardesty commented 5 years ago

I can understand the scripting/tooling difficulty with using all hyphens in the anchor IDs. However, using underscores presents an even bigger issue for SEO. Google treats hyphens as word separators, but it treats underscores as word joiners.

So, taking Sheldon's example id ([id="booting-rhel_installation-dvd"]), once the doc is published, Google would see something like this:

"booting, rhelinstallation, dvd"

The root issue here is that we need UUIDs, but we don't have a CMS that can provide such functionality (component content management systems, for an asset like a module/topic, will typically auto-generate IDs that are unique across the content set).

This issue is further aggravated by the fact that we're making our IDs pull double-duty: they're functioning both as UUIDs and keywords.

To me, the ideal state would be to assign an alphanumeric UUID to each assembly and module to serve as the anchor ID. Then we would use the appropriate Asciidoctor syntax to add keywords separate from the anchor ID (perhaps using Asciidoctor metadata).

Without a solution that addresses this root issue, I'm not sure it's worth solving this issue: it's just going to be trading one issue for a different issue.

tradej commented 5 years ago

@bhardesty That's a very valid point. The UUID is, AFAIK, in play for the long-term solution, but to implement it and transition to it painlessly, I believe the current solution (all single hyphens) is the worst of all the options that were discussed.

ncbaratta commented 5 years ago

The thing is, a decision was made and now we're rehashing it. If we keep going back to things that have been decided, then we're never going to be able to get to 100%. As it is, we have already had to go back and change small things like this in our docs at least twice, instead of working on modularizing new content.

rkratky commented 5 years ago

@bhardesty To your point, would it help if we used something like name_-_context? I know it's horribly ugly and unwieldy, but I would also like to have a way to automate things when dealing with modular docs; after all, automation is one of our chief goals in all areas. The ability to efficiently parse the source code is a huge advantage.

@ncbaratta Yes, it's a pain, but I believe we need to try to find an optimal way to deal with things like this. And the problems that the 'only dashes' variant brings outweigh the cost of the potential adjustments. If you go back to the discussion in #66, you'll see that it ended kind of abruptly by merging the PR without taking into consideration Vladimir's and others' points. Aaron just went ahead and merged it while calling the matter "a trivial issue" :)

VladimirSlavik commented 5 years ago

We can use double hyphen (minus) -- !

Previously I did propose a double change (use double quotes and two hyphens) and verified that it works. That was shot down because the discussion was too confused. Also, the double hyphen did not work for most participants precisely because IDs were enclosed in single quotes, which made Asciidoctor substitute the double hyphen with an em-dash that then choked Publican. However, to reiterate the opening sentence: with double quotes, a double hyphen just works there.

EDIT: To the best of my knowledge, the conversion from single quotes to double quotes (already accepted) does not break anything, so it can be deployed at any time in any project. That then prepares the ground for the context separator change.
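The quote conversion mentioned in the EDIT is mechanical enough to script. A minimal Python sketch, assuming IDs are always written in the bare [id='...'] block-attribute form used in this thread; other forms such as [[...]], [#...], or an id attribute mixed with other attributes are not handled, and the function name is hypothetical:

```python
import re

# Matches [id='...'] whose value contains no quote characters.
SINGLE_QUOTED_ID = re.compile(r"\[id='([^'\"]*)'\]")


def ids_to_double_quotes(text: str) -> str:
    """Rewrite every [id='...'] attribute as [id="..."]; leave other text alone."""
    return SINGLE_QUOTED_ID.sub(r'[id="\1"]', text)


print(ids_to_double_quotes("[id='my-module--{context}']"))
# [id="my-module--{context}"]
```

The value itself, including a double hyphen, passes through unchanged; per the comment above, it is the switch to double quotes that keeps Asciidoctor from turning that double hyphen into an em-dash.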

rkratky commented 5 years ago

@VladimirSlavik +1 Sounds like the best solution.

parsability - check
no underscores - check

VladimirSlavik commented 5 years ago

All credit to @vikram-redhat for that idea, actually; I just investigated all the subsequent details.

vikram-redhat commented 5 years ago

Chiming in here with some updates.

I have become frustrated with not being able to automate parts of our workflow because there is no logical way to separate the anchors into their constituent parts. This is a major flaw and a daily cause of headaches for my team.

I went ahead and tested the double dash solution (which I understand hasn't yet been accepted). Unfortunately, the double dash solution also doesn't work, though not for the reason you would expect. A correctly labeled ID in a module will preserve the double dashes:

[id="my-module--{context}"]

BUT, how do you reference this anchor id from another assembly?

One example would be:

xref:link-to-assembly-containing-module.adoc#my-module--{context}

Unfortunately, whichever way you try to reference the module, Asciidoctor turns the double dashes into an em-dash.

To bypass that, you have to then escape the double dash in the referencing link, like so:

xref:link-to-assembly-containing-module.adoc#my-module\--{context}

which brings me to a breaking point with this whole experiment. This seems like a solution to a problem caused by a solution to another problem that we shouldn't have in the first place. :)

So, here is how I see it:

We need to be able to separate the parts of an anchor, if only for the sake of debugging issues with modules (this is important, as anyone who has dealt with more than 50 modules will tell you).
We can't use double dashes, for the reasons I have put above.
This leaves us with only the underscore as a possible solution (which I have tested and confirmed actually works).

So what are the objections to using an underscore?

Google SEO: OK, let's be honest. By the time Google is going to process and use the anchor id in ranking, it has already gone through your site, your sub-directory, your file name (which likely already contains many of your keywords from your anchor id). The effect of a word joiner here is minimal. Besides, the days of Google using anchor ids for ranking are long gone.
It is hard for writers to mix and match dashes and underscores. This has some merit. I don't write a lot, but I can see how it can be a minor inconvenience when you are creating thousands of anchor IDs and linking to them.

However, this minor inconvenience pales in comparison to the major inconvenience of not being able to debug issues in documentation caused by an inability to identify and track modules in assemblies. Harder still is the fact that automation is going to be nearly impossible to implement with tooling if we have no way to separate the anchor texts.

Hope this long post makes sense. I am hoping that enough people can chime in on this in the next few days that we can make a decision.

tradej commented 5 years ago

Thanks, @vikram-redhat, for this comprehensive test and write-up. It makes quite a lot of sense, so I'm voting to revert to the underscore notation that was there in the beginning. What about others?

adahms commented 5 years ago

Hi everyone,

As much as I agree that we should avoid re-opening old decisions once made, I also agree with others in the thread that there is sufficient cause to make sure this distinction is doing exactly what we need it to.

@tradej - During the PoCs open hours sessions where this was first brought up, you mentioned it might be something that can go away once we have the system in place, and that we can find a more elegant solution there. Does that still sound true?

And to all - we'll be coming up on a milestone soon where we can put forward requirements for the system. A big piece of those, among other things, could be more elegant handling of section IDs so that we can do away with this entirely. If we can discuss what that would look like and come to the understanding that contexts might go away entirely at some stage, does that alleviate any concerns?

Regarding the direction itself, I agree that we should go with either '-' or '_' instead of coming up with a more complicated solution. I also feel we should favor function over form: what is going to solve the most problems for us, knowing that neither option can solve all of them? On that basis, I agree with Vikram and others that an underscore seems to be the smoothest way to go.

Either way, we can easily create a script that can automatically update all instances in a suite to use the official syntax, so I'm not too worried about the workload for this specific item.
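As a rough illustration of how small such a script can be, here is a Python sketch (not the script mentioned later in this thread; all names are hypothetical). It assumes that every anchor ID and every reference to it ends in -{context}, and rewrites only that final hyphen in [id="..."] attributes, xref: targets, and <<...>> references, so the hyphens between words are kept. Running it twice is harmless, because IDs that already end in _{context} no longer match.

```python
from __future__ import annotations

import re
from pathlib import Path

# The hyphen directly before {context} in an [id="..."] value, an xref: target,
# or a <<...>> cross-reference. Group 1 is everything up to that hyphen.
CONTEXT_SUFFIX = re.compile(r'(\[id="[^"\s]*|xref:[^\s\[]*|<<[^,>\s]*)-(\{context\})')


def use_underscore_separator(text: str) -> str:
    """Turn '<anchor>-{context}' into '<anchor>_{context}' in IDs and references."""
    return CONTEXT_SUFFIX.sub(r"\1_\2", text)


def update_tree(root: str) -> list[Path]:
    """Rewrite every .adoc file under root in place; return the files changed."""
    changed = []
    for path in sorted(Path(root).rglob("*.adoc")):
        original = path.read_text(encoding="utf-8")
        updated = use_underscore_separator(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Reviewing the returned list (or the resulting git diff) before committing would catch any reference style the patterns do not cover.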

Last but not least, this has been open for a while now, so we should work toward a conclusion in the next few days so that we can flag this as done and move on to more pressing topics. I'll put out the call on the ccs-mod-docs mailing list for comment, but if anyone wants to take sides, we should start doing that now.

If we need to, let's set up a call, talk it through in the same place, and get this done.

rkratky commented 5 years ago

+1 to Vikram's latest proposal (use the underscore as the separator).

theashiot commented 5 years ago

+1 for underscore.

cliostechscribe commented 5 years ago

I don't see that an elegant solution to this problem exists. None of the options is ideal.

Single hyphen does not differentiate effectively between the components of the anchor ID.
Underscore creates an SEO problem.
Underscore-hyphen-underscore is simply unwieldy to type.
Double hyphen may be converted to an em-dash during publishing.

The option that seems to separate the ID from the context while minimizing the demand on authoring is the double hyphen.

We then have to follow up with training and review to ensure that the correct quotation marks are used.

mjahoda commented 5 years ago

+1 to anchor-id_{context}

ssorj commented 5 years ago

Can you use a slash or a colon to separate the context? They both worked in my testing with asciidoctor, and they are both legal characters in URL fragment IDs.

anchor-id/{context} anchor-id:{context}

https://stackoverflow.com/questions/2849756/list-of-valid-characters-for-the-fragment-identifier-in-an-url

Mixing underscores and hyphens is confusing (and best avoided). If you do end up going with that, you should reverse their usage: "-" to separate the context from the rest and "_" to separate the words of the main ID part. Hyphen is a stronger separator.

anchor_id-context

samccann commented 5 years ago

I agree with the guidelines in @vikram-redhat's latest post, in terms of what is critical to the decision. I'm inclined to also agree with the underscore proposal. I don't know enough about @ssorj's list of alternatives to chime in on whether they are better for solving the given problem, but if the anchor ID is eventually going to be (or should be acting like) a UUID, I'm inclined to stay with anchor-id rather than anchor_id, simply because UUIDs use dashes, not underscores.

(and if all that sounds like waffling, it is - my doc-work is not in asciidoc yet, so I'm mostly here to listen and learn and adopt what I can from the final decision for the docs I work on).

sgcarpenter commented 5 years ago

Requiring an underscore in the middle of a bunch of hyphens (both of which are added with the same key) seems like asking for trouble. If you do go with it, please also write a script to detect anchor IDs with either no underscores or more than 1 underscore. Then commit to chasing down and fixing all the errors. The more I contemplate that, the less I like the underscore proposal.
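The check suggested here is straightforward to automate. A minimal Python sketch (function names are hypothetical) that reports every [id="..."] value without exactly one underscore. It assumes all IDs use the double-quoted [id="..."] form, and it would also flag any IDs that are legitimately meant to have no context, so those would need an allowlist.

```python
from __future__ import annotations

import re
from pathlib import Path

ID_ATTR = re.compile(r'\[id="([^"]*)"\]')


def bad_ids(text: str) -> list[tuple[int, str]]:
    """Return (line number, ID) for each [id="..."] value that does not
    contain exactly one underscore."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ID_ATTR.finditer(line):
            if match.group(1).count("_") != 1:
                problems.append((lineno, match.group(1)))
    return problems


def check_tree(root: str) -> int:
    """Print one line per problem in .adoc files under root; return the count."""
    total = 0
    for path in sorted(Path(root).rglob("*.adoc")):
        for lineno, anchor in bad_ids(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: {anchor!r} does not contain exactly one '_'")
            total += 1
    return total
```

Running check_tree on the repository in CI and failing the build whenever it returns a non-zero count would be one way to commit to chasing the errors down.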

rkratky commented 5 years ago

@ssorj, a quick test with ccutil shows that a slash breaks the build. And while a colon works, it is turned into a dash by ccutil. So, it "disappears" (between the rest of the dashes).

Granted, the separator is mainly needed for working with source code. But the fact that the separator would not be recognizable in compiled docs doesn't sound intuitive to me.

rkratky commented 5 years ago

@sgcarpenter, as you say, it would be possible to detect whether an ID contains the right separator, or not. Also, once this is finalized, I'd think that the majority of writers would use an automated way of generating and adding IDs most of the time -- as we do now.

I understand the desire to make this as simple/easy as possible for writers. At the same time, I think we can give writers some credit. This would be a clear rule with no ambiguity -- a context variable is separated using a special char. Writers contend with stuff like that all the time.

AsciiDoc compiles docs differently if you mistake any of the following:

: for ::
' for "
[ for { or (
= for ==
- for =
. for .
etc.

In other words, writers need to be mindful of what they're doing. This underscore vs. dash thing seems to be blown out of proportion. Yes, it's possible to mistake one for the other, but the same is true of many other formatting elements. I don't think we need to be worried about this one.

ncbaratta commented 5 years ago

I'd think that the majority of writers would use an automated way of generating and adding IDs most of the time -- as we do now.

@rkratky What is this automated way you speak of? I'm pretty sure our team is doing IDs manually - not in an automated fashion.

leswilliams44 commented 5 years ago

@vikram-redhat made a lot of good points, as has everyone else. Using an underscore as a separator seems the least problematic of all solutions at this point.

rkratky commented 5 years ago

@ncbaratta, @jenmalloy, I use an obscure text editor, so my own solution would be of little use to you. But many writers use Atom, I believe, so here's a quick & dirty hack to automate ID creation in Atom:

Atom allows you to add custom commands. The easiest way (I think... Atom gurus, correct me) is to define your command directly in your init.coffee file:

Go to Edit > Init Script..., which opens the init.coffee file in a new tab (the file is normally in ~/.atom/init.coffee on your filesystem).

Add the following code to the init.coffee file:

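```
atom.commands.add 'atom-text-editor', 'custom:add-mod-docs-id', ->
  editor = atom.workspace.getActiveTextEditor()
  cursor = editor.getCursors()[0]
  # Read the heading on the cursor's line, then remove that line
  heading = editor.lineTextForBufferRow( cursor.getBufferRow() )
  editor.deleteLine()
  # Lowercase, turn non-alphanumerics into hyphens, collapse runs of
  # hyphens, trim them at both ends, and wrap the result as an ID
  id = heading
  id = id.toLowerCase().
        replace(/[^a-zA-Z0-9]/g, "-").
        replace(/-+/g, "-").
        replace(/^-|-$/g, "").
        replace(/^.*$/, "[id=\"$&_{context}\"]\n" + heading + "\n")
  # Insert the ID line followed by the original heading
  editor.insertText( id )
```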

Create a keyboard shortcut for your new command. Go to Edit > Keymap... (opens your keymap.cson file), and add the following code (this assumes that you don't have the Ctrl + Shift + d keyboard shortcut assigned to any other command; it doesn't seem to be assigned by default):
```
'atom-text-editor':
  'ctrl-shift-d': 'custom:add-mod-docs-id'
```
Reload your editor environment by pressing Ctrl + Shift + F5.
In an AsciiDoc file, position your text cursor anywhere on a line with a heading, and press Ctrl + Shift + d.

If all went well, you should see an ID (with an underscore :)) appear above the heading.

Note: At this point, the command is rather stupid, so it doesn't check:

whether the line you're on in your .adoc file is actually a heading. If you press the Ctrl + Shift + d shortcut in the middle of a paragraph, it will create a monster of an ID as if the line were a heading
whether an ID already exists above the heading; it creates a new one even if there already is one

As I mentioned, I don't use Atom, and I just threw this together now. So, not only can it be made smarter, but there might also be ways to make the scriptlet simpler and/or faster. I'm sure Atom users will have better suggestions.

I based this on my use case -- I only need to add IDs occasionally. But it would be easy to create a similar command to add IDs for all headings in a file. Let me know if that would be useful.
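For the all-headings variant, here is a minimal Python sketch of the same idea outside Atom (function names are hypothetical). It applies the same slug rules as the CoffeeScript command above and also addresses the two limitations noted there: it only treats lines that start with one or more = followed by a space as headings, skipping ---- listing blocks, and it leaves a heading alone if the line directly above it is already an [id="..."] line. Other delimited block types and other ID syntaxes are not handled.

```python
from __future__ import annotations

import re

# A section title: one or more "=", whitespace, then the title text.
HEADING = re.compile(r"^=+\s+(\S.*)$")
# An existing ID line such as [id="booting-rhel_{context}"].
ID_LINE = re.compile(r'^\[id="[^"]*"\]\s*$')


def slugify(title: str) -> str:
    """Same rules as the Atom command: lowercase, non-alphanumerics become
    hyphens, runs of hyphens collapse, and edge hyphens are trimmed."""
    slug = re.sub(r"[^a-z0-9]", "-", title.lower())
    slug = re.sub(r"-+", "-", slug)
    return slug.strip("-")


def add_ids(text: str) -> str:
    """Insert [id="<slug>_{context}"] above every heading that lacks an ID."""
    out: list[str] = []
    in_listing = False
    for line in text.splitlines():
        if line.strip() == "----":
            # Toggle on listing-block delimiters so "= ..." lines in code are skipped.
            in_listing = not in_listing
        match = None if in_listing else HEADING.match(line)
        if match and not (out and ID_LINE.match(out[-1])):
            out.append(f'[id="{slugify(match.group(1))}_{{context}}"]')
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, slugify("Installing from a DVD (x86_64)") yields installing-from-a-dvd-x86-64, and add_ids can be applied to each file's contents before writing them back.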

VladimirSlavik commented 5 years ago

+1 to underscore, because it works now and I need to debug actual stuff in bulk.

vikram-redhat commented 5 years ago

I believe we have broad consensus on this. @adahms - is there anyone else we should consult, or can we go forward with this? We should probably announce it to the wider CCS list as well.

@kalexand-rh - do you mind undoing your previous PR (https://github.com/redhat-documentation/modular-docs/pull/92) to make this change?

adahms commented 5 years ago

@vikram-redhat - I agree; comments on the thread have slowed down a little, and the general vote seems to be in favor of underscores. No solution will be completely perfect, but based on the conversation above, I agree that this looks like the best way to go.

I'll follow up on the thread on the mailing list when the change has been made, and have created a basic script in Python that can make the updates in bulk.

tradej commented 5 years ago

@adahms:

@tradej - During the PoCs open hours sessions where this was first brought up, you mentioned it might be something that can go away once we have the system in place, and that we can find a more elegant solution there. Does that still sound true?

I haven't had time to check with the Tooling Team about the progress on the linking mechanism, but I don't know of anything that would change the situation. And yes, replacing anchor IDs that use underscores with something else en masse should be fairly simple with a few scripts.

I'll follow up on the thread on the mailing list when the change has been made, and have created a basic script in Python that can make the updates in bulk.

Thank you, it's much appreciated.

fbolton commented 4 years ago

@adahms @tradej The existing reference guide still gives the out-of-date guidance to use a hyphen instead of an underscore, which has caused our team quite a bit of confusion: https://github.com/redhat-documentation/modular-docs/blob/master/modular-docs-manual/content/topics/module_anchor-and-file-names-concept.adoc Should I go ahead and submit an MR to fix this?

adahms commented 4 years ago

Hi @fbolton - thanks for the follow up! Good call, and sounds good to me. As soon as a PR is in place, I'd be happy to take a look and help pass it through.

fbolton commented 4 years ago

@adahms Seems that I do not have permission to push a topic branch to the modular-docs repo, presumably because I am not on the modular-docs team: https://github.com/orgs/redhat-documentation/teams/modular-docs/members Can you add me to the team?

adahms commented 4 years ago

@fbolton - Thanks for letting me know, Fintan. I've sent an invite to add you to the group for this repo, and as soon as you accept that, you should be good to go. Let me know if you run into any issues, and I'll take a look.

fbolton commented 4 years ago

@adahms Thanks! I just created PR #108 for this issue.

adahms commented 4 years ago

Thanks all - #108 has now been merged, so I am hoping we can officially call this one closed.

Closing this issue now. :)

redhat-documentation / modular-docs

All hyphens in anchor ID make debugging and automated operations more difficult #94