TOC gives errors (caused by anchor tags without an `href` attribute)

tyeeman commented 1 year ago

Hi again. I'm running your unmodified (by me) version and I'm getting errors when the .md file has a TOC. Here is part of the TOC -

<!-- TOC start -->
- [**Introduction - August 2022**](#introduction-august-2022)
- [**Upcoming SISKA Events**](#upcoming-siska-events)
- [**Upcoming NonSISKA Events**](#upcoming-nonsiska-events)
- [**Community Events of Interest**](#community-events-of-interest)
- [**Upcoming Courses**](#upcoming-courses)

Here is the error -

md2blog --clean --serve
←[0m←[1m←[31merror←[0m: TypeError: Cannot read properties of undefined (reading 'split')
    at file://$deno$/bundle.js:7162:66
    at GoldsmithObject.run (file://$deno$/bundle.js:1457:28)
    at async GoldsmithObject.build (file://$deno$/bundle.js:1493:23)
    at async file://$deno$/bundle.js:55045:1

Anything come to mind?

jaredkrinke commented 1 year ago

Not sure why I didn't see this until now, but I'll take a look. That is probably a bug in my code...

jaredkrinke commented 1 year ago

Just for my own understanding, how are you creating the actual anchor tag?

I tested using a Markdown heading (e.g. # Heading), but I could only reproduce this error if I used inline HTML (e.g. <a id="introduction-august-2022">test</a>) -- there are probably more approaches that I haven't thought of, thus my question. Side note: # Introduction - August 2022 produces an anchor with id #introduction---august-2022 (note the extra hyphens), in case you want to switch to using Markdown heading.

jaredkrinke commented 1 year ago

Not to self: transferring issues apparently actually moves them. I probably should have just opened a new bug on Goldsmith and linked them somehow.

jaredkrinke commented 1 year ago

This was due to a JavaScript gotcha: testing null or undefined using RegExp.test(...) coerces those values to the literal strings "null" or "undefined" which is surprising, to say the least. Glad to see that MDN notes this behavior.

tyeeman commented 1 year ago

I'm using another dev's repo to run my markdown through to get the TOC.

Here it is - https://github.com/derlin/bitdowntoc

tyeeman commented 1 year ago

Yes I was using anchors. Looks like there is an option for no anchors.

jaredkrinke commented 1 year ago

@tyeeman you are using the md2blog code directly from the Git repository, correct?

I just pushed a fix, so if you do a git pull for md2blog it should pick up the fix and avoid this bug. I commented earlier, but just to confirm: this was a bug in my code, so I'd suggest picking up the fix rather than changing your workflow.

If you're not using the Git repository and instead relying on md2blog builds, unfortunately I haven't built updated binaries yet. If that's the case, let me know!

Edit: and thanks for the bug report!

tyeeman commented 1 year ago

I'm using the binary.

jaredkrinke commented 1 year ago

I'm using the binary.

Ok, binaries have been updated: https://github.com/jaredkrinke/md2blog/releases/tag/1.2.0

tyeeman commented 1 year ago

Just ran the binary and getting error -

2023-10-09_170825

jaredkrinke commented 1 year ago

Just ran the binary and getting error -

That error looks like a Deno bug, but I'll need to investigate to be certain.

jaredkrinke commented 1 year ago

Just ran the binary and getting error -

For context, which version of Windows was this on?

tyeeman commented 1 year ago

Windows 7 64 bit

jaredkrinke commented 1 year ago

Windows 7 64 bit

Uh oh, I've got some bad news: Deno (which md2blog is built on) doesn't officially support Windows 7: https://github.com/denoland/deno_install/issues/181#issuecomment-792102833

The reason this showed up now is that I updated my copy of Deno and used that to build the binaries. Previously, I must have been using a version of Deno that worked on Win7 (possibly unintentionally).

I don't recall why I updated Deno. If there wasn't a good reason to update, it might be possible to downgrade and then produce a new set of binaries that have a better chance of working on Win7.

tyeeman commented 1 year ago

Yes Win7 is starting to have problems with a few languages now. I've got a bunch of new PC parts a few weeks ago, ready to put together for my new computer. I guess I should build it. It's up to you if you would like to downgrade deno. It would be nice so I could get a site working but I understand that Win7 is not really supported anymore.

My other option is to use the old version of md2blog and create the TOC using href's. That should work, yes?

Another option is for me to compile the new code with the old version of deno, if I had the .exe.

jaredkrinke commented 1 year ago

Yes Win7 is starting to have problems with a few languages now. I've got a bunch of new PC parts a few weeks ago, ready to put together for my new computer. I guess I should build it. It's up to you if you would like to downgrade deno. It would be nice so I could get a site working but I understand that Win7 is not really supported anymore.

No guarantees this will work, but I attempted to create an executable using the same version of Deno I had used for the previous release. Assuming the previous release worked on Win7, I think this one will work just as well:

https://github.com/jaredkrinke/md2blog/releases/download/1.2.0/md2blog-x86_64-pc-windows-msvc-win7.zip

My other option is to use the old version of md2blog and create the TOC using href's. That should work, yes?

Yeah, the bug is if you have an a tag without href (or img without src), an exception will be thrown and abort the build.

Edit to add: the Deno version I believe I was using for the previous release (based on file metadata) was 1.16.4.

tyeeman commented 1 year ago

Thanks, I'll try it now. Update - Yes, it works with no href''s but I get this -

md2blog --clean --serve
Watch: monitoring: [content]...
Serve: listening on: http://localhost:8888/
Error: The site has broken relative links:

Is that a false error? How could I test if there actually is a broken relative link?

jaredkrinke commented 1 year ago

It doesn't list any links? That... doesn't seem possible, at a glance...

tyeeman commented 1 year ago

No, I don't see anything wrong. Here is the front page -

2023-10-09_195126

although I don't know why the title of the two posts show up in the Topics list??

jaredkrinke commented 1 year ago

So the output really doesn't show anything like the last line in this example?

Watch: monitoring: [content]...
Serve: listening on: http://localhost:8888/
Error: The site has broken relative links:

From "posts/lisp/pathnames-on-windows.html" to "#foo"

There should always be some output indicating the links that are broken (edit: note that this code is only run when brokenLinks.length > 0):

export class GoldsmithLinkCheckerError extends Error {
    brokenLinks: BrokenLink[] = [];

    constructor(brokenLinks: BrokenLink[]) {
        super(`The site has broken relative links:\n\n${brokenLinks.map(bl => `From "${bl.filePath}" to "${bl.href}"`).join("\n")}`); // <---- Right here
        this.brokenLinks = brokenLinks;
    }
}

As far as post titles in topics, are those actually directory names? md2blog infers topics from the directory name, and then also uses explicit keywords set it front matter.

tyeeman commented 1 year ago

Yes, there is output with the errors, I missed that. Here they are -

md2blog --clean --serve
Watch: monitoring: [content]...
Serve: listening on: http://localhost:8888/
Error: The site has broken relative links:

From "posts/august-2022/august-2022.html" to "#kayaking-bcs-central-coast-july-7-24-preliminary-repo
rt"
From "posts/august-2022/august-2022.html" to "#kayaking-in-the-rocky-point-area-navwarns-site"

And yes, they are directory names.

Update - I click on those 2 links in the TOC and they take me to the correct spot in the document.

jaredkrinke commented 1 year ago

Update - I click on those 2 links in the TOC and they take me to the correct spot in the document.

Ah, I misunderstood. So the links are getting flagged as broken, but they appear to not actually be broken?

That's pretty surprising, but probably not impossible. I'd need a copy of the Markdown or HTML files to investigate. The logic seems sound, but HTML can be weird. It's basically doing this:

Parse each HTML file, looking for opening tags
Note down any id attributes that are seen
Later, parse each HTML file again and look for a tags with href (and img+src), and then check to see if the target id existed from the previous two steps

tyeeman commented 1 year ago

Yes, I don't see any errors. Currently I'm looking for instructions on how to make Pandoc add a TOC to a markdown file. I use it every month to make an email version of my newsletter but it uses a different template. It adds href's and id's to h1 and h2's. I may not get an errors if I try Pandoc or maybe it's not related?

jaredkrinke commented 1 year ago

Yes, I don't see any errors. Currently I'm looking for instructions on how to make Pandoc add a TOC to a markdown file. I use it every month to make an email version of my newsletter but it uses a different template. It adds href's and id's to h1 and h2's. I may not get an errors if I try Pandoc or maybe it's not related?

I bet what's happening is that your TOC generator is using the name attribute instead of the id attribute. I confirmed my code doesn't handle that case but that Firefox scrolls to the line successfully, so... I guess it's valid?

This is probably a misunderstanding on my part that I should fix, but I can't tell for sure yet.

tyeeman commented 1 year ago

Yes, that generator I mentioned makes this -

<!-- TOC --><a name="introduction"></a>

I asked the dev why no id's, but they didn't seem to want to change until "name" stopped working everywhere.

jaredkrinke commented 1 year ago

I asked why no id's, but they didn't seem to want to change until "name" stopped working everywhere.

Well, it seems to be deprecated as of HTML 5 (for a): https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#name

So I'd say we're both wrong: they should probably not use it (unless targeting HTML 4), but since it works in browsers, I probably should support it (i.e. not flag links as broken in this scenario).

jaredkrinke commented 1 year ago

I opened #5 to track this additional issue. Note that I won't be able to get to fixing it today for sure. For now, you can ignore those errors (although I'm sure there will be enough that it's annoying).

tyeeman commented 1 year ago

OK, I'll keep working on Pandoc ID's and let you know if I get no broken link errors.

jaredkrinke commented 1 year ago

Update - I click on those 2 links in the TOC and they take me to the correct spot in the document.

Update: I went ahead and added support for using either name or id (giving preference to id) when checking links. Hopefully this will resolve your issue:

https://github.com/jaredkrinke/md2blog/releases/download/1.2.1/md2blog-x86_64-pc-windows-msvc.zip

tyeeman commented 1 year ago

Thanks, I'll try it this evening. I tried Pandoc to create the toc but it just creates the menu, it doesn't add id's to the headings when going from markdown to markdown. When building to html then clicking on a menu item, firefox does nothing, so what app would jump to the proper heading without an id or name?

tyeeman commented 1 year ago

Yes, it's working perfectly with no errors! Thanks for all your help. I see that you're adding id's. Very nice. I wonder what Firefox is using for the jumps, id's or name? I see you added a goldsmith issue about adding a toc. That would mean I wouldn't have to run my markdown through bitdowntoc first. That would be nice.

Do you know why lots of these toc apps don't add "id's" or "name" to the headings themselves (in markdown files) and they're expected to work once converted to html? Actually Pandoc does add the id's when converting from markdown to html, but it doesn't when going from markdown to markdown like I mentioned above. I guess they expect the html converter to do it but lots of them don't. That's why I used bitdowntoc on my files. Even though it only uses "name" it works in Firefox and I guess other browsers. No one who receives my newsletter has told me it doesn't.

jaredkrinke commented 1 year ago

Do you know why lots of these toc apps don't add "id's" or "name" to the headings themselves (in markdown files) and they're expected to work once converted to html? Actually Pandoc does add the id's when converting from markdown to html, but it doesn't when going from markdown to markdown like I mentioned above. I guess they expect the html converter to do it but lots of them don't. That's why I used bitdowntoc on my files. Even though it only uses "name" it works in Firefox and I guess other browsers. No one who receives my newsletter has told me it doesn't.

Good question! Short answer: I don't know. I assumed adding ids to headings was part of the Markdown spec (or CommonMark or GitHub Flavored Markdown), but I can't find the behavior specified... anywhere.

md2blog uses Marked.js for Markdown -> HTML, and it inserts ids for sure--I've been relying on it--but the specific rules (especially for non-English languages) don't seem to be documented.

jaredkrinke / goldsmith

TOC gives errors (caused by anchor tags without an `href` attribute) #3