withastro / astro

The web framework for content-driven websites. ⭐️ Star to support our work!
https://astro.build

Special characters excluded from content collection slug #11458

Closed nicolabovolato closed 1 month ago

nicolabovolato commented 2 months ago

Astro Info

Astro                    v4.11.5
Node                     v20.15.0
System                   macOS (arm64)
Package Manager          pnpm
Output                   static
Adapter                  none
Integrations             @astrojs/preact
                         @astrojs/tailwind
                         @astrojs/mdx
                         @astrojs/partytown
                         @astrojs/sitemap

If this issue only occurs in one browser, which browser is a problem?

No response

Describe the Bug

On my personal website there's among others a basic tutorial for C and one for C++.

I'm currently migrating from pages to content collections, but it looks like the + sign gets stripped from the slug.

09:13:48 [ERROR] [DuplicateContentEntrySlugError] blog contains multiple entries with the same slug: `it/tutorial-c/introduzione`. Slugs must be unique.
Entries: 
- /src/content/blog/it/tutorial-c++/introduzione/index.mdx
- /src/content/blog/it/tutorial-c/introduzione/index.mdx
  Error reference:
    https://docs.astro.build/en/reference/errors/duplicate-content-entry-slug-error/
  Stack trace:
    at REDACTED/node_modules/astro/dist/content/vite-plugin-content-virtual-mod.js:213:19

What's the expected result?

From what I found in the docs, there's no mention of any character stripping when generating the slug. I think these characters should be kept, as that would also provide better compatibility with Astro pages.
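The collision can be reproduced with a rough, ASCII-only approximation of the default slugging. This is a hypothetical sketch, not Astro's source: it assumes the file extension is removed, each path segment is lowercased with punctuation other than "-" and "_" dropped, and a trailing index segment is removed. Under those assumptions both entries above map to it/tutorial-c/introduzione, and a path such as v0.0.1/sub/index.mdoc maps to v001/sub, which also matches the behaviour reported later in the thread.

```typescript
// Hypothetical, ASCII-only approximation of a GitHub-style slugger.
// This is NOT Astro's implementation; it only illustrates the collision.
function approxSlugSegment(segment: string): string {
  return segment
    .toLowerCase()
    // Drop everything except a-z, 0-9, spaces, '_' and '-' (so '+' and '.' vanish).
    .replace(/[^a-z0-9 _-]/g, "")
    // Turn remaining spaces into hyphens.
    .replace(/ /g, "-");
}

// Turn a collection-relative file path into a slug under the same assumptions.
function approxSlugFromPath(relativePath: string): string {
  const withoutExt = relativePath.replace(/\.[^/.]+$/, ""); // drop ".mdx", ".mdoc", ...
  return withoutExt
    .split("/")
    .map(approxSlugSegment)
    .join("/")
    .replace(/\/index$/, ""); // "foo/index" -> "foo"
}

const paths = [
  "it/tutorial-c++/introduzione/index.mdx",
  "it/tutorial-c/introduzione/index.mdx",
];

// Report any two paths that end up with the same slug.
const seen = new Map<string, string>();
for (const p of paths) {
  const slug = approxSlugFromPath(p);
  const previous = seen.get(slug);
  if (previous !== undefined) {
    console.log(`duplicate slug "${slug}": ${previous} and ${p}`);
  }
  seen.set(slug, p);
}
```

Running it logs one duplicate-slug message for the two tutorial entries.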

Link to Minimal Reproducible Example

https://stackblitz.com/edit/github-ywpupp


bgentry commented 2 months ago

I believe I actually just ran into this as well. I'm trying to build a content collection for a software library that includes versioned URLs (so I can have docs for all versions). A pkg collection with content files like src/content/pkg/v0.0.1/sub/index.mdoc will have its slug returned as v001/sub. I can of course override that slug back to its "true" form of v0.0.1/sub, but I'll have to do that for every single file—definitely not ideal!

I would love to be able to opt out of sanitizing actually-supported characters out of my slugs in some fashion.

matthewp commented 1 month ago

You can set your own custom slug on individual entries: https://docs.astro.build/en/guides/content-collections/#defining-custom-slugs

I don't think we want to change our default slugging algorithm.

bgentry commented 1 month ago

Wasn’t suggesting altering any defaults, and yes I’m aware that the slug can be customized (I mentioned above that I understood this). The problem is in my case I have to do that for every single file, all to work around a slug algorithm that really isn’t documented and certainly isn’t customizable.

Ideally I would be able to customize this behavior globally or within a collection so that I don’t need to override the frontmatter in every single file (which I’m having to do in a script due to my content being autogenerated).
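For autogenerated content, the per-file workaround can be scripted. The sketch below is hypothetical (the helper names are illustrative, not part of Astro): it derives a slug from the collection-relative path, keeping "." and "+", and writes it into the reserved slug frontmatter field unless one is already set. Reading and writing the files on disk is left out.

```typescript
// Hypothetical helper: slug = relative path minus extension and trailing "index".
function explicitSlugFromPath(relativePath: string): string {
  return relativePath
    .replace(/\.[^/.]+$/, "") // drop the file extension
    .replace(/(^|\/)index$/, ""); // "v0.0.1/sub/index" -> "v0.0.1/sub"
}

// Hypothetical helper: return the file source with an explicit `slug` in its frontmatter.
function withExplicitSlug(relativePath: string, source: string): string {
  // JSON.stringify yields a double-quoted string that is also valid YAML.
  const slugLine = `slug: ${JSON.stringify(explicitSlugFromPath(relativePath))}`;
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  if (match === null) {
    // No frontmatter yet: create one that only contains the slug.
    return `---\n${slugLine}\n---\n${source}`;
  }
  if (/^slug:/m.test(match[1] ?? "")) {
    // Respect a slug that was already set by hand.
    return source;
  }
  // Insert the slug as the first frontmatter key (function form avoids "$" patterns).
  return source.replace(/^---\r?\n/, () => `---\n${slugLine}\n`);
}
```

For example, withExplicitSlug("v0.0.1/sub/index.mdoc", "---\ntitle: Sub\n---\nBody\n") adds slug: "v0.0.1/sub" as the first frontmatter line.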

matthewp commented 1 month ago

A global config way of doing it could work. Do you have any suggestions as to what that would look like?

bgentry commented 1 month ago

Conceptually it seems like the slug transformer takes a file path string as input and returns a transformed string as output. If the current logic can be easily wrapped into a function of that form, then a collection-level slugTransform or slugTransformer or slugNormalize setting of that form would work. I am not too opinionated about the naming here, you would probably have better ideas about how to fit it in!
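A function of the proposed shape might look like the sketch below. slugTransform was only proposed in this thread and was never added to the v1 content collections, so the type and the example transform are hypothetical. The example keeps characters such as "." and "+" and only normalises case and whitespace.

```typescript
// Hypothetical shape of the proposed option: collection-relative path in, slug out.
type SlugTransform = (relativePath: string) => string;

// Example transform that preserves "." and "+" in each segment.
const keepPunctuation: SlugTransform = (relativePath) =>
  relativePath
    .replace(/\.[^/.]+$/, "") // drop the file extension
    .split("/")
    .map((segment) => segment.trim().toLowerCase().replace(/\s+/g, "-"))
    .join("/")
    .replace(/(^|\/)index$/, ""); // "foo/index" -> "foo"
```

With this transform, it/tutorial-c++/introduzione/index.mdx and it/tutorial-c/introduzione/index.mdx stay distinct, and v0.0.1/sub/index.mdoc keeps its dots.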

matthewp commented 1 month ago

slugTransform probably fits with how we tend to name things. Any chance you'd be able to contribute this change?

bgentry commented 1 month ago

I wish I could, but I have a new baby coming any day now and I’m spread very thin atm 😅 If things settle down during my leave I may be able to pick it up in a month or two.

I think the proposal sounds great though. This would really make the content collections a lot more flexible for different use cases imo.

matthewp commented 1 month ago

cc @ascorbic how does content layer treat slugs? Is it still a special thing? Is this type of request something that could be incorporated there?

ematipico commented 1 month ago

It's already handled. The glob loader accepts a generateId function that lets you customise the slug of content collection entries.

ascorbic commented 1 month ago

It uses IDs instead of slugs, but you can provide a generateId function to glob to take over generating the ID, which would let you handle that kind of thing.

https://github.com/withastro/astro/blob/content-layer/packages/astro/src/content/loaders/glob.ts#L30
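A generateId callback for this case could be as small as the sketch below. It assumes the loader passes an object whose entry field is the file path relative to the loader's base, and only that field is used; the function name pathAsId and the base directory are illustrative. At the time of this thread the content layer was still experimental and behind a flag, so the wiring in the comment is an assumption about that API rather than a tested configuration.

```typescript
// Assumption: glob() calls generateId with an object whose `entry` is the
// path relative to `base`. Only that field is modelled here.
interface IdOptions {
  entry: string;
}

// Hypothetical callback: keep the path as-is, minus extension and trailing "index".
function pathAsId({ entry }: IdOptions): string {
  return entry
    .replace(/\.[^/.]+$/, "") // drop the file extension
    .replace(/(^|\/)index$/, ""); // "a/b/index" -> "a/b"
}

// Assumed wiring in the content config (content layer API):
//
//   import { defineCollection } from "astro:content";
//   import { glob } from "astro/loaders";
//
//   const blog = defineCollection({
//     loader: glob({
//       pattern: "**/*.{md,mdx}",
//       base: "./src/data/blog", // outside src/content
//       generateId: pathAsId,    // define inline or import from a helper module
//     }),
//   });
//
//   export const collections = { blog };
```

With this callback the two tutorials get distinct IDs, it/tutorial-c++/introduzione and it/tutorial-c/introduzione.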

matthewp commented 1 month ago

Ok, in that case we probably won't add an option for the v1 content collections and will encourage people to switch to the new API.

bgentry commented 1 month ago

Amazing! I'll be eagerly awaiting the new content layer / #11360 🚀 🙏

matthewp commented 1 month ago

Going to close as there's nothing actionable to do here at this time.

nicolabovolato commented 3 weeks ago

Am I missing something?

https://stackblitz.com/edit/github-ywpupp-dkraoo

@matthewp @ascorbic

ematipico commented 3 weeks ago

@nicolabovolato

Content layer solves this problem. The current content collections will slowly be phased out.

nicolabovolato commented 3 weeks ago

@ematipico

I'm guessing that it truly is experimental then 😄

ascorbic commented 3 weeks ago

@nicolabovolato you need to move your files out of src/content. Right now they're being processed using the old content collections.

nicolabovolato commented 3 weeks ago

Thanks @ascorbic, working fine now 🙌🏼