platers / obsidian-linter

An Obsidian plugin that formats and styles your notes with a focus on configurability and extensibility.
https://platers.github.io/obsidian-linter/
MIT License
1.13k stars 79 forks

FR: automatically capitalize first letters of sentences #574

Open GamerGirlandCo opened 1 year ago

GamerGirlandCo commented 1 year ago

Is Your Feature Request Related to a Problem? Please Describe.

i want a simpler way to convert a file that's in all lowercase to sentence case. this is because my writing style is very stream-of-consciousness oriented, and as such, i don't want to get bogged down with semantic things like capitalization in the moment.

Describe the Solution You'd Like

A linter step that capitalizes the first letter of every sentence.

Please include an example where applicable:

*(what a joke you are. it's a wonder anyone listens to the garbage you write... except you wrote nothing because you're talentless.)*

Sean clenched his fist and swallowed the lump in his throat. this wasn't the time to cry.

the sound of floorboards creaking startled the singer. he pivoted to look behind him, preparing to unleash verbal hell onto whomever thought it would be a good idea to follow him. though, much to his relief, nobody was there. *just my imagination.*

*(why worry about being followed anyways? Brian doesn't care. he’s too busy laughing at you behind your back. Because you're PATHETIC.)*

on and on the voice prattled as he ascended the stairs. Sean ignored it for the most part, hoping ‘it’ would somehow “get the hint” and go away, yet knowing deep down it likely wouldn't.

when Sean had gotten to the second floor, he looked behind him one last time to be absolutely sure nobody followed him upstairs. then, he dashed into the bathroom and locked the door.

*(What a joke you are. It's a wonder anyone listens to the garbage you write... except you wrote nothing because you're talentless.)*

Sean clenched his fist and swallowed the lump in his throat. This wasn't the time to cry.

The sound of floorboards creaking startled the singer. He pivoted to look behind him, preparing to unleash verbal hell onto whomever thought it would be a good idea to follow him. Though, much to his relief, nobody was there. *Just my imagination.*

*(Why worry about being followed anyways? Brian doesn't care. He’s too busy laughing at you behind your back. Because you're PATHETIC.)*

On and on the voice prattled as he ascended the stairs. Sean ignored it for the most part, hoping ‘it’ would somehow “get the hint” and go away, yet knowing deep down it likely wouldn't.

When Sean had gotten to the second floor, he looked behind him one last time to be absolutely sure nobody followed him upstairs. Then, he dashed into the bathroom and locked the door.

Describe Alternatives You've Considered

as of now, i use the obsidian regex pipeline plugin with rules generated by the following javascript code:

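const fs = require("fs")
const abc = "abcdefghijklmnopqrstuvwxyz".split("")
// each file gets one "pattern"->"replacement" rule per letter, joined with newlines
// bracket: line start, optional emphasis, an opening ( or “, then 1-3 asterisks before the letter
fs.writeFileSync("bracket",abc.map(a => `"^(\\*{0,3})(\\(|“)(\\*{1,3})${a}"->"$1$2$3${a.toUpperCase()}"`).join("\u000a"))
// formata: line start followed by 1-3 asterisks
fs.writeFileSync("formata", abc.map(a => `"^(\\*{1,3})${a}(\\W?)"->"$1${a.toUpperCase()}$2"`).join("\u000a"))
// formata2: a period, optional closing emphasis, then whitespace
fs.writeFileSync("formata2", abc.map(a => `"(\\.\\*{0,3}\\s)${a}"->"$1${a.toUpperCase()}"`).join("\u000a"))
// quotandothr: like formata2, then opening emphasis or a “ before the letter
fs.writeFileSync("quotandothr", abc.map(a => `"(\\.\\*{0,3}\\s)(\\*{1,3}|“\\*{0,3})${a}"->"$1$2${a.toUpperCase()}"`).join("\u000a"))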

this outputs:

"^(\*{0,3})(\(|“)(\*{1,3})a"->"$1$2$3A"
"^(\*{0,3})(\(|“)(\*{1,3})b"->"$1$2$3B"
"^(\*{0,3})(\(|“)(\*{1,3})c"->"$1$2$3C"

... and so on for every letter of the alphabet.

Additional Context

i did the work of putting together a snippet of code that does exactly what i want by putting the regexes from the above code into a capture group separated by |, with some slight additions and tweaks:

str.replace(
  /(\.\*{0,3}\s\*{1,3}|[“—]\*{0,3}|[“—!]\s|\.\*{0,3}\s?|^\*{1,3}|^\*{0,3}[(“]{0,2}\*{0,3}|^|\)\s)([a-z])(\W?)/gm, 
  (_, u1, u2, u3) => u1 + u2.toLocaleUpperCase() + u3
)
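
For anyone who wants to try the snippet above, here is a minimal runnable sketch. The pattern and callback are copied unchanged; the names `SENTENCE_START` and `capitalizeSentences` and the sample text are only illustrative and are not part of the Linter plugin.

```javascript
// Pattern copied from the snippet above; the constant and function names are hypothetical.
const SENTENCE_START =
  /(\.\*{0,3}\s\*{1,3}|[“—]\*{0,3}|[“—!]\s|\.\*{0,3}\s?|^\*{1,3}|^\*{0,3}[(“]{0,2}\*{0,3}|^|\)\s)([a-z])(\W?)/gm;

function capitalizeSentences(str) {
  // u1: whatever precedes the sentence, u2: the lowercase letter,
  // u3: an optional non-word character right after the letter
  return str.replace(SENTENCE_START, (_, u1, u2, u3) => u1 + u2.toLocaleUpperCase() + u3);
}

const sample = [
  "*(what a joke you are. it's a wonder anyone listens.)*",
  "sean clenched his fist. this wasn't the time to cry.",
  "*just my imagination.*",
].join("\n");

console.log(capitalizeSentences(sample));
// *(What a joke you are. It's a wonder anyone listens.)*
// Sean clenched his fist. This wasn't the time to cry.
// *Just my imagination.*
```

One behavior worth knowing: because the pattern treats any period followed by optional whitespace as a sentence end, it also capitalizes after an ellipsis, so `write... except` becomes `write... Except`.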

j-adel commented 1 year ago

I would also love a rule like this. Is there a way to at least use the Regex feature to insert a rule that finds and replaces the lowercase letters? This code here seems to do that, but I'm not sure how it can be implemented in the regex replace feature.

pjkaufman commented 1 year ago

It is not possible to use regex find and replace to capitalize the first letter in a sentence without some kind of other piece of logic like in the example code provided. JS does not support uppercasing a character via regex. So if this were to become a feature it would need to be a rule.
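
To illustrate the point, here is a small sketch in plain JavaScript (not the plugin's own code). A replacement string can only re-insert captured text, while a replacement function can run extra logic such as `toUpperCase()`. The sample text and variable names are made up for the example.

```javascript
const text = "first sentence. second sentence.";
const afterPeriod = /(\. )([a-z])/g;

// A replacement string can only put the captured text back as it was.
const withString = text.replace(afterPeriod, "$1$2");

// JavaScript has no case-conversion escape; "\U" is inserted literally.
const withEscape = text.replace(afterPeriod, "$1\\U$2");

// A replacement function can transform the captured letter.
const withFunction = text.replace(afterPeriod, (_, sep, letter) => sep + letter.toUpperCase());

console.log(withString);   // first sentence. second sentence.
console.log(withEscape);   // first sentence. \Usecond sentence.
console.log(withFunction); // first sentence. Second sentence.
```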

pjkaufman commented 1 year ago

The number of capture groups present makes me wonder if the regex is performant, but then again I have seen a lot worse looking regex that is decently performant.

pjkaufman commented 1 year ago

My understanding is that the regex means the following, but I am by no means an expert.

The first capture group is made up of the following alternatives:
A period followed by 0 to 3 asterisks, a whitespace character, and then 1 to 3 asterisks: \.\*{0,3}\s\*{1,3}
Or a double quote or em dash (?) followed by 0 to 3 asterisks: [“—]\*{0,3}
Or a double quote or em dash (?) or exclamation mark followed by a whitespace character: [“—!]\s
Or a period followed by 0 to 3 asterisks followed by 0 or 1 whitespace characters: \.\*{0,3}\s?
Or the start of the line followed by 1 to 3 asterisks: ^\*{1,3}
Or the start of the line followed by 0 to 3 asterisks, then anywhere from 0 to 2 of either opening parentheses or double quotes, then 0 to 3 asterisks: ^\*{0,3}[(“]{0,2}\*{0,3}
Or a closing parenthesis followed by a whitespace character: \)\s
Or the start of a line: ^

Group 2 is any uncapitalized character: [a-z]

Group 3 is 0 or 1 number, letter, or underscore

I am assuming this regex was generated for a very particular scenario as it seems to leave off several different kinds of punctuation. It looks like it would be a good start for a rule like this, but I find it hard to see how this would work with the markdown syntax as it is.

Based on how error prone capitalizing a sentence can be, I am a little hesitant to try doing so in this case. However, if I can better understand how we would avoid the problems that arise from having to try to determine sentence capitalization with markdown syntax in the mix, this could be something we move forward with.
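
One possible way to keep such a rule away from code is to match the parts that must stay untouched first and return them unchanged. The sketch below assumes a deliberately simplified sentence rule (line start, or `.`, `!`, `?` followed by whitespace) and only protects fenced code blocks and inline code. `PATTERN` and `capitalizeOutsideCode` are hypothetical names, not part of the Linter plugin, and this is not how the plugin handles markdown.

```javascript
// Alternative 1 (group 1): fenced code block or inline code span, to be left alone.
// Alternative 2 (groups 2 and 3): a simplified sentence start and its lowercase letter.
const PATTERN = /(```[\s\S]*?```|`[^`\n]*`)|(^|[.!?]\s+)([a-z])/gm;

function capitalizeOutsideCode(markdown) {
  return markdown.replace(PATTERN, (match, code, lead, letter) =>
    // When the code alternative matched, return the text untouched.
    code !== undefined ? match : lead + letter.toUpperCase()
  );
}

const note = [
  "run `echo hi. bye` first. then check the output.",
  "```",
  "const x = 1;",
  "```",
  "done. all good.",
].join("\n");

console.log(capitalizeOutsideCode(note));
// Run `echo hi. bye` first. Then check the output.
// ```
// const x = 1;
// ```
// Done. All good.
```

Because the regex engine consumes each code span as a single match, the sentence alternative never gets a chance to look inside it. Links, front matter, and other markdown constructs would need their own alternatives.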

j-adel commented 1 year ago

Hmm I didn't realize how complicated sentence capitalization was. But honestly you don't have to start with an encapsulating rule that covers everything, conservative progressive development would be good enough IMO. I'll try to look if there's any open implementation of this on the web :D

GamerGirlandCo commented 1 year ago

> Group 3 is 0 or 1 number, letter, or underscore

actually, the \W in group 3 means "any non-word character" (i.e. anything besides letters, digits, and underscores _).
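
For reference, a quick check of what `\W` matches in JavaScript. This only exercises the built-in character class; the sample characters are arbitrary.

```javascript
// \w is [A-Za-z0-9_]; \W is everything else.
const samples = ["a", "Z", "7", "_", "-", " ", "'", "."];

// Collect the characters that count as "non-word".
const nonWord = samples.filter((ch) => /\W/.test(ch));

console.log(nonWord); // [ '-', ' ', "'", '.' ]
```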

> I am assuming this regex was generated for a very particular scenario as it seems to leave off several different kinds of punctuation.

i have no problems adding more punctuation, if that's what you'd like! for example, i noticed now that question marks aren't part of the detected punctuation.
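
As a sketch of what that addition could look like, the version below adds `?` to the `[“—!]\s` alternative of the regex posted earlier. This is just one assumed way to extend it; the names `ORIGINAL` and `WITH_QUESTION_MARK` are made up for the comparison.

```javascript
// Regex as posted earlier in the thread.
const ORIGINAL =
  /(\.\*{0,3}\s\*{1,3}|[“—]\*{0,3}|[“—!]\s|\.\*{0,3}\s?|^\*{1,3}|^\*{0,3}[(“]{0,2}\*{0,3}|^|\)\s)([a-z])(\W?)/gm;

// Same regex with "?" added to the third alternative (an assumed extension).
const WITH_QUESTION_MARK =
  /(\.\*{0,3}\s\*{1,3}|[“—]\*{0,3}|[“—!?]\s|\.\*{0,3}\s?|^\*{1,3}|^\*{0,3}[(“]{0,2}\*{0,3}|^|\)\s)([a-z])(\W?)/gm;

const upper = (_, u1, u2, u3) => u1 + u2.toLocaleUpperCase() + u3;
const line = "why worry? nobody was there.";

console.log(line.replace(ORIGINAL, upper));           // Why worry? nobody was there.
console.log(line.replace(WITH_QUESTION_MARK, upper)); // Why worry? Nobody was there.
```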

> It looks like it would be a good start for a rule like this, but I find it hard to see how this would work with the markdown syntax as it is.

that's what the asterisk rules (\*{0,3} et al) are for -- to make it work with markdown syntax.