retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Lowercase "A" in BBT Sentence Case #2078

Closed bwiernik closed 2 years ago

bwiernik commented 2 years ago

Support log: I9TH4EA5-refs-euc

BBT Sentence Case doesn't change the case of single uppercase letters such as "C" or "R". However, "A" is usually the English article rather than a proper noun. Could "A" also be lowercased when the function is run?

Example: insight: A Unified Interface to Access Information from Model Objects in R

Expected result: insight: a unified interface to access information from model objects in R
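Read literally, the request amounts to one extra rule on top of the sentence caser. Here is a minimal sketch, assuming a plain regex pass. The function name `lowercaseEveryA` is hypothetical, and this is not BBT's code. It handles only the "A" step, so the other words keep their title case:

```javascript
// Hypothetical rule (not BBT's implementation): lowercase every standalone
// capital "A". \b is an ASCII word boundary, so an "A" next to a space,
// comma, "&" or "." counts as standalone.
function lowercaseEveryA(title) {
  return title.replace(/\bA\b/g, 'a');
}

console.log(lowercaseEveryA('insight: A Unified Interface to Access Information from Model Objects in R'));
// insight: a Unified Interface to Access Information from Model Objects in R
console.log(lowercaseEveryA('Q&A: Vitamins A, B, and C'));
// Q&a: Vitamins a, B, and C
```

Because `\b` treats "&" and "," as boundaries, the same pass also turns "Q&A" into "Q&a" and "Vitamins A" into "Vitamins a". This is the kind of collateral damage discussed further down in this thread.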

retorquere commented 2 years ago

Is this for import, or for the right-click option? I see now that I have different implementations for those, I'm going to unify them, but it'd help to know what your baseline is.

retorquere commented 2 years ago

BTW thank you for including a debug log.

bwiernik commented 2 years ago

I was thinking of the right-click option.

qqobb commented 2 years ago

It seems that BBT sentence case is keeping single-character words capitalized following #1780. You might want to keep "vitamin A" unchanged. (Issue #1742 might also be related.)

Example: insight: A Unified Interface to Access Information from Model Objects in R

I guess "A" is capitalized here following the APA Style title case:

In title case, capitalize the following words in a title or heading:

  • the first word of the title or heading, even if it is a minor word such as “The” or “A”

So perhaps BBT sentence case could check for such cases. However, keeping "A" capitalized might also be ok. See the example at the end of this APA Style guide from September 2019.

bwiernik commented 2 years ago

The APA style guide is irrelevant here; Zotero applies APA sentence case as needed. An initial A at the beginning of a subtitle is the most common case, much more common than vitamin A, so making A an exception to the single-letter uppercase rule is best.

github-actions[bot] commented 2 years ago

:robot: this is your friendly neighborhood build bot announcing test build 6.4.3.2432 ("sometimes simpler is better")

Install in Zotero by downloading test build 6.4.3.2432, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

bwiernik commented 2 years ago

All all-caps words are now being lowercased, rather than just A.

For example, Type B Personality: A Meta-Analysis becomes Type b personality: a meta-analysis rather than Type B personality: a meta-analysis.

And Structured Interviewing for OCB: Construct Validity, Faking, and the Effects of Question Type becomes Structured interviewing for ocb: construct validity, faking, and the effects of question type rather than Structured interviewing for OCB: construct validity, faking, and the effects of question type.

Two sample items: SJ6CUYRJ-refs-euc
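This regression comes down to one distinction: lowercase ordinary title-cased words, but leave acronyms alone. Here is a sketch of that distinction, assuming a naive word-by-word caser. The name `sentenceCase` and its rules are hypothetical and are not BBT's implementation. The sketch keeps the first word, any word with an uppercase letter after its first character (acronyms such as OCB or FFT), and single capital letters.

```javascript
// Hypothetical, deliberately naive sentence caser (a sketch, not BBT's code).
// A "word" is a maximal run of letters/digits, so "Meta-Analysis" is two words
// and punctuation such as ":" or "?" is passed through untouched.
function sentenceCase(title) {
  let seenFirstWord = false;
  return title.replace(/[\p{L}\p{N}]+/gu, (word) => {
    // The first word keeps its original capitalization.
    if (!seenFirstWord) {
      seenFirstWord = true;
      return word;
    }
    // An uppercase letter after the first character marks an acronym
    // (OCB, FFT) or deliberate mixed case (hnRNP): keep the word as-is.
    if (/\p{Lu}/u.test(word.slice(1))) return word;
    // Single capital letters ("Type B", "in R") are kept, mirroring the
    // single-character behavior from #1780 described earlier in the thread.
    if (/^\p{Lu}$/u.test(word)) return word;
    // Everything else is treated as ordinary title case and lowercased.
    return word.toLowerCase();
  });
}

console.log(sentenceCase('Type B Personality: A Meta-Analysis'));
// Type B personality: A meta-analysis
console.log(sentenceCase('Is FFT Fast Enough for Beyond 5g Communications?'));
// Is FFT fast enough for beyond 5g communications?
```

This baseline still keeps the "A" after the colon, which is the behavior this issue asks to change.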

qqobb commented 2 years ago

I guess the change suggested here is to replace ": A" with ": a". All single-character A's in the middle of a sentence could be kept unchanged. So add something like:

title = title.replace(/: A /g, ': a ');

bwiernik commented 2 years ago

No, it really should be any single A. In nearly all cases, an uppercase A is an error.

github-actions[bot] commented 2 years ago

:robot: this is your friendly neighborhood build bot announcing test build 6.4.3.2435 ("new cases for sentence-caser")

Install in Zotero by downloading test build 6.4.3.2435, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

qqobb commented 2 years ago

No, it really should be any single A. In nearly all cases, an uppercase A is an error.

There's no title case rule listed here that capitalizes the article "a" in the middle of a sentence. If you're dealing with erroneous title casing to start with, you could apply Zotero's more aggressive sentence-case function instead.

I'm OK with this sentence casing:

Type B Personality: A Meta-Analysis
Type B personality: a meta-analysis

But I would then also expect:

Type A Personality: A Meta-Analysis
Type A personality: a meta-analysis

These A's should be preserved in my opinion:

Period after a sentence. A new sentence.

Vitamins A, B, and C.

Hepatitis A and hepatitis B vaccines.

Treatment of hepatitis A. Treatment of hepatitis B.

U S A / U. S. A. / U.S.A. / N.A.S.A. / A.M.

qqobb commented 2 years ago

Well, "A.M." could be turned into "a.m.", but then you'd also end up with "u.s.a.".

retorquere commented 2 years ago

Acronyms that are a repetition of Capital-Period get special treatment in my sentence caser.
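A Capital-Period special case could look something like the following. The regex is an assumption about what "a repetition of Capital-Period" means. It is not the sentence caser's actual code, and `isDottedAcronym` is a hypothetical name:

```javascript
// Hypothetical check (BBT's real rule is not shown in this thread): a token made
// of one or more "capital letter + period" pairs is treated as a dotted acronym
// and would be left alone by the sentence caser.
const DOTTED_ACRONYM = /^(?:\p{Lu}\.)+$/u;

function isDottedAcronym(token) {
  return DOTTED_ACRONYM.test(token);
}

for (const token of ['U.S.A.', 'N.A.S.A.', 'A.M.', 'U.S.A', 'USA']) {
  console.log(token, isDottedAcronym(token));
}
// U.S.A. true
// N.A.S.A. true
// A.M. true
// U.S.A false
// USA false
```

With `+`, a lone "A." also counts, which would protect a sentence-final "hepatitis A.". Use `{2,}` instead of `+` to require at least two pairs. Spaced forms such as "U S A" split into several tokens and are not matched at all.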

DanteSung commented 2 years ago

Acronyms no longer seem to be handled correctly.

As mentioned in https://github.com/retorquere/zotero-better-bibtex/issues/2123#issuecomment-1104871248, the original title is: Is FFT Fast Enough for Beyond 5g Communications?

Expected: Is FFT Fast Enough for Beyond 5g Communications?

What BBT provides: Is fft fast enough for beyond 5g communications?

The sentence caser does not preserve the acronym FFT.

72XTXMTJ-apse

retorquere commented 2 years ago

What BBT provides: Is fft fast enough for beyond 5g communications?

Not on build 2435

DanteSung commented 2 years ago

What BBT provides: Is fft fast enough for beyond 5g communications?

Not on build 2435

I installed build 2435, and it produces what I expected: Is FFT fast enough for beyond 5g communications?

But build 2447 still gets it wrong: Is fft fast enough for beyond 5g communications? Debug ID for 2447: 5B4THDCR-apse

retorquere commented 2 years ago

Build 2447 was not built on this issue. Separate issues have separate builds, so 2447 does not contain the code present in 2435.

DanteSung commented 2 years ago

Build 2447 was not built on this issue. Separate issues have separate builds, so 2447 does not contain the code present in 2435.

Got it. I'll use build 2435 for now. Thanks!

github-actions[bot] commented 2 years ago

:robot: this is your friendly neighborhood build bot announcing test build 6.5.1.2450 ("Merge branch 'master' into gh-2078")

Install in Zotero by downloading test build 6.5.1.2450, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

DanteSung commented 2 years ago

Cool. Build 2450 also does the job.

retorquere commented 2 years ago

As soon as we have a consensus here, I will release a new version; these test builds will auto-update to the formal release.

bwiernik commented 2 years ago

looks good to me

qqobb commented 2 years ago

I tested 2450. This is a title before and after using BBT's sentence-case function:

Q&A: Vitamins A, B, and C. Hepatitis A and hepatitis B vaccines. Treatment of hepatitis A. Treatment of hepatitis B.
Q&a: vitamins a, B, and c. hepatitis a and hepatitis B vaccines. Treatment of hepatitis a. treatment of hepatitis b.

I'd expect the title to remain unchanged.

retorquere commented 2 years ago

I don't really have a strong opinion one way or the other, so I'd prefer it if you guys could come to a consensus. I do not want to introduce a new configuration preference for this though, and given the naiveté of the sentence caser, it is always on the user to inspect & correct the results. I'm not doing any kind of NLP in the sentence-casing.

bwiernik commented 2 years ago

The test qqobb gave is fine. Matching ": A" and "—A" and replacing them with ": a" and "—a", to lowercase an initial A in subtitles but leave the others alone, is fine.
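The agreed rule can be sketched as a post-processing pass over an already sentence-cased title. This is a hypothetical helper, not BBT's code. The em dash is written as `\u2014`, and the lookahead requires whitespace after the "A", so that tokens such as "A/B" or "A-B" are left alone:

```javascript
// Hypothetical helper (not BBT's implementation): lowercase an "A" that opens
// a subtitle, i.e. one preceded by a colon plus whitespace or by an em dash
// (U+2014) and followed by whitespace. Every other "A" is left untouched.
function lowercaseSubtitleA(title) {
  return title.replace(/(:\s+|\u2014)A(?=\s)/gu, '$1a');
}

console.log(lowercaseSubtitleA('Type A personality: A meta-analysis'));
// Type A personality: a meta-analysis
console.log(lowercaseSubtitleA('Q&A: Vitamins A, B, and C.'));
// Q&A: Vitamins A, B, and C.   (unchanged)
console.log(lowercaseSubtitleA('insight\u2014A unified interface to access information from model objects in R'));
// the "A" right after the em dash becomes "a"; the rest is unchanged
```

A title like "Crossroads, Directions and A New Critical Race Theory" has neither a colon nor an em dash before its "A", so a rule of this shape leaves that "A" capitalized, as discussed below.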

retorquere commented 2 years ago

Do you have a sample title for the latter case?

bwiernik commented 2 years ago

insight—A Unified Interface to Access Information from Model Objects in R

retorquere commented 2 years ago

So

Q&A: A Vitamin A, B, and C Study. Hepatitis A and Hepatitis B Vaccines. Treatment of Hepatitis A. Treatment of Hepatitis B.

becomes

Q&A: a vitamin A, B, and C study. Hepatitis A and hepatitis B vaccines. Treatment of hepatitis A. Treatment of hepatitis B.

retorquere commented 2 years ago

Shouldn't em-dashes have spacing around them? insight—A looks like a two-part word.

retorquere commented 2 years ago

Oh wait, em-dashes are not hyphens. Got it.

retorquere commented 2 years ago

So

insight—A Unified Interface to Access Information from Model Objects in R

becomes

insight—a unified interface to access information from model objects in R

bwiernik commented 2 years ago

Yes

retorquere commented 2 years ago

Alright, the updated sentence caser will be in the next release.

retorquere commented 2 years ago

Crossroads, Directions and A New Critical Race Theory

now sentence-cases to

Crossroads, directions and A new critical race theory

acceptable damage?

retorquere commented 2 years ago

Are there other A's besides vitamins and hepatitides?

qqobb commented 2 years ago

acceptable damage?

I'd say yes. It's the consequence of an erroneous title case. The title page (inside) of that book shows "Crossroads, Directions and a New Critical Race Theory". You can check this in Google books or Amazon.

Are there other A's besides vitamins and hepatitides?

https://pubmed.ncbi.nlm.nih.gov/22628224/ Cholinergic-associated loss of hnRNP-A/B in Alzheimer's disease impairs cortical splicing and cognitive function in mice

https://pubmed.ncbi.nlm.nih.gov/3348967/ Clinical features and course of type A and type B vitiligo

https://pubmed.ncbi.nlm.nih.gov/3779524/ Evaluation of the reversed passive latex agglutination (RPLA) test kits for detection of staphylococcal enterotoxins A, B, C, and D in foods

https://psycnet.apa.org/record/1981-30747-001 Type A behavior, hostility, and coronary atherosclerosis.

https://www.nature.com/articles/ncomms15963 The A-B transition in superfluid helium-3 under confinement in a thin slab geometry

https://dl.acm.org/doi/abs/10.1145/3097983.3097992 Peeking at A/B Tests: Why it matters, and what to do about it

retorquere commented 2 years ago

Alright, then the current state of things is going into the release.

qqobb commented 2 years ago

Matching ": A" and "—A" and replacing them with ": a" and "—a", to lowercase an initial A in subtitles but leave the others alone, is fine.

Add a trailing space, so that ": A " becomes ": a " and "—A " becomes "—a ".

retorquere commented 2 years ago

That's already in.