jmclawson / biblatex-mla

MLA-style citations and bibliographies using Biblatex

Issues with url output #35

Open solonovamax opened 9 months ago

solonovamax commented 9 months ago

Hi, I have found a few issues with the current output of biblatex-mla to do with urls.

For context, here is what the MLA Handbook 9th Edition notes regarding urls:

[5.94] Permalinks

If your source offers a URL that it identifies as stable, permanent, or persistent (sometimes called a permalink), use it in your entry instead of the URL that appears in your browser, and copy it directly from the source (see fig. 5.88).

[5.95] URLs

URLs have a few basic components (fig. 5.90):

  • the protocol (what precedes //)
  • the double forward slash
  • the host (which encompasses the domain—like www)
  • the path

Fig. 5.90. The components of a URL.

In addition, sometimes file-specific information or a query string is appended.

https://style.mla.org/files/2016/04/practice-template.pdf
https://www.mla.org/search/?query=pmla

When including a URL, copy it in full from your browser. Omit a query string when possible. Some URLs display www but others do not.

[5.96] Truncating

You can usually omit http:// or https:// from URLs unless you want to hyperlink them and are working in a software program that does not allow hyperlinking without the protocol (but include https:// with DOIs). In professionally designed and typeset fixed-format works like print and PDF, the protocol can always be omitted. If a URL runs more than three full lines or is longer than the rest of the entry, truncate it. When truncating, always retain at least the host. For example, the following URL runs more than three full lines:

[image]

It could be shortened to the following:

go.galegroup.com/ps

Avoid citing URLs produced by shortening services (like bit.ly), since they obfuscate information when not clickable (as in a print paper) and since such a URL may stop working if the service that produced it disappears.

[5.97] Breaking

When giving a URL in your paper, never introduce a hyphen or space in it (turn off your word processing software’s automatic hyphenation feature). Do not worry about uneven line breaks: the accurate display of the URL is more important than its appearance. Professionally typeset publications in fixed formats, like print or PDF, normally follow rigorous conventions for breaking URLs to avoid ambiguity or uneven line breaks.

[5.98] Including terminal slash

Whether omitting the terminal slash—a forward slash at the end of a URL—will disable the link depends on how the URL is set up. The most cautious approach for writers is to test the link with and without the slash and use the shortest form that works (i.e., use the slash-free URL if it works). When editing a work, do not delete or add a slash; use what the writer has provided.

(MLA Handbook: Ninth Edition 245-247)

Note: formatting was kept as close to the original source as Markdown's restrictions allow.

Option to include the https:// component in the resulting url

Currently, biblatex-mla will rewrite the url to omit the https:// component.

The MLA handbook says:

You can usually omit http:// or https:// from URLs unless you want to hyperlink them and are working in a software program that does not allow hyperlinking without the protocol (but include https:// with DOIs). In professionally designed and typeset fixed-format works like print and PDF, the protocol can always be omitted.

My interpretation of the word "can" is that omitting the https:// component is optional, not mandatory. As such, it would be nice to have an option that explicitly enables the inclusion of https:// (disabled by default).
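To make the requested behaviour concrete, here is a rough sketch in Python rather than in biblatex code. The function name `display_url` and the `keep_protocol` flag are hypothetical and are not existing biblatex-mla options; the DOI exception follows section 5.96 quoted above, and treating `doi.org` and `dx.doi.org` as the DOI hosts is my own assumption.

```python
from urllib.parse import urlsplit

# Hosts treated as DOI resolvers (an assumption made for this sketch).
DOI_HOSTS = {"doi.org", "dx.doi.org"}


def display_url(url: str, keep_protocol: bool = False) -> str:
    """Return the form of `url` to print in a works-cited entry.

    keep_protocol is a hypothetical opt-in switch, off by default,
    mirroring the option requested in this issue.
    """
    host = urlsplit(url).netloc.lower()
    # MLA 5.96: always include https:// with DOIs.
    if keep_protocol or host in DOI_HOSTS:
        return url
    # Otherwise drop a leading http:// or https:// and nothing else.
    for prefix in ("https://", "http://"):
        if url.lower().startswith(prefix):
            return url[len(prefix):]
    return url


print(display_url("https://www.mla.org/search/?query=pmla"))
print(display_url("https://www.mla.org/search/?query=pmla", keep_protocol=True))
```

With the default, only the protocol is removed; with `keep_protocol=True` the URL is printed exactly as entered.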

Long URLs

The MLA handbook says:

[5.96] Truncating

You can usually omit http:// or https:// from URLs unless you want to hyperlink them and are working in a software program that does not allow hyperlinking without the protocol (but include https:// with DOIs). In professionally designed and typeset fixed-format works like print and PDF, the protocol can always be omitted. If a URL runs more than three full lines or is longer than the rest of the entry, truncate it. When truncating, always retain at least the host. For example, the following URL runs more than three full lines:

[image]

It could be shortened to the following:

go.galegroup.com/ps

Avoid citing URLs produced by shortening services (like bit.ly), since they obfuscate information when not clickable (as in a print paper) and since such a URL may stop working if the service that produced it disappears.

[5.97] Breaking

When giving a URL in your paper, never introduce a hyphen or space in it (turn off your word processing software’s automatic hyphenation feature). Do not worry about uneven line breaks: the accurate display of the URL is more important than its appearance. Professionally typeset publications in fixed formats, like print or PDF, normally follow rigorous conventions for breaking URLs to avoid ambiguity or uneven line breaks.

Currently, if biblatex renders a URL that exceeds three lines, it simply keeps rendering it. It should instead truncate the URL, first by removing the query parameters and then, if the URL is still too long, by trimming the path.

Also, a URL will run off the page if it is too long and contains no place where LaTeX feels it can insert a line break. Instead, LaTeX should be willing to break the line at any character, without inserting any hyphens.
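The truncation order suggested above (drop the query string first, then trim the path, always keep the host) could look roughly like the following Python sketch. It is not biblatex code. The name `truncate_url` and the `max_chars` budget are hypothetical, and measuring length in characters instead of rendered lines is a simplification, since the real limit depends on the typeset line width. The path and query in the example URL are invented; only the host comes from the handbook excerpt.

```python
import re
from urllib.parse import urlsplit

SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def truncate_url(url: str, max_chars: int) -> str:
    """Shorten `url` to at most `max_chars` characters where possible.

    The host is always retained, even if it alone exceeds the budget.
    """
    if len(url) <= max_chars:
        return url

    has_scheme = SCHEME_RE.match(url) is not None
    # urlsplit only recognises the host after "//", so add it if absent.
    parts = urlsplit(url if has_scheme else "//" + url)
    prefix = f"{parts.scheme}://" if has_scheme else ""
    base = prefix + parts.netloc

    # Step 1: remove the query string (and any fragment).
    candidate = base + parts.path
    if len(candidate) <= max_chars:
        return candidate

    # Step 2: trim path segments from the right until the URL fits.
    segments = [s for s in parts.path.split("/") if s]
    while segments:
        segments.pop()
        candidate = base + "".join("/" + s for s in segments)
        if len(candidate) <= max_chars:
            return candidate

    # Nothing left to trim: keep at least the host.
    return base


long_url = "go.galegroup.com/ps/retrieve.do?tabID=T001&sort=RELEVANCE"
print(truncate_url(long_url, 25))
```

A real implementation would need to compare against the measured width of three lines (or of the rest of the entry) rather than a fixed character count.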

RankkaApina commented 7 months ago

I have an issue related to this (I thought I'd add a comment instead of opening a new thread): I am using mla-strict to get the http/https to show in the bibliography. Something in this breaks the hyperref linking, though. With the mla style, I can open the URL with no problem. With mla-strict, the URL looks correct in the bibliography, but the link target is missing the colon, i.e. it starts with https// instead of https://. As a result, the browser also adds http:// at the front, so the address starts with http://https//, which leads nowhere.
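To illustrate the symptom, here is a small Python sketch of what I think is happening on the browser side. Both function names are made up for this comment, and the "prepend http:// when no scheme is recognised" rule is my assumption about the browser's behaviour, not something I have confirmed in hyperref or biblatex-mla.

```python
import re

SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def browser_guess(target: str) -> str:
    """Mimic an address bar that prepends http:// when it sees no scheme."""
    return target if SCHEME_RE.match(target) else "http://" + target


def looks_colon_stripped(target: str) -> bool:
    """Flag link targets that start with http// or https// (colon missing)."""
    return re.match(r"^https?//", target) is not None


broken = "https//example.org/page"  # placeholder target, colon missing
print(browser_guess(broken))
print(looks_colon_stripped(broken))
```

A check like `looks_colon_stripped` could be run over the link targets extracted from the PDF to confirm whether only mla-strict produces them.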