internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Accept full URLs any place an identifier can be entered #866

Open tfmorris opened 6 years ago

tfmorris commented 6 years ago

Any place that the user is asked to enter an identifier for which OpenLibrary has a URL template, they should be allowed to paste the full URL and have the identifier extracted from it.

UI Changes:

Next Steps:

We should use the URL Match Pattern from Wikidata instead of maintaining it ourselves. See https://www.wikidata.org/wiki/Property:P214 for an example. If we do that, this is blocked by #8236.

LeadSongDog commented 6 years ago

From a UX perspective it makes sense: they copy-paste a URL from wherever they find the edition. From antispam and data validation perspectives, it might be best to have a separate way of testing that the target page refers to this edition before any overwrite of existing data.

xayhewalo commented 5 years ago

This still isn't implemented. Defaulting to asking @mekarpeles to be assignee for this issue. Note, being the assignee doesn't necessarily mean you are responsible for doing the work, just responsible for gathering/providing information to address the issue. From the Wiki:

The assigned owner is not necessarily the person who will fix the issue (it is not necessarily even established, at that point, if or when the issue will be fixed at all), but rather they are the person who will do as much or as little as needed to handle the issue (asking questions, soliciting input, establishing and updating the priority, checking if it is a duplicate, etc).

Once an issue is labeled State: Work In Progress, the owner is the individual doing the work, or leading/coordinating the group that is doing the work.

I've added labels per context: let me know your thoughts

tfmorris commented 5 years ago

Although the original request was primarily focused on external URLs, this should work for internal OpenLibrary URLs as well. e.g. The author autocomplete field on the work edit form should accept a full OL author URL, in addition to the OL ID that it currently does, although this might be simple enough and distinct enough that it can be addressed separately.
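A minimal sketch of what accepting an internal author URL could look like. This is an illustration, not Open Library's actual autocomplete code: the function name `normalize_author_input` is hypothetical, and it assumes author keys have the form `OL<digits>A` under an `/authors/` path.

```python
import re

# Assumption: author IDs look like "OL23919A" and author pages live under
# "/authors/<ID>". Accept either a bare ID or any URL/path containing that segment.
OL_AUTHOR_RE = re.compile(r"(?:^|/authors/)(OL\d+A)(?:[/?#]|$)")


def normalize_author_input(value):
    """Return the OL author ID found in `value`, or None if there is none.

    `value` may be a bare ID ("OL23919A"), a path ("/authors/OL23919A"),
    or a full URL, optionally followed by a title slug, query, or fragment.
    """
    match = OL_AUTHOR_RE.search(value.strip())
    return match.group(1) if match else None
```

With this, a pasted `https://openlibrary.org/authors/OL23919A/J._K._Rowling` and a typed `OL23919A` both resolve to the same ID, while a work URL is rejected instead of being silently accepted.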

mekarpeles commented 4 years ago

@tfmorris is this for e.g. IDs like goodreads? I know it's for several things, but I think that's part of the problem. Before progress can be made on this (I think it could actually be a good first issue) someone will need to go through the process of providing a clear and reasonable scope and Breakdown.

Right now this issue, as posted, doesn't have a clear end.

@tfmorris are you willing to scope what an acceptable MVP for this issue would be (preferably a checklist where a new contributor could know when they're done)?

LeadSongDog commented 4 years ago

I would interpret the request as needing, as a first step, a list of all the relevant identifiers in the UI. That could provide some sense of the scope.

RayBB commented 1 year ago

@tfmorris @cdrini @mekarpeles I've updated this issue with a plan that someone could take on.

tfmorris commented 1 year ago

@RayBB Thanks, but all the Wikidata stuff adds, in my opinion, unnecessary complexity and undesirable external coupling. The system already has URL templates for the purposes of converting identifiers into clickable links. It should be possible to leverage those for building extraction regexes.
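As a rough sketch of that idea, and not the project's actual implementation: the snippet below assumes each template marks the identifier position with an `@@@` placeholder, and the helper names `template_to_regex` and `extract_identifier` are hypothetical. It treats the scheme and a leading `www.` as optional and ignores a trailing slash, query string, or fragment.

```python
import re

PLACEHOLDER = "@@@"  # assumption: the token that marks the ID in a URL template


def template_to_regex(template):
    """Turn a link template into a regex that captures the identifier."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template has no {PLACEHOLDER} placeholder: {template!r}")
    prefix, _, suffix = template.partition(PLACEHOLDER)
    # Drop the scheme and "www." from the template so both become optional.
    prefix = re.sub(r"^https?://(?:www\.)?", "", prefix)
    return re.compile(
        r"^(?:https?://)?(?:www\.)?"
        + re.escape(prefix)
        + r"([^/?#\s]+)"        # the identifier: one path segment
        + re.escape(suffix)
        + r"/?(?:[?#].*)?$"     # tolerate trailing slash, query, fragment
    )


def extract_identifier(value, template):
    """Return the ID if `value` is a URL matching `template`, else `value` unchanged."""
    value = value.strip()
    match = template_to_regex(template).match(value)
    return match.group(1) if match else value
```

Because non-matching input is returned unchanged, a field wired to `extract_identifier` would keep accepting bare identifiers exactly as it does today. Sites that append a slug to the ID in the same path segment would need a tighter capture group than the generic one used here.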

Freso commented 8 months ago

If it's any help, MusicBrainz has an extensive blob of JavaScript code that is used to auto‐assign types (and do some cleanup) for URLs (MusicBrainz generally stores URLs rather than IDs): https://github.com/metabrainz/musicbrainz-server/blob/master/root/static/scripts/edit/URLCleanup.js

I know BookBrainz (which does store IDs and not URLs themselves) also has some autodetection, but I am not familiar enough with BookBrainz’s code to know where to find this.