unitedstates / glossary

A glossary for the United States.
Creative Commons Zero v1.0 Universal

Candidate terms #25

Open waldoj opened 10 years ago

waldoj commented 10 years ago

Just in case it might be useful, I've generated a list of every term that is defined at least two times within the Code of Virginia (although I removed every term that contains the name "Virginia"). That's 1,481 in all. They're in this Gist:

https://gist.github.com/waldoj/6969263

After each term is, parenthetically, the number of times that the term is defined, and there's liable to be a correlation between the number of times that a term is defined and the usefulness of defining it in a general legal dictionary. (On the other hand, one could argue that the more times a term is defined, the harder it would be to define it in a general legal dictionary, since apparently there are many ways to define it.) Anyhow, it's sorted from most to fewest definitions.
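For anyone who wants to work with the list programmatically, here is a minimal sketch of parsing it. It assumes each line looks like `term (count)`, which is only inferred from the description above and may not match the Gist exactly; the function name `parse_term_counts` and the sample counts are made up for illustration.

```python
import re

# Assumed line format: "term (count)". The real Gist may differ.
LINE_RE = re.compile(r"^(?P<term>.+?)\s*\((?P<count>\d+)\)\s*$")


def parse_term_counts(text):
    """Return (term, count) pairs sorted from most to fewest definitions."""
    pairs = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # silently skip blank or unrecognized lines
            pairs.append((match.group("term"), int(match.group("count"))))
    # Descending by count, then alphabetical so ties come out in a stable order.
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


# Invented sample data; these counts are not taken from the Gist.
sample = """\
person (3)
bailee (2)
tax year (5)
"""
ranked = parse_term_counts(sample)
```

Re-sorting after parsing means the result does not depend on the input already being ordered.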

Sorry to make this an issue, but I'm not sure of how better to share this information.

konklone commented 10 years ago

Wow, thanks, @waldoj. I admit, I'm not sure the best way to prioritize these, for exactly the tension you described - but this is very helpful to have on hand. I'll leave it open for someone whose use case is more about legal code to tackle.

gregelin commented 10 years ago

@waldoj Was this the number of times a term was defined (as in "will be defined as..."), or the number of times a term was used?

waldoj commented 10 years ago

The number of times it was defined, not used.

nickom commented 10 years ago

Awesome! This is definitely a super useful addition and smart way to do it!

Possible next steps: add these terms to the /legal folder with the many definitions and we can get to work combining them into a single universal definition for each one?

konklone commented 10 years ago

I think we'll want to be more careful - if 'person' has been defined over 100 times in Virginia law, it has 100 different legal definitions. For terms with multiple definitions, enumerating them dictionary-style, as @gregelin suggested, is probably the best way, though for 'person' it might require a lot of human judgment on reducing them to a subset.

Also, I think we may have some of these terms already in the import we did from the State Decoded's original legal dictionary?

But either way, I think the terms are something worth going over with a human brain and adding each one as appropriate.

waldoj commented 10 years ago

> Possible next steps: add these terms to the /legal folder with the many definitions and we can get to work combining them into a single universal definition for each one?

While I don't object to that, 1,481 terms is a lot of terms to dig through. We'd definitely want to give anybody who might object a chance to voice that objection.

I think I'd make a first pass, deleting any term that is too specific (e.g., "vegetated wetlands of the north landing river and its tributaries") or too general (e.g., "home," "animal," "teacher") to be useful, leaving things that are actually liable to be useful (e.g., "tax year," "bailee," "fiduciary"). Then the remainder could be pushed up.
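A rough sketch of how such a first pass could be pre-sorted before a human looks at it. The word-count threshold and the "too general" word list are hypothetical choices made only for illustration; the real decision would still be a judgment call, so the code sets terms aside for review rather than deleting them.

```python
# Hypothetical cutoffs, chosen only to illustrate the idea.
MAX_WORDS = 4                                # longer terms are probably too specific
TOO_GENERAL = {"home", "animal", "teacher"}  # hand-maintained list of everyday words


def triage(terms):
    """Split terms into (keep, review); nothing is discarded automatically."""
    keep, review = [], []
    for term in terms:
        normalized = term.strip().lower()
        too_specific = len(normalized.split()) > MAX_WORDS
        too_general = normalized in TOO_GENERAL
        (review if too_specific or too_general else keep).append(term)
    return keep, review


keep, review = triage([
    "tax year",
    "vegetated wetlands of the north landing river and its tributaries",
    "home",
    "bailee",
    "fiduciary",
])
```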

> I think we'll want to be more careful - if 'person' has been defined over 100 times in Virginia law, it has 100 different legal definitions. For terms with multiple definitions, enumerating them dictionary-style, as @gregelin suggested, is probably the best way, though for 'person' it might require a lot of human judgment on reducing them to a subset.

Each definition of "person" in Virginia law is a slightly different list of terms. Here's the standard (global) definition:

"Any individual, corporation, partnership, association, cooperative, limited liability company, trust, joint venture, government, political subdivision, or any other legal or commercial entity and any successor, representative, agent, agency, or instrumentality thereof."

A wholly separate definition might simply omit "government" and "political subdivision." This changes the meaning a great deal, but I think it would be pretty easy to boil all of them down to a simple explanation that "person" can include either humans or legal persons, and that legal persons can include all manner of corporations, organizations, government entities, and similar groups. Anyhow, yes, you're right that this requires a lot of human judgment; I just want to clarify that the task isn't as onerous as one might suspect.
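To make the "slightly different list of terms" point concrete, here is a small sketch that treats each definition as a set of entity types and separates what every definition shares from what only some include. It covers only the enumerated types from the global definition quoted above and leaves out the catch-all clause; the narrower variant and the labels `global` and `narrower` are hypothetical.

```python
# Entity types enumerated in the global definition quoted above
# (the "any other legal or commercial entity ..." catch-all is omitted).
GLOBAL_DEFINITION = {
    "individual", "corporation", "partnership", "association", "cooperative",
    "limited liability company", "trust", "joint venture", "government",
    "political subdivision",
}
# Hypothetical variant that drops the two government-related entries.
NARROWER_DEFINITION = GLOBAL_DEFINITION - {"government", "political subdivision"}


def compare(definitions):
    """Return (core, optional): types in every definition, and types in only some."""
    sets = list(definitions.values())
    core = set.intersection(*sets)
    optional = set.union(*sets) - core
    return core, optional


core, optional = compare({
    "global": GLOBAL_DEFINITION,
    "narrower": NARROWER_DEFINITION,
})
```

The shared core is what a plain-language summary could state outright, while the optional types are where a reader would need to be told that the answer depends on the statute.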

> Also, I think we may have some of these terms already in the import we did from the State Decoded's original legal dictionary?

Sure, that's quite possible. That existing import was based on my own collection of terms from non-copyrighted legal dictionaries, e.g., Virginia's "Glossary of Terms Commonly Used in Court", but surely some of those terms are also defined within the Code of Virginia.

BTW, here's a Gist of a list of every term defined at least two times in the Florida Statutes. Consider it a second set of data points.

gregelin commented 10 years ago

@waldoj The primary discussed value of a public domain mouse-over glossary is plain language and ease of understanding. This primary value may at times find itself at odds with a more precise definition that is defined and used within a statute.

In other words, sometimes context does matter. But how much does that context matter, to whom, and when? The contextual definition matters much more if you are a lawyer, or are considering an action, or are debating a bill. If the term "person" gets defined to include a fetus, or a citizen, or anyone even if not a citizen, then the ramifications are significant.

Maybe a solution is managing two glossaries? The first glossary is plain language. The second glossary is statute-sensitive, supports multiple terms, and is (mostly) machine generated. Statute-specific terms would be fixed to the namespace of the governmental authority (and statute), just like variables are tied to the namespace of functions and packages.

The plain language glossary is reusable across authorities and laws. The statute-sensitive glossary is managed for the context by the individual state/city.
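One way the namespace idea might look as data, sketched in Python rather than the browser-side code this would eventually need. Everything here is an assumption for illustration: the tuple-shaped namespaces, the `"va"` and `"title-1"` keys, the `lookup` helper, and the placeholder definition text.

```python
# Namespaces are tuples: () is the shared plain-language glossary,
# ("va",) an authority, ("va", "title-1") a statute within it. All hypothetical.
GLOSSARY = {
    (): {"person": "plain-language definition (placeholder)"},
    ("va",): {"person": "authority-wide statutory definition (placeholder)"},
}


def lookup(term, namespace):
    """Resolve a term from the most specific namespace outward.

    Returns (namespace_found, definition), or None if no glossary defines it.
    """
    scope = tuple(namespace)
    while True:
        definition = GLOSSARY.get(scope, {}).get(term)
        if definition is not None:
            return scope, definition
        if not scope:          # already checked the plain-language glossary
            return None
        scope = scope[:-1]     # fall back to the enclosing namespace


found = lookup("person", ("va", "title-1"))
```

This mirrors how a variable lookup walks outward through enclosing scopes, which is the analogy drawn above.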

Application logic (in JavaScript) enables examining both the plain language and statute-sensitive glossaries at mouse-over and provides real-time disambiguation. We use the same technical pattern as cascading style sheets. I guess these would be cascading glossaries.

The disambiguation could take several forms. The pre-programmed default probably has the plain language term show up with some (glowing) glyph indicating that a context-sensitive definition also exists for what is being read. This default behavior could be adjusted by the individual state/city, say, to have the statute-sensitive definition appear if one exists, with a link to the plain language one. One might even imagine individual users adjusting a profile setting to prefer plain language or statute-sensitive definitions.
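The disambiguation policy described above could be sketched roughly as follows. This is Python for brevity, although the comment envisions JavaScript in the browser, and the `Tooltip` type, the `resolve` function, and the `prefer` setting are hypothetical names rather than part of any existing glossary code.

```python
from dataclasses import dataclass


@dataclass
class Tooltip:
    text: str
    source: str            # "plain" or "statute"
    has_alternative: bool  # True when the other glossary also defines the term


def resolve(term, plain, statute, prefer="plain"):
    """Pick the definition to show; `prefer` is a site-level or per-user setting."""
    candidates = {"plain": plain.get(term), "statute": statute.get(term)}
    order = ["statute", "plain"] if prefer == "statute" else ["plain", "statute"]
    both = all(text is not None for text in candidates.values())
    for source in order:
        if candidates[source] is not None:
            # has_alternative is what would drive the glyph or link in the UI.
            return Tooltip(candidates[source], source, both)
    return None  # neither glossary defines the term


# Placeholder glossaries with made-up definition text.
plain = {"person": "plain text"}
statute = {"person": "statute text", "bailee": "statute-only text"}
default_tip = resolve("person", plain, statute)
```

Changing `prefer` to `"statute"` flips which definition appears first without touching either glossary.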

This approach separates the ultra-user-friendly use case from the (sometimes very important) ultra-context-sensitive use case while still maintaining a reusable and consistent framework. General definitions here, complex definitions there. The glossary project maintains the mouse-over friendliness value proposition while also being extensible for the statute-specific cases.

waldoj commented 10 years ago

> Maybe a solution is managing two glossaries? The first glossary is plain language. The second glossary is statute-sensitive, supports multiple terms, and is (mostly) machine generated. Statute-specific terms would be fixed to the namespace of the governmental authority (and statute), just like variables are tied to the namespace of functions and packages.
>
> The plain language glossary is reusable across authorities and laws. The statute-sensitive glossary is managed for the context by the individual state/city.

I think you just reinvented The State Decoded. ;)