lmullen / legal-modernism

Law and legal practice modernized in the nineteenth-century United States. We are studying and visualizing the history of the modernization of American law.
https://legalmodernism.org
MIT License

Remove "treatises" that aren't treatises #89

Closed by lmullen 1 year ago

lmullen commented 1 year ago

We have a bunch of "treatises" that aren't actually treatises. The most obvious ones are biographical. Here is one: Biographical sketches of eminent lawyers, statesmen, and men of letters. It would be nice to just exclude these entirely from the corpus.

I will share a list of the U.S. treatises with their subjects in an easy-to-scan column. Are there any subjects for which we can exclude every treatise carrying that subject? (Note that these are MOML subjects, not the LOC subjects, which we also have.)

kfunk074 commented 1 year ago

Biography can be excluded.

Collected Essays can be excluded, including when the subject appears with other subjects (commonly General Studies).

I'm surprised to find that there are actually substantive treatises of fewer than 70 pages, so I'm loath to set the cutoff too low, but I think we can safely dispense with the 2,800 or so works that are 30 pages or fewer.

General Studies has a lot of irrelevant material like addresses and orations, but it also contains cross-category commentaries like James Kent's. I think we could safely exclude anything with "remarks of", "oration", or "address" in the title. But the subject itself can't be the target.
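
As a rough illustration of the rules proposed in this comment, here is a minimal Python sketch. The record fields (`subjects`, `pages`, `title`) and the function name `is_probable_treatise` are hypothetical and not taken from the project's schema, and the 30-page cutoff is only the threshold suggested above.

```python
import re

# Hypothetical rule set mirroring the suggestions in this comment.
EXCLUDED_SUBJECTS = {"Biography", "Collected Essays"}
MAX_SHORT_PAGES = 30  # works of 30 pages or fewer are dropped
TITLE_PATTERN = re.compile(r"\b(remarks of|oration|address)\b", re.IGNORECASE)


def is_probable_treatise(subjects, pages, title):
    """Return True if a record survives all three proposed filters."""
    if EXCLUDED_SUBJECTS & set(subjects):
        # Excluded even when paired with another subject such as General Studies.
        return False
    if pages <= MAX_SHORT_PAGES:
        return False
    if TITLE_PATTERN.search(title):
        return False
    return True
```

Note that General Studies is deliberately absent from the excluded subjects, since only the title keywords are meant to catch the irrelevant works filed under it.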

lmullen commented 1 year ago

Okay, thanks. This has become kind of complicated. I might just do the easiest thing first and exclude biography and collected essays, then deal with the rest later.

lmullen commented 1 year ago

There are 78 U.S. "treatises" with the subject "Trials." Those are just reports of individual trials, not treatises, so I am excluding those too.

lmullen commented 1 year ago

There are 12,064 U.S. "treatises" (counting multivolume works as a single treatise). I have not tried to eliminate short works. But leaving that aside, with the filters described above I have gotten it down to 10,544 treatises.

The query looks like this, though this is of course subject to change.

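SELECT
    *
FROM
    moml.treatises
WHERE
    -- Exclude works whose subjects array contains 'UK'
    'UK' != ALL (subjects)
    -- Exclude subjects that are not treatises
    AND 'Biography' != ALL (subjects)
    AND 'Collected Essays' != ALL (subjects)
    AND 'Trials' != ALL (subjects)
    -- Exclude speeches by title (~* is a case-insensitive regex match)
    AND NOT title ~* '\Woration\W'
    AND NOT title ~* '\Wremarks of\W'
    AND NOT title ~* '\Waddress\W';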
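
One detail worth checking in the title patterns: `\W` must consume an actual non-word character, so a keyword at the very start or end of a title is not matched. The sketch below uses Python's `re` module, where `\W` behaves the same way for this purpose, with made-up titles. PostgreSQL's word-boundary escape is `\y` (or `\m` and `\M` for the start and end of a word), whereas Python's is `\b`. Whether any titles in the corpus are actually affected is not something this sketch can show.

```python
import re

# The query's pattern style: a non-word character is required on both sides.
strict = re.compile(r"\Woration\W", re.IGNORECASE)
# Word-boundary variant: also matches at the start or end of the string.
boundary = re.compile(r"\boration\b", re.IGNORECASE)

# Made-up titles for illustration only.
titles = [
    "An oration, delivered at Boston",   # keyword mid-title
    "Oration on the death of a jurist",  # keyword at the start
    "A Fourth of July oration",          # keyword at the end
]

# Titles the boundary pattern catches but the strict pattern lets through.
missed = [t for t in titles if boundary.search(t) and not strict.search(t)]
```

Here `missed` holds the second and third titles, which is the kind of case a `\y`-based pattern would also exclude.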