zoonomen-APP / Lichen


## Look for gender ratio #12

Closed zoonomen-APP closed 1 week ago

zoonomen-APP commented 2 weeks ago
$ awk 'BEGIN {FS=OFS="|"} {print $99}' df|e '[A-z]'|wc -l

135114

$ awk 'BEGIN {FS=OFS="|"} {print $99}' df|e '[A-z]'|e 'Ms\.'|wc -l

19230

so 14% -- well below what Jim thought would be worth commenting on.
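The two-command count above can be folded into a single pass that prints both counts and the percentage. This is a minimal sketch, not the method used in this thread: `ms_ratio` is a hypothetical helper name, it assumes `e` is an alias for `grep -E`, and it tests for letters with `[A-Za-z]` rather than `[A-z]` (in the C locale the latter also matches `[`, `\`, `]`, `^`, `_` and the backtick), so its totals may differ slightly from the figures quoted here.

```shell
# Hypothetical helper: ms_ratio FILE [COL]
# Counts rows whose COL-th pipe-delimited field contains a letter
# (COL defaults to 99, the collector field used in this thread),
# counts how many of those also contain "Ms.", and prints
# "<ms> <total> <percent>".
ms_ratio() {
  awk -v col="${2:-99}" 'BEGIN { FS = "|" }
    $col ~ /[A-Za-z]/ {
      total++
      if ($col ~ /Ms\./) ms++
    }
    END {
      if (total) printf "%d %d %.1f%%\n", ms, total, 100 * ms / total
      else print "0 0 n/a"
    }' "$1"
}
```

Calling `ms_ratio df` would report the whole file in one line; a second argument selects a different field number.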

Let's look at Grafton Co.

$ awk 'BEGIN {FS=OFS="|"} $3~/Grafton_Co./ {print $99}' df|e '[A-z]'|wc -l

6699

$ awk 'BEGIN {FS=OFS="|"} $3~/Grafton_Co./ {print $99}' df|e '[A-z]'|e 'Ms.'|wc -l

2387

so 36%, well above Jim's 20% limit.

How about Aroostook Co.?

$ awk 'BEGIN {FS=OFS="|"} $3~/Aroostook_Co./ {print $99}' df|e '[A-z]'|wc -l

15542

Alan@DESKTOP-30GEGVP MINGW64 /c/Lichen/datastor/NE/MA/sb (master)
$ awk 'BEGIN {FS=OFS="|"} $3~/Aroostook_Co./ {print $99}' df|e '[A-z]'|e 'Ms.'|wc -l

1343

9%... the Selva effect.

How about Norfolk Co.?

$ awk 'BEGIN {FS=OFS="|"} $3~/Norfolk_Co./ {print $99}' df|e '[A-z]'|e 'Ms.'|wc -l

1172

Alan@DESKTOP-30GEGVP MINGW64 /c/Lichen/datastor/NE/MA/sb (master)
$ awk 'BEGIN {FS=OFS="|"} $3~/Norfolk_Co./ {print $99}' df|e '[A-z]'|wc -l

2410

49%

zoonomen-APP commented 2 weeks ago

Now look at Washington_Co. Maine.

awk 'BEGIN {FS=OFS="|"} $4~/Maine/&& $3~/Washington_Co./ {print $99}' df|e '[A-z]'|wc -l

4978 total collector strings.

awk 'BEGIN {FS=OFS="|"} $4~/Maine/&& $3~/Washington_Co./ {print $99}' df|e '[A-z]'|e 'Ms\.'|wc -l

1327 strings contain "Ms." -- 27%
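Since county names repeat across states (hence the extra `$4~/Maine/` test for Washington Co.), the per-county checks can be tabulated in one pass keyed on state and county together. The following is a sketch under assumptions: `ms_by_county` is a hypothetical name, field 3 is taken to be the county and field 4 the state as the filters in this thread suggest, and letters are matched with `[A-Za-z]` rather than `[A-z]`, so counts may not match the numbers above exactly.

```shell
# Hypothetical helper: ms_by_county FILE [COL]
# Groups rows by state (field 4) and county (field 3) and prints
# "state|county|ms|total|percent" for every group that has at least
# one collector string containing a letter. COL defaults to 99.
ms_by_county() {
  awk -v col="${2:-99}" 'BEGIN { FS = "|" }
    $col ~ /[A-Za-z]/ {
      key = $4 "|" $3
      total[key]++
      if ($col ~ /Ms\./) ms[key]++
    }
    END {
      for (k in total)
        printf "%s|%d|%d|%.0f%%\n", k, ms[k], total[k], 100 * ms[k] / total[k]
    }' "$1" | sort
}
```

Running `ms_by_county df` would list every state and county pair at once, which makes it easier to spot counties above or below the 20% mark without a separate command for each.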

zoonomen-APP commented 2 weeks ago

Continuing to look for a pre- and post-1965 difference.

$ awk 'BEGIN {FS=OFS="|"} {print $99}' NEdfbn|e '[A-z]'|wc -l

107019

 awk 'BEGIN {FS=OFS="|"} $100>1965 {print $99,$100}' NEdfbn|e '[A-z]'|wc -l

55244

So 55244 records are after 1965.

awk 'BEGIN {FS=OFS="|"} $100>1965&&$99~/Ms\./ {print $99,$100}' NEdfbn|e '[A-z]'|wc -l

11118

so 11118/55244 = 20% post-1965.

awk 'BEGIN {FS=OFS="|"} $100<=1965 {print $99,$100}' NEdfbn|e '[A-z]'|wc -l

So 52435 total in or before 1965.

awk 'BEGIN {FS=OFS="|"} $100<=1965&&$99~/Ms\./ {print $99,$100}' NEdfbn|e '[A-z]'|wc -l

7239

so 7239/52435 = 14% (and I expect mostly Cummings).
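The before/after comparison can also be done in one pass over the file. This is a hedged sketch rather than the commands used above: `ms_by_era` is a hypothetical name, field 100 is assumed to hold the year as in the filters above, and rows whose year field is not purely numeric are reported in a separate `unknown` group, which the commands in this thread do not do. Letters are matched with `[A-Za-z]` rather than `[A-z]`.

```shell
# Hypothetical helper: ms_by_era FILE [COL] [YEARCOL] [CUTOFF]
# Splits collector strings (field COL, default 99) into years after
# CUTOFF (default 1965), years through CUTOFF, and rows with no
# numeric year (field YEARCOL, default 100), then prints
# "<group> <ms> <total> <percent>" for each non-empty group.
ms_by_era() {
  awk -v col="${2:-99}" -v ycol="${3:-100}" -v cut="${4:-1965}" 'BEGIN { FS = "|" }
    $col ~ /[A-Za-z]/ {
      if ($ycol !~ /^[0-9]+$/)     era = "unknown"
      else if ($ycol + 0 > cut)    era = "after"
      else                         era = "through"
      total[era]++
      if ($col ~ /Ms\./) ms[era]++
    }
    END {
      n = split("after through unknown", eras, " ")
      for (i = 1; i <= n; i++) {
        e = eras[i]
        if (total[e])
          printf "%s %d %d %.0f%%\n", e, ms[e], total[e], 100 * ms[e] / total[e]
      }
    }' "$1"
}
```

With the defaults, `ms_by_era NEdfbn` would print the post-1965 and through-1965 groups side by side, plus an `unknown` line if any records lack a numeric year.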

zoonomen-APP commented 1 week ago

Total collector numbers NEdfbn

Redo for NEdfbn (previously done on df)

awk 'BEGIN {FS=OFS="|"} {print $99}' NEdfbn|e '[A-z]'|wc -l

106737 total collector entries.

Total with "Ms."

awk 'BEGIN {FS=OFS="|"} {print $99}' NEdfbn|e '[A-z]'|e 'Ms\.'|wc -l