vieyahn2017 / shellv

shell command test and study

10.26 awk usage tips collection #61

Open vieyahn2017 opened 4 years ago

vieyahn2017 commented 4 years ago

awk

vieyahn2017 commented 4 years ago

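A small awk trick for printing from a given column through the last column

To print whole lines with awk, print $0 is enough. To print from the second column through the last one, for example every line of /etc/passwd without its first column, you can write:

cat /etc/passwd | awk -F: '{print $2, $3, $4, $5, $6, $7}'

This works because we know /etc/passwd has exactly 7 columns, so they can be listed one by one. When the number of columns is unknown or large, writing it this way is clearly inconvenient. The trick is to set the first column to an empty string: printing $0 then no longer includes the first column, and there is no need to list every column.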

[root@localhost ~]# cat /etc/passwd | awk -F: '{$1=""; sub(" ", ""); print}'
# $1=""         sets the first column to an empty string
# sub(" ", "")  removes the leading space left behind by the emptied first column
# print         is equivalent to print $0
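
One thing to be aware of with this trick: assigning to $1 makes awk rebuild $0 using the output separator OFS, which defaults to a single space, so the colons in the output turn into spaces. A minimal sketch that keeps the original separator, assuming that is what is wanted (the function name drop_first_field is made up for illustration):

```shell
# Hypothetical helper: drop the first colon-separated field but keep ":" in the output.
drop_first_field() {
    awk 'BEGIN { FS = OFS = ":" }   # use ":" for both splitting and rejoining
    {
        $1 = ""                     # emptying $1 rebuilds $0 with OFS
        sub(/^:/, "")               # remove the separator left at the front
        print
    }'
}

printf 'root:x:0:0:root:/root:/bin/bash\n' | drop_first_field
# prints: x:0:0:root:/root:/bin/bash
```

For this particular case, cut -d: -f2- /etc/passwd gives the same result.
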
vieyahn2017 commented 4 years ago

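awk string replacement with gsub

Source: https://blog.csdn.net/bitcarmanlee/article/details/50975809

gsub(r,s)    replaces every match of r with s in the whole of $0
gsub(r,s,t)    replaces every match of r with s in the whole of t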
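
A small sketch of both forms on made-up input. gsub modifies its target in place and returns the number of replacements it made:

```shell
# gsub(r, s): replace every run of spaces/tabs in $0 with "_" and report the count.
printf 'a b\tc\n' | awk '{ n = gsub(/[ \t]+/, "_"); print n, $0 }'
# prints: 2 a_b_c

# gsub(r, s, t): only the third comma-separated field is modified.
printf 'x y,1, p q \n' | awk -F, '{ gsub(/ /, "", $3); print $3 }'
# prints: pq
```
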

awk -F "," '{str=gsub(/\t*| *$/,"",$3);ret=$1","$2","$3","NR;print ret}'
# strips tabs and trailing spaces from the third field; str receives the number of replacements

awk -F "\t" '{if($3=="吉林") {gsub($3,"吉林省",$3);print $0}}'  area_province

220005 延边 吉林省
220007 松原 吉林省
220006 通化 吉林省
220003 白城 吉林省
220001 长春 吉林省
220002 四平 吉林省
220008 吉林 吉林省
220004 辽源 吉林省
220009 白山 吉林省
229999 吉林其它 吉林省

From sorted data, take the first 1000 rows for each client type (the key in field 3)

sort -t , -k3,3 -k4,4nr z1 | awk -F "," '{str=gsub(/\t*| *$/,"",$4);a[$3]++;if(a[$3]<=1000) print $1","$2","$3","$4","a[$3]}' > zzz
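
The idea here (sort so that each group is contiguous and ordered, then count rows per key and stop printing once the limit is reached) can be sketched on a small made-up dataset. The two-column layout, the limit of 2, and the function name top_n_per_group are assumptions for illustration only:

```shell
# Hypothetical helper: keep the first N rows of each group.
# Input: "key,value" lines on stdin. $1: N.
top_n_per_group() {
    sort -t, -k1,1 -k2,2nr |                    # group by key, largest value first
        awk -F, -v n="$1" '++seen[$1] <= n'     # print while the per-key counter is <= n
}

printf 'ios,3\nandroid,9\nios,7\nandroid,1\nios,5\n' | top_n_per_group 2
# prints:
# android,9
# android,1
# ios,7
# ios,5
```
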
vieyahn2017 commented 4 years ago

awk: extract a column, then keep only the values greater than 1

https://blog.csdn.net/weixin_33819479/article/details/86022587

tail -2 access.log

122.238.119.177 - - [26/Oct/2018:18:20:25 +0800] "GET /api//shop/follow_cancel?shopId=124732134 HTTP/1.1" 200 41 "https://bi.deepfashion.cn/page/dataline/shopwatch" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36" "-" "0.008"
122.238.119.177 - - [26/Oct/2018:18:20:26 +0800] "GET /api/daily/follow-shop-list?watchStatus=1&shopFilterStatus=2&firstCategoryName=&secondCategoryName=&shopType=&dateRangeStatus=3&averagePrice=&shopStyle=&shopName=&business=%E5%A5%B3%E8%A3%85&rankStatus=2&startDate=2018-10-25&endDate=2018-10-25&pageNo=3&pageSize=20 HTTP/1.1" 200 28940 "https://bi.deepfashion.cn/page/dataline/shopwatch" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36" "-" "0.245"

The last column of the nginx log is the request time; we need to extract the values greater than 1 second.

tail -100 access.log |awk '{print $NF}'|awk -F '"' ' $2>1 '|awk -F '"' '{print $2}'
1.581
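
The three awk calls can also be folded into one. A minimal sketch, assuming the last field is always the quoted request time as in the sample lines above (the function name slow_request_times and the shortened input lines are made up for illustration):

```shell
# Hypothetical helper: print request times above 1 second from lines ending in "<seconds>".
slow_request_times() {
    awk '{
        t = $NF                  # last field, still wrapped in double quotes
        gsub(/"/, "", t)         # strip the quotes from the copy
        if (t + 0 > 1) print t   # t + 0 forces a numeric comparison
    }'
}

printf '%s\n' 'GET /a "-" "0.008"' 'GET /b "-" "1.581"' | slow_request_times
# prints: 1.581
```

On a real log this would be used as tail -100 access.log | slow_request_times.
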
vieyahn2017 commented 4 years ago

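awk: extract a column, then keep only the values greater than 1

My test on nginx logs

First, the nginx log output format depends on the configuration; see https://blog.csdn.net/LL845876425/article/details/81490114

Our configuration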

http {
    ...
    log_format main '$time_local [$remote_addr] $request_method "$request_uri" | '
    '$status | "$upstream_addr" [ $upstream_response_time $request_time ]';

Our logs

25/Oct/2019:22:57:36 +0800 [8.46.86.29] GET "/VIID/User/Member/VO/super_admin" | 200 | "100.129.133.85:8090" [ 0.028 0.026 ]
25/Oct/2019:22:57:39 +0800 [8.46.86.29] GET "/assets/images/admin/success.png" | 200 | "100.129.133.85:8088" [ 0.040 0.041 ]
25/Oct/2019:22:57:42 +0800 [8.46.86.29] OPTIONS "/VIID/User/Member/Password/super_admin" | 200 | "100.129.134.66:8090" [ 0.016 0.012 ]
25/Oct/2019:22:57:43 +0800 [8.46.86.29] PUT "/VIID/User/Member/Password/super_admin" | 200 | "100.129.133.85:8090" [ 0.492 0.490 ]
25/Oct/2019:22:57:44 +0800 [8.46.86.29] GET "/" | 200 | "100.129.134.66:8088" [ 0.012 0.010 ]
25/Oct/2019:22:57:44 +0800 [8.46.86.29] GET "/runtime.8ac0b4fff4b3a99f7a27.js" | 200 | "100.129.134.66:8088" [ 0.016 0.017 ]
25/Oct/2019:22:57:44 +0800 [8.46.86.29] GET "/polyfills.c74c352fef71c1d0a09a.js" | 200 | "100.129.133.85:8088" [ 0.040 0.040 ]
25/Oct/2019:22:57:44 +0800 [8.46.86.29] GET "/styles.1347f65e78a0feca7402.css" | 200 | "100.129.133.85:8088" [ 0.040 0.162 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/scripts.ebe9d134ab87d36ce128.js" | 200 | "100.129.134.66:8088" [ 0.032 0.717 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/main.0bd24e312ae67de58790.js" | 200 | "100.129.133.85:8088" [ 0.104 1.217 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/config.json" | 200 | "100.129.134.66:8088" [ 0.008 0.007 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/config.json" | 200 | "100.129.134.66:8088" [ 0.008 0.007 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/1.61490734becfda923e9a.js" | 304 | "100.129.134.66:8088" [ 0.008 0.009 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/common.a539e68ba169458a5336.js" | 200 | "100.129.133.85:8088" [ 0.012 0.009 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/assets/images/logo.png" | 200 | "100.129.133.85:8088" [ 0.048 0.048 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/4.3bde0384d2d4aeec92af.js" | 304 | "100.129.134.66:8088" [ 0.016 0.013 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/2.6577a1de8f302f06d31b.js" | 304 | "100.129.133.85:8088" [ 0.016 0.017 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/5.19a7f4e75eec899427bb.js" | 200 | "100.129.134.66:8088" [ 0.012 0.013 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/13.521be872d549aeed8286.js" | 304 | "100.129.134.66:8088" [ 0.020 0.016 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/6.d2b4292a35d688b9ce06.js" | 304 | "100.129.133.85:8088" [ 0.016 0.018 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/11.89b4128639d5dba6b3b6.js" | 304 | "100.129.134.66:8088" [ 0.008 0.008 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/14.0d07821a8d3ed6a23dcb.js" | 304 | "100.129.133.85:8088" [ 0.012 0.010 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/province.json?random=0.9808909796692098" | 200 | "100.129.133.85:8088" [ 0.016 0.149 ]
25/Oct/2019:22:57:46 +0800 [8.46.86.29] GET "/assets/images/password.png" | 200 | "100.129.133.85:8088" [ 0.008 0.010 ]
25/Oct/2019:22:57:46 +0800 [8.46.86.29] GET "/assets/images/name.png" | 200 | "100.129.134.66:8088" [ 0.008 0.011 ]
25/Oct/2019:22:57:46 +0800 [8.46.86.29] GET "/12.8f7c3713bd4fd78c8a1a.js" | 200 | "100.129.133.85:8088" [ 0.040 0.322 ]
25/Oct/2019:22:57:46 +0800 [8.46.86.29] GET "/assets/images/login_background.jpg" | 200 | "100.129.134.66:8088" [ 0.016 0.250 ]

Field 12 is the request time ($request_time, the second number inside the brackets)

tail -100 access.log  | awk '{if($12>1){print}}'

25/Oct/2019:22:57:22 +0800 [8.46.86.29] GET "/main.0bd24e312ae67de58790.js" | 200 | "100.129.134.66:8088" [ 0.392 1.129 ]
25/Oct/2019:22:57:24 +0800 [8.46.86.29] GET "/province.json?random=0.7844033898166471" | 200 | "100.129.134.66:8088" [ 0.032 1.002 ]
25/Oct/2019:22:57:24 +0800 [8.46.86.29] GET "/2.6577a1de8f302f06d31b.js" | 200 | "100.129.133.85:8088" [ 0.020 1.219 ]
25/Oct/2019:22:57:36 +0800 [8.46.86.29] GET "/VIID/SpecialLibrary/Dictionary/All" | 200 | "100.129.134.66:8090" [ 1.380 1.474 ]
25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/main.0bd24e312ae67de58790.js" | 200 | "100.129.133.85:8088" [ 0.104 1.217 ]

Comparing my if-based approach with the method from the referenced example

tail -100 access.log | awk -F "["  '{print $NF}' | awk ' $2>1 ' | awk '{print $2}'
1.129
1.002
1.219
1.474
1.217

tail -100 access.log | awk -F "["  '{print $NF}' | awk '{if($2>1){print $2;}}'
1.129
1.002
1.219
1.474
1.217
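
Both pipelines print the same values. Since the request time is a whitespace-separated field of its own in this log format, a single awk call is enough. A sketch that counts from the end of the line, assuming every line ends with "[ upstream_time request_time ]" as in the configuration above (the function name slow_times is made up):

```shell
# Hypothetical helper: print the request time of every line where it exceeds 1 second.
slow_times() {
    # $(NF-1) is the field just before the closing "]", i.e. $request_time
    awk '$(NF-1) + 0 > 1 { print $(NF-1) }'
}

printf '%s\n' \
    '25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/config.json" | 200 | "100.129.134.66:8088" [ 0.008 0.007 ]' \
    '25/Oct/2019:22:57:45 +0800 [8.46.86.29] GET "/main.0bd24e312ae67de58790.js" | 200 | "100.129.133.85:8088" [ 0.104 1.217 ]' |
    slow_times
# prints: 1.217
```
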
vieyahn2017 commented 2 years ago

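awk: print everything except the first column by setting $1 to an empty string

cat 1.log |  awk  '{$1="";print}'
# leaves an extra leading space

cat 1.log |  awk  '{$1="";sub(" ","");print}'
# correct

cat 1.log |  awk -F "|"  '{$1="|  ---   |";print}'
# replaces the first column with the given text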
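
When more than one leading column has to go, blanking fields one by one gets awkward. A sketch of a loop-based alternative that prints from field N through the last field without leaving leading spaces (the function name fields_from is made up for illustration; runs of whitespace between fields are collapsed to single spaces):

```shell
# Hypothetical helper: print fields N..NF of each input line. $1: N.
fields_from() {
    awk -v n="$1" '{
        out = ""
        for (i = n; i <= NF; i++)
            out = out (i > n ? OFS : "") $i   # join fields n..NF with OFS
        print out
    }'
}

printf 'a b c d\n' | fields_from 3
# prints: c d
```
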