wxf4150 / esdump

elasticsearch dump import export

Export efficiency is a bit slow #7

Open kerven88 opened 2 years ago

kerven88 commented 2 years ago

Exporting 7G of data took 30 minutes.

2021/11/25 03:13:49 execute export
2021/11/25 03:44:16 total exported 830000 items; total_raw_bytes: 7678.78 MB
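
For reference, that is 7678.78 MB in about 30.5 minutes (1827 s), roughly 4.2 MB/s.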

wxf4150 commented 2 years ago

I ran a test on 500k documents. Very quick: it took 30s. The log is below:

2021/11/25 14:30:52 total exported 430000 items; total_raw_bytes: 334.99 MB
2021/11/25 14:30:52 total exported 440000 items; total_raw_bytes: 342.68 MB
2021/11/25 14:30:53 total exported 450000 items; total_raw_bytes: 350.58 MB
2021/11/25 14:30:54 total exported 460000 items; total_raw_bytes: 358.24 MB
2021/11/25 14:30:54 total exported 470000 items; total_raw_bytes: 365.94 MB
2021/11/25 14:30:55 total exported 480000 items; total_raw_bytes: 373.64 MB
2021/11/25 14:30:55 total exported 490000 items; total_raw_bytes: 381.29 MB
2021/11/25 14:30:56 total exported 500000 items; total_raw_bytes: 389.01 MB;
2021/11/25 14:30:56 export time spend 30.310132739s
[root@upretsops tmp]# ll -h my_index.json 
-rw-r--r-- 1 root root 390M 11月 25 14:30 my_index.json
[root@upretsops tmp]# more my_index.json
{"ID":"163820688","RawData":{"id":163820688,"asset":{"animationUrl":null,"addre":"0x7bd29408f11d2bfc23c34f18275bbf23bb716bc7","nftI
d":536535,"description":"Meebit #9270","id":23305795,"imageUrl":"https://lh3.googleusercontent.com/cQgwa0ZLZgw5L7pDZA4kjSiblU51fML0
HmwiI9dA2dOUZxcn1YAbvT4kEqGsvSInLGGGKFvWEGxne_pPw_qQF5pJ6lkiIufnxlt-jw","name":"Meebit #9270","tokenId":"9270"},"nftId":536535,"cha
in":"ETHEREUM","eventType":"OFFER_ENTERED","eventTimestamp":"2021-05-16T15:11:50.626499Z","unitPrice":"630000000000000000","price":
0.63,"quantity":"1","from":"0x33d27f0bb797de3d20e519f4a192b7570c56681b","to":"","seller":"","winner":"","tx":"","it":"2021-05-16T23
:12:00.795549594+08:00"}}
....
....

Did you run the command locally on the Elasticsearch server, using the 127.0.0.1 local IP?

kerven88 commented 2 years ago

Yes, I ran the command locally on the Elasticsearch server, using the 127.0.0.1 local IP.

wxf4150 commented 2 years ago

Does your server have enough memory?

I ran a test on another index with bigger documents, about 20 MB per 10,000 docs.

[root@test tmp]# ./esdump export --es http://localhost:9200  --index tokens --gzip=false  -o /tmp/my_index.json
2021/11/25 15:47:28 binary version: unknown version build Time: unknown time
2021/11/25 15:47:28 execute  export
2021/11/25 15:47:28 export index tokens to /tmp/my_index.json
2021/11/25 15:47:28 init elasticsearch Client  with url: http://localhost:9200
2021/11/25 15:47:28 export match: {"match_all":{}}
2021/11/25 15:47:29 total exported 10000 items; total_raw_bytes: 23.17 MB; esFetchtime 0.896356
2021/11/25 15:47:30 total exported 20000 items; total_raw_bytes: 36.91 MB; esFetchtime 0.616517
2021/11/25 15:47:31 total exported 30000 items; total_raw_bytes: 53.67 MB; esFetchtime 0.679568
2021/11/25 15:47:32 total exported 40000 items; total_raw_bytes: 69.70 MB; esFetchtime 0.645046
2021/11/25 15:47:32 total exported 50000 items; total_raw_bytes: 85.60 MB; esFetchtime 0.640776
2021/11/25 15:47:33 total exported 60000 items; total_raw_bytes: 104.32 MB; esFetchtime 0.709815
2021/11/25 15:47:35 total exported 70000 items; total_raw_bytes: 155.51 MB; esFetchtime 1.396025
2021/11/25 15:47:35 total exported 71433 items; total_raw_bytes: 166.76 MB; esFetchtime 0.338897
2021/11/25 15:47:36 export time spend 7.400013709s

Across the two tests, esdump's speed is 13-23 MB/s.
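
For reference: 389.01 MB / 30.3 s ≈ 12.8 MB/s on the first index, and 166.76 MB / 7.4 s ≈ 22.5 MB/s on the second.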

Please download the new release and run a new test, then show me part of the output log.

kerven88 commented 2 years ago

Server resources:

[root@ ~]# nproc
8
[root@ ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            61G         48G        352M        873M         13G         12G
Swap:            0B          0B          0B

Pulled the latest code and built it:

[root@ ~]# /usr/local/bin/esdump version
2021/11/25 09:05:09 binary version: unknown version build Time: unknown time
2021/11/25 09:05:09 execute version
2021/11/25 09:05:09 version time spend 1.398µs

Dump log:

[root@ ~]# /usr/local/bin/esdump export \

--es "http://127.0.0.1:9200" \
--index "forseti-r-20211124" \
--gzip=true \
--o "/tmp/forseti-20211124.json.gz" \
--MatchBody '
{
  "range": {
      "activity.S_T_VD_EVENTOCCURDATE.raw": {
          "gte": "2021-11-24 00:00:00",
          "lt": "2021-11-24 00:30:00"
      }
  }
}'

2021/11/25 09:01:42 binary version: unknown version build Time: unknown time
2021/11/25 09:01:42 execute export
2021/11/25 09:01:42 export index forseti-r-20211124 to /tmp/forseti-20211124.json.gz
2021/11/25 09:01:42 init elasticsearch Client with url: http://127.0.0.1:9200
2021/11/25 09:01:42 export match: { "range": { "activity.S_T_VD_EVENTOCCURDATE.raw": { "gte": "2021-11-24 00:00:00", "lt": "2021-11-24 00:30:00" } } }
2021/11/25 09:02:03 total exported 10000 items; total_raw_bytes: 98.57 MB
2021/11/25 09:02:23 total exported 20000 items; total_raw_bytes: 196.10 MB
2021/11/25 09:02:37 total exported 30000 items; total_raw_bytes: 293.55 MB
2021/11/25 09:02:48 total exported 34214 items; total_raw_bytes: 334.53 MB; the gzip size: 87.01 MB
2021/11/25 09:02:48 export time spend 1m5.814624223s

The "esFetchtime" indicator seems to be missing from my export log?

wxf4150 commented 2 years ago

Your version is old. I have tested a lot locally; about 90% of the time is spent fetching data from ES.

kerven88 commented 2 years ago

Can you print out the version number? With the latest master branch I pulled, I still get:
2021/11/25 09:05:09 binary version: unknown version build Time: unknown time

kerven88 commented 2 years ago

Or could you upload the latest binary build for me to test?

wxf4150 commented 2 years ago

Download here: https://github.com/wxf4150/esdump/releases

The version string is injected by GitHub Workflows: when a tag is pushed, the workflow in ".github/workflows/release.yml" builds the project and injects the pushed tag as the version.

wxf4150 commented 2 years ago

I have released a new version again: "fetch data" now runs in a separate goroutine, so storing the data no longer blocks fetching data from ES.

Export performance is now about 10% higher:

2021/11/25 17:19:18 total exported 400000 items; total_raw_bytes: 312.31 MB; storeTime 0.000000
2021/11/25 17:19:18 total exported 410000 items; total_raw_bytes: 319.91 MB; storeTime 0.000000
2021/11/25 17:19:19 total exported 420000 items; total_raw_bytes: 327.42 MB; storeTime 0.000000
2021/11/25 17:19:19 total exported 430000 items; total_raw_bytes: 334.99 MB; storeTime 0.000000
2021/11/25 17:19:20 total exported 440000 items; total_raw_bytes: 342.68 MB; storeTime 0.000000
2021/11/25 17:19:20 total exported 450000 items; total_raw_bytes: 350.58 MB; storeTime 0.000000
2021/11/25 17:19:21 total exported 460000 items; total_raw_bytes: 358.24 MB; storeTime 0.000000
2021/11/25 17:19:21 total exported 470000 items; total_raw_bytes: 365.94 MB; storeTime 0.000000
2021/11/25 17:19:22 total exported 480000 items; total_raw_bytes: 373.64 MB; storeTime 0.000000
2021/11/25 17:19:22 total exported 490000 items; total_raw_bytes: 381.29 MB; storeTime 0.000000
2021/11/25 17:19:23 1000 pages FetchTime 5.236988501999992 s
2021/11/25 17:19:23 totalFetchTime  27.13077194699997 s
2021/11/25 17:19:23 total exported 500000 items; total_raw_bytes: 389.01 MB; storeTime 0.000000
2021/11/25 17:19:23 total exported 500000 items; total_raw_bytes: 389.01 MB; storeTime 0.000000
2021/11/25 17:19:23 export time spend 27.954677496s

the log "2021/11/25 17:19:23 totalFetchTime" is all the es fetch time.

Maybe the fetch process should be parallelized with more goroutines, but I have no idea how; the Go ES client's fetch cursor is not a number (page number).
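
For context (this is not esdump's code, just a sketch): Elasticsearch supports "sliced scroll", where each worker scrolls an independent slice of the index, so several goroutines could fetch in parallel and hand pages to a single writer. A rough sketch with the official go-elasticsearch client is below; the index name, slice count, and output path are placeholders, and a real exporter would keep calling the scroll API per slice until it is drained.

package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"strings"
	"sync"
	"time"

	elasticsearch "github.com/elastic/go-elasticsearch/v7"
)

func main() {
	// Hypothetical sketch: fetch slices in parallel goroutines and write
	// pages from a single channel, so storing never blocks fetching.
	es, err := elasticsearch.NewClient(elasticsearch.Config{
		Addresses: []string{"http://127.0.0.1:9200"},
	})
	if err != nil {
		log.Fatal(err)
	}

	const slices = 4
	pages := make(chan []byte, slices)

	var wg sync.WaitGroup
	for i := 0; i < slices; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each goroutine asks for its own slice of the scroll.
			body := fmt.Sprintf(`{"slice":{"id":%d,"max":%d},"query":{"match_all":{}}}`, id, slices)
			res, err := es.Search(
				es.Search.WithIndex("my_index"),
				es.Search.WithScroll(time.Minute),
				es.Search.WithSize(1000),
				es.Search.WithBody(strings.NewReader(body)),
			)
			if err != nil {
				log.Println("slice", id, "search failed:", err)
				return
			}
			defer res.Body.Close()
			raw, _ := io.ReadAll(res.Body)
			pages <- raw
			// A real exporter would read _scroll_id from the response and
			// keep calling es.Scroll until a page comes back empty.
		}(i)
	}
	go func() { wg.Wait(); close(pages) }()

	out, err := os.Create("/tmp/my_index.slices.json")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	for page := range pages {
		out.Write(page)
		out.Write([]byte("\n"))
	}
}

Whether this actually helps depends on the server: if the ES node itself is the bottleneck, more parallel scrolls will not speed things up.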

kerven88 commented 2 years ago

Thank you very much, I'll test it later

kerven88 commented 2 years ago

My export timestamps don't look continuous; what could be the reason? Yours are continuous.

[screenshot of export log]

kerven88 commented 2 years ago

The export speed on my side does seem to be much slower.

2021/11/25 09:52:50 total exported 10000 items; total_raw_bytes: 97.41 MB; storeTime 0.000000
2021/11/25 09:53:13 total exported 20000 items; total_raw_bytes: 194.95 MB; storeTime 0.000000
2021/11/25 09:53:30 total exported 30000 items; total_raw_bytes: 293.59 MB; storeTime 0.000000
2021/11/25 09:53:50 total exported 40000 items; total_raw_bytes: 391.61 MB; storeTime 0.000000
2021/11/25 09:54:00 total exported 50000 items; total_raw_bytes: 489.30 MB; storeTime 0.000000
2021/11/25 09:54:08 totalFetchTime 95.76896522299998 s
2021/11/25 09:54:08 total exported 57985 items; total_raw_bytes: 566.50 MB; storeTime 0.000000
2021/11/25 09:54:08 export time spend 1m35.826526269s

wxf4150 commented 2 years ago

@kerven88

You asked: "Can you print out the version number? The latest master branch I pulled still shows an unknown version."

git pull --tags   # pulls all the tags

Then "make linux-amd64" will build the project with the latest tag as the version string.

See the Makefile in the project for more detail.
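
For what it's worth, the usual mechanism for this in Go projects is an -ldflags build flag, e.g. something like go build -ldflags "-X main.version=$(git describe --tags)"; the exact variable path here is an assumption, so check esdump's Makefile for the real one.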

wxf4150 commented 2 years ago

Re "The export speed on my side does seem to be much slower": your result shows "export time spend 1m35.826526269s" with "totalFetchTime 95.7", so almost 100% of the time is spent fetching data from ES.

57985 items means about 579 fetches from the ES server. 95 s / 579 ≈ 0.164 s, so each fetch request takes about 160 ms; your ES API is slow.

Export size 566.50 MB / 95 s ≈ 5.9 MB/s. In all my local tests, the speed is about 14 MB/s with small documents and up to 27 MB/s with bigger documents.

Is your ES server IP a local IP, or does the request cross a 100 Mbit LAN?

kerven88 commented 2 years ago

The ES server IP is a local IP. Could it be a load issue with my ES cluster?

wxf4150 commented 2 years ago

You said the ES server IP is a local IP and asked whether it could be a load issue with your ES cluster.

I tested on a single ES node, not a cluster. Maybe your cluster network is slow.
You can test it with an scp command: scp a big file from one server to another.
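For example (the remote host below is just a placeholder): scp /tmp/forseti-20211124.json.gz user@<other-node>:/tmp/ and check the transfer rate that scp reports.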

Or it could be some other cluster issue.

wxf4150 commented 2 years ago

Let's switch to Chinese. This is exhausting.

kerven88 commented 2 years ago

Haha, turns out you're a fellow Chinese speaker. That's hilarious.

kerven88 commented 2 years ago

I ran the export on the ES node itself, requesting the local IP, so in theory there should be no network bandwidth issue. It may be a load issue on the ES cluster.